snowplow / dbt-snowplow-utils

Snowplow utility functions to be used in conjunction with the snowplow-web dbt package.

Prepare Utils codebase for Spark (Iceberg) Support #180

Closed · ilias1111 closed 2 months ago

ilias1111 commented 3 months ago

Description

  1. Added a variable for the file format configuration, defaulting to delta, with iceberg as an alternative for Spark targets.
  2. Modified dbt_project.yml to set the file format configuration on models according to the target type (a sketch of the wiring appears after this list).
  3. Implemented custom date_diff and date_add functions in timestamp_functions.sql, casting results to bigint to prevent integer overflow (sketched below).
  4. Enhanced the get_value_by_target_type macro to accept spark as a valid option (sketched below).
  5. Developed Spark-specific versions of the get_string_agg and get_field macros to ensure compatibility (sketched below).
  6. Universal Changes
    • Replaced the QUALIFY clause with PostgreSQL-compatible logic (the window function is filtered in an outer query) to ensure consistent behavior across target types (example below).
    • Switched the casting syntax to the ANSI CAST(x AS data_type) form for Spark compatibility (example below).
    • Adapted the integration tests by creating Spark-specific source files to accommodate Spark's syntax requirements and limitations.
    • Configured the incremental strategy based on the target type: 'delete+insert' for Postgres and Redshift, 'merge' for Spark (sketched below).
    • Adjusted the handling of ROW_NUMBER() in tests to account for Spark's non-deterministic ordering when the order by has ties (example below).
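
A minimal sketch of how items 1 and 2 fit together, written as an inline model config rather than the actual dbt_project.yml entry; the variable name snowplow__spark_file_format and the upstream model are hypothetical:

```sql
{# Hypothetical wiring: a var picks the file format (delta by default,
   iceberg as the alternative), applied only on Spark targets where the
   file_format config is meaningful. #}
{{
  config(
    file_format=var('snowplow__spark_file_format', 'delta') if target.type == 'spark' else none
  )
}}

select * from {{ ref('snowplow_base_events') }}  -- hypothetical upstream model
```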
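
For item 3, a sketch of the overflow-safe pattern on Spark (not the PR's exact macros): second-level differences computed via unix_timestamp and cast to bigint, since second counts over long ranges can exceed a 32-bit int:

```sql
{% macro spark__timestamp_diff_seconds(first_tstamp, second_tstamp) %}
    {# unix_timestamp converts each timestamp to epoch seconds; the
       explicit bigint cast guards the subtraction against overflow #}
    cast(unix_timestamp({{ second_tstamp }}) - unix_timestamp({{ first_tstamp }}) as bigint)
{% endmacro %}
```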
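
Item 4 follows the usual target.type dispatch pattern; a sketch with spark added as a valid branch (argument names are illustrative):

```sql
{% macro get_value_by_target_type(bigquery_val=none, snowflake_val=none,
                                  redshift_val=none, postgres_val=none,
                                  databricks_val=none, spark_val=none) %}
  {% if target.type == 'bigquery' %} {{ return(bigquery_val) }}
  {% elif target.type == 'snowflake' %} {{ return(snowflake_val) }}
  {% elif target.type == 'redshift' %} {{ return(redshift_val) }}
  {% elif target.type == 'postgres' %} {{ return(postgres_val) }}
  {% elif target.type == 'databricks' %} {{ return(databricks_val) }}
  {% elif target.type == 'spark' %} {{ return(spark_val) }}
  {% else %}
    {{ exceptions.raise_compiler_error("Unsupported target: " ~ target.type) }}
  {% endif %}
{% endmacro %}
```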
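
For item 5, Spark has no string_agg/listagg, but concat_ws over collect_list is the standard equivalent; a simplified sketch (the package macro takes more options than shown here):

```sql
{% macro spark__get_string_agg(base_column, column_prefix, separator=',') %}
    {# collect_list gathers the values per group, sort_array makes the
       order deterministic, concat_ws joins them with the separator #}
    concat_ws('{{ separator }}', sort_array(collect_list({{ column_prefix }}.{{ base_column }})))
{% endmacro %}
```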
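
The QUALIFY replacement looks roughly like this (table and column names are illustrative):

```sql
-- before (Snowflake/Databricks syntax, not available on Postgres):
--   select * from events
--   qualify row_number() over (partition by session_id order by tstamp) = 1

-- after: the window function is computed in a subquery and filtered
-- outside, which works on Postgres, Redshift and Spark alike
select *
from (
    select
        e.*,
        row_number() over (partition by session_id order by tstamp) as rn
    from events e
) t
where rn = 1
```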
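
The casting change is mechanical, swapping the :: shorthand for the ANSI form:

```sql
-- before: event_count::bigint
-- after:
cast(event_count as bigint)
```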
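
A sketch of the target-dependent incremental strategy, shown inline for brevity (the package sets this through project-level config; unique_key here is illustrative):

```sql
{{
  config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge' if target.type == 'spark' else 'delete+insert'
  )
}}
```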
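
Finally, the ROW_NUMBER() adjustment: when the order by has ties, Spark may number the tied rows differently from run to run, so ordering by an additional unique column breaks ties deterministically (column names are illustrative):

```sql
select
    session_id,
    tstamp,
    -- tstamp alone can tie; event_id is unique, so the numbering is
    -- stable across runs
    row_number() over (partition by session_id order by tstamp, event_id) as rn
from events
```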

What type of PR is this? (check all applicable)

Related Tickets & Documents

Data Lakes Epic

Checklist

Added tests?

Added to documentation?

[optional] What gif best describes this PR or how it makes you feel?

I like big data