mozilla / bigquery-etl

Bigquery ETL
https://mozilla.github.io/bigquery-etl
Mozilla Public License 2.0
241 stars 98 forks source link

bug 1904841: Fix glam_percentile udf #5854

Closed edugfilho closed 4 days ago

edugfilho commented 4 days ago

Covers a case in which all bucket values are zero, and adds a test for it.

Here's a test comparing it with the old JS version of this udf: https://sql.telemetry.mozilla.org/queries/100986/source

Checklist for reviewer:

For modifications to schemas in restricted namespaces (see CODEOWNERS):

┆Issue is synchronized with this Jira Task

dataops-ci-bot commented 4 days ago

Integration report for "Fix glam_percentile udf"

sql.diff

Click to expand! ```diff diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/mozfun/glam/percentile/udf.sql /tmp/workspace/generated-sql/sql/mozfun/glam/percentile/udf.sql --- /tmp/workspace/main-generated-sql/sql/mozfun/glam/percentile/udf.sql 2024-06-27 15:14:38.000000000 +0000 +++ /tmp/workspace/generated-sql/sql/mozfun/glam/percentile/udf.sql 2024-06-27 15:14:33.000000000 +0000 @@ -13,23 +13,38 @@ AND pct <= 100, TRUE, ERROR('percentile must be a value between 0 and 100') - ) pct_ok + ) pct_ok, + SUM(value) AS total_value + FROM + UNNEST(histogram) ), keyed_cum_sum AS ( SELECT key, - SUM(value) OVER (ORDER BY CAST(key AS FLOAT64)) / SUM(value) OVER () AS cum_sum + IF( + total_value = 0, + 0, + SUM(value) OVER (ORDER BY CAST(key AS FLOAT64)) / SUM(value) OVER () + ) cum_sum + FROM + UNNEST(histogram), + check + ), + max_bucket AS ( + SELECT + MAX(CAST(key AS FLOAT64)) AS bucket FROM UNNEST(histogram) ) SELECT - CAST(key AS FLOAT64) + IF(total_value = 0, max_bucket.bucket, CAST(key AS FLOAT64)) FROM keyed_cum_sum, - check + check, + max_bucket WHERE check.pct_ok - AND cum_sum >= pct / 100 + AND (total_value = 0 OR cum_sum >= pct / 100) ORDER BY cum_sum LIMIT ```

Link to full diff