Changes:
Replace two of the five JavaScript UDF usages with a SQL UDF.
Rearrange the CTEs so that most of the UDFs are called once per events ping rather than once per unnested event.
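To illustrate the two changes above, here is a minimal sketch in BigQuery SQL. It is not the code from this PR: the function name `map_to_json_sketch`, the CTE names, the `labels` column, and the inline sample data are all hypothetical. It assumes the map is an array of key/value string structs, and it makes no attempt to match `mozfun.json.from_map` on edge cases such as empty maps, NULL keys, or duplicate keys.

```sql
-- Hypothetical SQL UDF standing in for a JavaScript map-to-JSON UDF.
-- Builds a JSON object from parallel key and value arrays, keeping input order.
CREATE TEMP FUNCTION map_to_json_sketch(input ARRAY<STRUCT<key STRING, value STRING>>)
RETURNS JSON AS (
  JSON_OBJECT(
    ARRAY(SELECT key FROM UNNEST(input) WITH OFFSET AS o ORDER BY o),
    ARRAY(SELECT value FROM UNNEST(input) WITH OFFSET AS o ORDER BY o)
  )
);

WITH pings AS (
  -- Made-up sample ping: one ping-level map and two events.
  SELECT
    'ping-1' AS document_id,
    [STRUCT('branch' AS key, 'control' AS value)] AS labels,
    [
      STRUCT('click' AS name, [STRUCT('id' AS key, '1' AS value)] AS extra),
      STRUCT('scroll' AS name, [STRUCT('id' AS key, '2' AS value)] AS extra)
    ] AS events
),
per_ping AS (
  -- Ping-level conversion happens here, before the events are unnested.
  SELECT
    document_id,
    map_to_json_sketch(labels) AS labels_json,
    events
  FROM
    pings
)
SELECT
  document_id,
  labels_json,
  event.name AS event_name,
  -- Event-level maps can only be converted after unnesting.
  map_to_json_sketch(event.extra) AS extra_json
FROM
  per_ping
CROSS JOIN
  UNNEST(events) AS event
```

In this sketch the ping-level map is converted in the `per_ping` CTE, so the conversion is written against one row per ping, while only the event-level `extra` map is converted after `UNNEST`. Whether the query engine evaluates it exactly that way is not something the sketch can show. The slot-hour measurements below are the evidence for the real query.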
performance
1% sample (`sample_id = 1`) of Firefox desktop for 2024-05-22:

| changes | slot hours | job id (backfill-2) |
| --- | --- | --- |
| base | 24.5 | bquxjob_7a85e33a_18fab4919e3 |
| rearranged | 5.7 | bquxjob_4ababcd5_18fab3b62ea |
| sql udfs | 4.3 | bquxjob_129253ac_18fab52d5c9 |
| rearranged + sql udfs | 3.4 | bquxjob_4145760d_18fab765f03 |
100% of Firefox desktop for 2024-05-22 (both overwriting a clustered partition):

| changes | slot hours | job id (backfill-2) |
| --- | --- | --- |
| base (airflow run) | 3910 😵‍💫 | bqjob_r6deaccf950e8d7f5_0000018fa35c4fce_1 |
| rearranged + sql udfs | 699 | bquxjob_5293faeb_18fab5ab707 |
I'm mostly confident the output is equivalent because this query doesn't throw any errors:

```sql
-- Each assertion compares a new function's output with mozfun.json.from_map.
SELECT
  mozfun.assert.json_equals(from_map_event_extra(event.extra), mozfun.json.from_map(event.extra)),
  mozfun.assert.json_equals(from_map_experiment(ping_info.experiments), mozfun.json.from_map(ping_info.experiments)),
FROM
  `moz-fx-data-shared-prod.firefox_desktop_stable.events_v1`
CROSS JOIN
  UNNEST(events) AS event
WHERE
  DATE(submission_timestamp) IN ('2024-05-22')
  AND sample_id = 1
```
I also ran this for Fenix and iOS.
Checklist for reviewer:
[ ] Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
[ ] If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and providing the `<username>:<branch>` of the fork as parameter. The parameter will also show up in the logs of the `manual-trigger-required-for-fork` CI task together with more detailed instructions.
[ ] If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
[ ] When adding a new derived dataset, ensure that data is not available already (fully or partially) and recommend extending an existing dataset in favor of creating new ones. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.
For modifications to schemas in restricted namespaces (see CODEOWNERS):
DENG-3889
Issue is synchronized with this Jira Task