Closed acmiyaguchi closed 4 years ago
I found and fixed an issue in the Avro code; it shouldn't have any bearing on the BigQuery schemas.
I found a bug in the Avro code, but I'm fairly confident that everything works as expected now. I've written a few queries against BigQuery and the raw ndjson files to verify that the values are correct.
SELECT
  SUM(list.f0_),
  SUM(list.f1_)
FROM
  test_avro.telemetry__untrustedModules_v4,
  UNNEST(root.payload.combinedStacks.stacks) AS stacks,
  UNNEST(stacks.list) AS list
Row | f0_ | f1_ |
---|---|---|
1 | 62325.0 | 1.3835058055987E21 |
cat data/telemetry.untrustedModules.4.ndjson | jq -cr '.payload.combinedStacks.stacks | .[]| .[] | join(",")' | python3 -c "import sys; x=[tuple(map(float, x.split(','))) for x in sys.stdin.readlines()]; print(list(map(sum, zip(*x))))"
[62325.0, 1.3835058055987192e+21]
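For readers who find the jq and Python one-liner hard to follow, here is a minimal sketch of the same check as a standalone function. It assumes every line of the ndjson file carries `payload.combinedStacks.stacks` as a list of stacks, each a list of fixed-length numeric tuples; the function name `stack_column_sums` and the sample data are made up for illustration.

```python
import json


def stack_column_sums(ndjson_lines):
    """Sum each tuple position over every frame in payload.combinedStacks.stacks."""
    totals = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Assumption: the path exists in every document, as in the jq filter.
        stacks = doc["payload"]["combinedStacks"]["stacks"]
        for stack in stacks:      # outer list: one entry per stack
            for frame in stack:   # inner list: one tuple per frame
                for i, value in enumerate(frame):
                    if i == len(totals):
                        totals.append(0.0)
                    totals[i] += float(value)
    return totals


# Hypothetical sample document, not taken from the real ping data.
sample = ['{"payload": {"combinedStacks": {"stacks": [[[1, 2], [3, 4]], [[5, 6]]]}}}']
print(stack_column_sums(sample))  # [9.0, 12.0]
```

The first and second totals correspond to `SUM(list.f0_)` and `SUM(list.f1_)` in the query above.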
AND
SELECT
  parent.f1_,
  parent.f2_,
  COUNT(*)
FROM
  test_avro.telemetry__event_v4,
  UNNEST(root.payload.events.parent) AS parent
GROUP BY
  1,
  2
ORDER BY
  3 DESC
Row | f1_ | f2_ | f0_ |
---|---|---|---|
1 | addonsManager | install | 1647 |
2 | addonsManager | update | 1596 |
3 | devtools.main | tool_timer | 595 |
4 | addonsManager | disable | 563 |
5 | addonsManager | enable | 555 |
6 | devtools.main | exit | 380 |
7 | devtools.main | enter | 375 |
8 | devtools.main | close | 325 |
9 | devtools.main | open | 310 |
10 | uptake.remotecontent.result | uptake | 284 |
11 | addonsManager | uninstall | 223 |
12 | devtools.main | edit_rule | 147 |
13 | activity_stream | end | 41 |
14 | devtools.main | execute_js | 38 |
15 | devtools.main | activate | 19 |
16 | devtools.main | deactivate | 15 |
17 | extensions.data | migrateResult | 12 |
18 | activity_stream | event | 11 |
19 | devtools.main | object_expanded | 9 |
20 | devtools.main | pause_on_exceptions | 8 |
21 | devtools.main | edit_html | 6 |
22 | devtools.main | sidepanel_changed | 3 |
23 | devtools.main | filters_changed | 3 |
24 | addonsManager | sideload_prompt | 2 |
25 | devtools.main | jump_to_definition | 1 |
26 | devtools.main | jump_to_source | 1 |
27 | security.ui.identitypopup | open | 1 |
28 | security.ui.identitypopup | click | 1 |
cat data/telemetry.event.4.ndjson | jq -cr '.payload.events.parent | select(. != null) | .[] | [.[1], .[2]] | join("|")' | sort | uniq -c | sort -r | sed 's/^[[:space:]]*//g' | awk '{printf "%s|%s\n", $2,$1}'
f1_ | f2_ | f0_ |
---|---|---|
addonsManager | install | 1647 |
addonsManager | update | 1596 |
devtools.main | tool_timer | 595 |
addonsManager | disable | 563 |
addonsManager | enable | 555 |
devtools.main | exit | 380 |
devtools.main | enter | 375 |
devtools.main | close | 325 |
devtools.main | open | 310 |
uptake.remotecontent.result | uptake | 284 |
addonsManager | uninstall | 223 |
devtools.main | edit_rule | 147 |
activity_stream | end | 41 |
devtools.main | execute_js | 38 |
devtools.main | activate | 19 |
devtools.main | deactivate | 15 |
extensions.data | migrateResult | 12 |
activity_stream | event | 11 |
devtools.main | object_expanded | 9 |
devtools.main | pause_on_exceptions | 8 |
devtools.main | edit_html | 6 |
devtools.main | sidepanel_changed | 3 |
devtools.main | filters_changed | 3 |
addonsManager | sideload_prompt | 2 |
security.ui.identitypopup | open | 1 |
security.ui.identitypopup | click | 1 |
devtools.main | jump_to_source | 1 |
devtools.main | jump_to_definition | 1 |
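The shell pipeline above can also be written as a short Python function, which may be easier to adapt. This is only a sketch: it assumes each entry in `payload.events.parent` is a list whose positions 1 and 2 are the values that show up as `f1_` and `f2_` in BigQuery, and it skips documents where `parent` is missing or null, like the `select(. != null)` filter does. The name `count_parent_events` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def count_parent_events(ndjson_lines):
    """Count (position 1, position 2) pairs across payload.events.parent."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Treat a missing or null level anywhere on the path as "no events".
        events = ((doc.get("payload") or {}).get("events") or {}).get("parent")
        if events is None:
            continue
        for event in events:
            counts[(event[1], event[2])] += 1
    # Highest counts first, like ORDER BY 3 DESC in the query.
    return counts.most_common()


# Hypothetical sample documents, not taken from the real ping data.
sample = [
    '{"payload": {"events": {"parent": [[10, "addonsManager", "install", "extension"],'
    ' [11, "addonsManager", "install", "extension"], [12, "devtools.main", "open", "tools"]]}}}',
    '{"payload": {"events": {}}}',
]
print(count_parent_events(sample))
# [(('addonsManager', 'install'), 2), (('devtools.main', 'open'), 1)]
```

Rows with equal counts may come out in a different order than in BigQuery or the shell output, as they do in the two tables above.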
Nested lists are not handled correctly.
results in
This PR fixes this by adding an intermediate layer that can be used for unnesting.
See this gist for the result of the verification script: https://gist.github.com/acmiyaguchi/619b113f0b536480919ecf90a4028036. This lines up with the experience with the third-party modules and untrusted modules pings, which are likely the only pings with nested arrays.
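To make the intermediate layer concrete, here is a rough data-level sketch of the idea. BigQuery does not accept an array whose elements are themselves arrays, so each inner list is wrapped in a record with a single field. The field name `list` is taken from the `UNNEST(stacks.list)` query above; the function `wrap_nested_lists` is hypothetical and is not the code in this PR, which works on schemas rather than on data. Tuple handling (the `f0_`, `f1_` fields) is left out.

```python
def wrap_nested_lists(value, inside_list=False):
    """Wrap any list that is a direct element of another list as {"list": [...]}."""
    if isinstance(value, dict):
        # Object fields are not "inside a list", so their lists stay plain.
        return {key: wrap_nested_lists(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [wrap_nested_lists(item, inside_list=True) for item in value]
        # Only a list nested directly in another list gets the extra layer.
        return {"list": items} if inside_list else items
    return value


# Hypothetical sample value, not taken from the real ping data.
print(wrap_nested_lists({"stacks": [[1, 2], [3]]}))
# {'stacks': [{'list': [1, 2]}, {'list': [3]}]}
```

With this shape, the outer array is unnested first and the `list` field of each element second, which is the pattern used in the untrustedModules query.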