Closed stephen-soltesz closed 2 years ago
While verifying the v2 parser tables in staging, I found that there is a 1-3% difference in the number of rows.
WITH v2 AS (
SELECT date, "v2" as label, COUNTIF(raw.C2S IS NOT NULL) as upload, COUNTIF(raw.S2C IS NOT NULL) as download
from `mlab-staging.ndt.ndt5`
WHERE date > date('2021-09-01')
group by date
order by date
), v1 AS (
SELECT DATE(_PARTITIONTIME) as date, "v1" as label, COUNTIF(result.C2S IS NOT NULL) as upload, COUNTIF(result.S2C IS NOT NULL) as download
from `mlab-staging.base_tables.ndt5`
WHERE _PARTITIONTIME > TIMESTAMP('2021-09-01')
group by date
order by date
)
SELECT * FROM v2
UNION ALL
SELECT * FROM v1
It appears that the missing rows are due to empty UUIDs in the raw results, which the v2 parser skips b/c these are not valid measurements.
SELECT *
from `mlab-staging.base_tables.ndt5`
WHERE _PARTITIONTIME > TIMESTAMP('2021-09-01')
AND result.S2C.UUID = ""
AND _PARTITIONTIME = TIMESTAMP("2021-11-21")
LIMIT 10
Ultimately these rows should be in the raw table, but they need not block this initial deployment b/c these rows will not be present in the unified views.
Deletion of v1 portions part of: https://github.com/m-lab/etl/issues/1074
ndt_raw.ndt5_legacy
view (or base_tables.ndt5)ndt.ndt5
view - this will be replaced by a materialized join by v2 gardener