Migrate ndt5 datatype to use standard columns

stephen-soltesz commented 2 years ago

[x] Update etl-schema and prometheus-support queries to use ndt_raw.ndt5_legacy view (or base_tables.ndt5)
[x] Deploy v2: etl, etl-gardener, replace etl-schema pass-through v2 view for ndt.ndt5
- [x] etl
- [x] gardener
- [x] etl-schemas
- Manual operations for staging and production deployments:
  - [x] delete ndt.ndt5 view - this will be replaced by a materialized join by v2 gardener
  - [x] delete very old versions of ndt5 tables with wrong partitioning types from staging & prod.
[x] Update etl-schema and prometheus-support queries to use new ndt.ndt5 v2 view.
[x] Update website documentation for v2 ndt5 schema
[x] Deploy website documentation

stephen-soltesz commented 2 years ago

While verifying the v2 parser tables in staging, I found that there is a 1-3% difference in the number of rows.

WITH v2 AS (

SELECT date, "v2" as label, COUNTIF(raw.C2S IS NOT NULL) as upload, COUNTIF(raw.S2C IS NOT NULL) as download
from `mlab-staging.ndt.ndt5`
WHERE date > date('2021-09-01')
group by date
order by date

), v1 AS (

SELECT DATE(_PARTITIONTIME) as date, "v1" as label, COUNTIF(result.C2S IS NOT NULL) as upload, COUNTIF(result.S2C IS NOT NULL) as download
from `mlab-staging.base_tables.ndt5`
WHERE _PARTITIONTIME > TIMESTAMP('2021-09-01')
group by date
order by date
)
SELECT * FROM v2
UNION ALL 
SELECT * FROM v1

It appears that the missing rows are due to empty UUIDs in the raw results, which the v2 parser skips b/c these are not valid measurements.

SELECT * 
from `mlab-staging.base_tables.ndt5`
WHERE _PARTITIONTIME > TIMESTAMP('2021-09-01')
AND result.S2C.UUID = ""
AND _PARTITIONTIME = TIMESTAMP("2021-11-21")
LIMIT 10

Ultimately these rows should be in the raw table, but they need not block this initial deployment b/c these rows will not be present in the unified views.

stephen-soltesz commented 2 years ago

Deletion of v1 portions part of: https://github.com/m-lab/etl/issues/1074

m-lab / etl

Migrate ndt5 datatype to use standard columns #1046