m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Remove annotation from Gardener 2.0 K8S parsers #992

Open gfr10598 opened 3 years ago

gfr10598 commented 3 years ago

New parsers should NOT annotate records, because records are annotated later by joins in BigQuery (BQ). The K8S annotation-service should be shut down, and the null-annotator should be used for 2.0 parsing tasks.

This would also make the annotator metrics easier to interpret, as they would be legacy pipeline only.

stephen-soltesz commented 3 years ago

@gfr10598 can you clarify? We discussed part of this last week. Is this more about cleaning up the annotation metrics? The 2.0 parsers (ndt7 & annotation) do not use the annotation service today. Please update the issue title/description to clarify.

stephen-soltesz commented 3 years ago

While working on the local writer for ETL, I discovered that the ndt7 parser (and probably the annotation parser) attempts to contact the annotation-service unnecessarily. NewSinkParser passes a real api.Annotator to the parsers for both datatypes.

https://github.com/m-lab/etl/blob/master/parser/parser.go#L50

Ideally, the annotator would be a no-op for these datatypes and for any types migrated in the future.
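To make the no-op idea concrete, here is a rough sketch in Go. It does not use the real etl code: the `Annotator` interface below is a simplified, hypothetical stand-in for `api.Annotator` (whose actual signature may differ), and `NullAnnotator`, `countingAnnotator`, `annotatorFor`, and `annotateOne` are illustrative names, not identifiers from the repository. The point is only that a parser for a 2.0 datatype can be handed an annotator that never touches the network.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Annotation is a hypothetical placeholder for per-IP metadata.
type Annotation struct {
	Country string
}

// Annotator is a simplified, assumed version of the interface a parser
// receives. The real api.Annotator in m-lab/etl may look different.
type Annotator interface {
	GetAnnotations(ctx context.Context, date time.Time, ips []string) (map[string]Annotation, error)
}

// NullAnnotator satisfies Annotator but never contacts any service.
type NullAnnotator struct{}

// GetAnnotations returns an empty result and no error.
func (NullAnnotator) GetAnnotations(ctx context.Context, date time.Time, ips []string) (map[string]Annotation, error) {
	return map[string]Annotation{}, nil
}

// countingAnnotator stands in for a "real" annotator so that the demo can
// show whether it was called. It makes no network requests either.
type countingAnnotator struct {
	calls int
}

// GetAnnotations records the call and returns one dummy entry per IP.
func (c *countingAnnotator) GetAnnotations(ctx context.Context, date time.Time, ips []string) (map[string]Annotation, error) {
	c.calls++
	out := make(map[string]Annotation, len(ips))
	for _, ip := range ips {
		out[ip] = Annotation{Country: "ZZ"}
	}
	return out, nil
}

// annotatorFor is a hypothetical helper: 2.0 datatypes get the no-op
// annotator, everything else keeps the annotator that was passed in.
func annotatorFor(dataType string, remote Annotator) Annotator {
	switch dataType {
	case "ndt7", "annotation":
		return NullAnnotator{}
	default:
		return remote
	}
}

// annotateOne requests annotations for a single IP and reports how many
// entries came back.
func annotateOne(a Annotator, ip string) (int, error) {
	res, err := a.GetAnnotations(context.Background(), time.Now(), []string{ip})
	return len(res), err
}

func main() {
	remote := &countingAnnotator{}
	// "some-legacy-type" is a made-up name for a datatype that is not migrated.
	for _, dt := range []string{"ndt7", "annotation", "some-legacy-type"} {
		n, err := annotateOne(annotatorFor(dt, remote), "192.0.2.1")
		fmt.Println(dt, n, err)
	}
	fmt.Println("calls to remote annotator:", remote.calls)
}
```

With this shape, only the non-migrated type reaches the remote annotator, so annotator metrics would reflect legacy pipeline traffic only. Whether the selection belongs in NewSinkParser or in its caller is a design choice this sketch does not settle.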