m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0
22 stars 7 forks source link

Sandbox dead code #977

Closed gfr10598 closed 3 years ago

gfr10598 commented 3 years ago

Removes dead code in annotations/geo.go

We should also consider whether we can completely remove the functions/ directory


This change is Reviewable

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 6396


Files with Coverage Reduction New Missed Lines %
active/active.go 4 87.64%
<!-- Total: 4 -->
Totals Coverage Status
Change from base Build 6381: -0.5%
Covered Lines: 3363
Relevant Lines: 5056

💛 - Coveralls
SaiedKazemi commented 3 years ago

@gfr10598 Can you please add the instructions you used to test these changes (both locally and remotely if applicable)?

gfr10598 commented 3 years ago

The sandbox-dead-code branch is mostly succeeding:

Looks like data from paris-traceroute, tcpinfo, ndt5, switch, are all succeeding. Data from web100 ndt (2019), sidestream (2019) and the new traceroute (2021) are consistently failing.

https://cloudlogging.app.goo.gl/N4psyqNc8HQb6k4T7

Problems are like: {Location: "web100_log_entry.connection_spec.local_geolocation.Subdivision2ISOCode"; Message: "no such field."; Reason: "invalid"} on sidestream_20190703

{Location: "web100_log_entry.connection_spec.local_geolocation.Missing"; Message: "no such field."; Reason: "invalid"} on sidestream_20190703

{Location: "web100_log_entry.connection_spec.remote_geolocation.Subdivision2Name"; Message: "no such field."; Reason: "invalid"}

{Location: "destination.geo.Subdivision1ISOCode"; Message: "no such field."; Reason: "invalid"} on traceroute_20210324

{Location: "connection_spec.server_geolocation.subdivision1isocode"; Message: "no such field."; Reason: "invalid"} on ndt_20190810

{Location: "source.geo.Subdivision1ISOCode"; Message: "no such field."; Reason: "invalid"}

gfr10598 commented 3 years ago

Ah - turns out that update-schema only updates the four tables that are working. The others are failing because they were NOT updated. http://console.cloud.google.com/cloud-build/builds;region=global/eb345a03-67f3-4328-8518-c00cd910d118;step=6?project=mlab-sandbox

I will investigate whether we can fix update-schema to update the other tables.

gfr10598 commented 3 years ago

To avoid stalling other PRs, I'm going to re-create the tables that interfere with building master: 2021/03/25 16:25:23 googleapi: Error 409: Already Exists: Table mlab-sandbox:base_tables.tcpinfo, duplicate 2021/03/25 16:25:24 googleapi: Error 409: Already Exists: Table mlab-sandbox:batch.tcpinfo, duplicate 2021/03/25 16:25:26 googleapi: Error 409: Already Exists: Table mlab-sandbox:base_tables.traceroute, duplicate 2021/03/25 16:25:27 googleapi: Error 409: Already Exists: Table mlab-sandbox:batch.traceroute, duplicate

Legacy tables that were updated by etl-schema were not effected. raw_ndt and tmp_ndt were apparently not affected - I guess uuid-annotator gets its schema from an independent source ???

gfr10598 commented 3 years ago

Hmm - since the sync_tables_with_schema.sh script is broken, it likely hasn't been used in ages, and it is unclear whether it will create the same schema we have in staging and prod.

This is relevant if we want to move forward with changing annotation-service schema in the legacy tables, but NOT relevant to rolling back to master, since these tables weren't changed to begin with.

gfr10598 commented 3 years ago

D'oh - I'm a little slow. The sync_tables_with_schema accounts for my new code failing, because those schemas did NOT get updated.

The cmd/update-schema did update the newer tables, so master now won't build successfully, because those tables now have new fields.

gfr10598 commented 3 years ago

Rebased on master, which has other changes that were previously in this PR, but separated out and merged separately.

This now just has the dead code removal, and passes all tests, and runs properly in sandbox.

Not seeking additional review, since all code here has already been approved - no changes made aside from removing commits that were moved to other PRs.