m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

Apply NormalizeIP to sidestream addresses during parsing #1001

Closed: stephen-soltesz closed this pull request 3 years ago

stephen-soltesz commented 3 years ago

This change fixes sidestream IPv6 addresses that were formatted with a triple colon (:::).

Around 2.7B sidestream rows contain ::: in the remote or local IP, as matched by this condition:

 web100_log_entry.connection_spec.remote_ip LIKE '%:::%'
   OR web100_log_entry.connection_spec.local_ip LIKE '%:::%'
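For unit tests it can help to have the same check in Go. The sketch below mirrors the SQL condition on a parsed row. The `connectionSpec` type and `hasTripleColon` function are hypothetical stand-ins for illustration, not types or functions from the etl repository.

```go
package main

import (
	"fmt"
	"strings"
)

// connectionSpec is a hypothetical stand-in for the
// web100_log_entry.connection_spec fields; it is not the etl repo's type.
type connectionSpec struct {
	RemoteIP string
	LocalIP  string
}

// hasTripleColon mirrors the SQL filter: it reports whether either
// address contains the malformed ":::" sequence.
func hasTripleColon(c connectionSpec) bool {
	return strings.Contains(c.RemoteIP, ":::") || strings.Contains(c.LocalIP, ":::")
}

func main() {
	rows := []connectionSpec{
		{RemoteIP: "2001:db8:::1", LocalIP: "2001:db8::2"},
		{RemoteIP: "192.0.2.1", LocalIP: "192.0.2.2"},
	}
	for _, r := range rows {
		fmt.Println(r.RemoteIP, hasTripleColon(r))
	}
}
```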

This change uses a new function, NormalizeIP, to correct this formatting. If an IP address is invalid for any other reason, it is used as-is.
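Here is a minimal sketch of that behavior. It assumes that collapsing ":::" to "::" is enough to repair an address, and it falls back to the original string when the repaired form still does not parse. `normalizeIP`, `connectionSpec`, and `normalizeSpec` are hypothetical names for illustration only. They are not the etl repository's actual NormalizeIP or parser types.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeIP is a hypothetical sketch, not the etl repo's NormalizeIP.
// Assumption: replacing ":::" with "::" repairs the malformed address.
func normalizeIP(addr string) string {
	if !strings.Contains(addr, ":::") {
		return addr // nothing to fix; leave untouched
	}
	fixed := strings.ReplaceAll(addr, ":::", "::")
	ip := net.ParseIP(fixed)
	if ip == nil {
		return addr // still invalid: use the original value as-is
	}
	return ip.String() // canonical textual form of the repaired address
}

// connectionSpec is a hypothetical stand-in for the parsed
// connection_spec fields of a sidestream row.
type connectionSpec struct {
	RemoteIP string
	LocalIP  string
}

// normalizeSpec applies normalizeIP to both addresses, as a parser
// might do for each row before writing it out.
func normalizeSpec(c *connectionSpec) {
	c.RemoteIP = normalizeIP(c.RemoteIP)
	c.LocalIP = normalizeIP(c.LocalIP)
}

func main() {
	c := connectionSpec{RemoteIP: "2001:db8:::1", LocalIP: "not:::an-ip"}
	normalizeSpec(&c)
	fmt.Println(c.RemoteIP) // 2001:db8::1
	fmt.Println(c.LocalIP)  // not:::an-ip
}
```

Returning `ip.String()` also canonicalizes the repaired address, so "2001:0db8:::0001" becomes "2001:db8::1". An implementation that must preserve the original spelling could return `fixed` instead.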



coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 6506


Files with coverage reduction:
  active/active.go: 2 new missed lines (file coverage 89.89%)
Totals
  Change from base Build 6504: 0.09%
  Covered Lines: 3550
  Relevant Lines: 5654

stephen-soltesz commented 3 years ago

Thank you!