m-lab / etl

M-Lab ingestion pipeline
Apache License 2.0

NormalizeIP for traceroute parser #1006

Closed stephen-soltesz closed 3 years ago

stephen-soltesz commented 3 years ago

Between 2019-06-04 and 2020-01-08 the traceroute-caller used the ss utility to poll and discover new connections. This utility reports IPv4-mapped IPv6 addresses for IPv4 addresses, which look like ::ffff:1.2.3.4.

On 2020-01-07, a new version of the traceroute-caller was released to the platform that uses the tcpinfo event socket interface. This mechanism reports normalized IPv4 addresses, which look like 1.2.3.4.

The current parser writes these IPv4-mapped IPv6 addresses literally to BigQuery for tests from this period. About 114M rows include the ::ffff: prefix (see the query below).

This change normalizes IPv4-mapped IPv6 addresses in the traceroute parser. To make the normalization generally available, the web100.NormalizeIP function is moved to the parser package. In addition, web100.NormalizeIPv6 is renamed to web100.FixIPv6, both to avoid confusion with NormalizeIP and to reflect that it applies only to web100-generated addresses.
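As a rough illustration of the normalization described above, here is a minimal sketch using only the Go standard library. The helper name normalizeIP is hypothetical and is not the actual NormalizeIP implementation from this PR; it assumes that any unparseable input should be passed through unchanged.

```go
package main

import (
	"fmt"
	"net"
)

// normalizeIP is a hypothetical helper for illustration only; it is not
// the NormalizeIP function moved in this PR. It parses addr and returns
// its canonical text form. net.IP.String prints IPv4-mapped IPv6
// addresses (::ffff:a.b.c.d) in dotted-quad form, so the ::ffff: prefix
// is dropped. Input that does not parse as an IP is returned unchanged
// (an assumption made for this sketch).
func normalizeIP(addr string) string {
	ip := net.ParseIP(addr)
	if ip == nil {
		return addr
	}
	return ip.String()
}

func main() {
	for _, addr := range []string{"::ffff:1.2.3.4", "1.2.3.4", "2001:db8::1"} {
		fmt.Printf("%s -> %s\n", addr, normalizeIP(addr))
	}
	// Output:
	// ::ffff:1.2.3.4 -> 1.2.3.4
	// 1.2.3.4 -> 1.2.3.4
	// 2001:db8::1 -> 2001:db8::1
}
```

With this approach, rows produced from either traceroute-caller version would carry the same 1.2.3.4 form, while native IPv6 addresses are left as IPv6.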

SELECT
  COUNT(*),
  TIMESTAMP_TRUNC(TestTime, DAY) AS day
FROM
  `mlab-oti.base_tables.traceroute`
WHERE
  (Source.IP LIKE '::ffff:%'
    OR Destination.IP LIKE '::ffff:%')
  AND TestTime > TIMESTAMP('2019-01-01')
GROUP BY
  day
ORDER BY
  day


coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 6539


Files with Coverage Reduction    New Missed Lines    %
active/active.go                 4                   87.64%
Totals Coverage Status
Change from base Build 6533: -0.04%
Covered Lines: 3492
Relevant Lines: 5574
