steamclock / internetmap

The Cogeco Peer 1 Map of the Internet apps for iOS and Android.
MIT License

Update ASN data for 2019 #593

Open: apike opened this issue 5 years ago

apike commented 5 years ago

We should try to do this yearly.

ssawchenko commented 5 years ago

Summary

Our data pipeline has relied on 3 sources of data:

  1. AS "link" data (http://data.caida.org/datasets/topology/ark/ipv4/as-links/): Which AS nodes are linked to each other.
  2. Location data (http://dev.maxmind.com/geoip/legacy/geolite): Roughly where an AS node is located, based on the IP ranges assigned to it.
  3. Taxonomy data (http://griley.ece.gatech.edu/MANIACS/as_taxonomy/): Classifications for AS nodes (e.g. large, small, university). This is what gives the nodes their "colour".
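To make source 1 concrete, here is a minimal sketch of reading AS link records into an adjacency map. It assumes the whitespace-separated CAIDA as-links layout, where `D` lines are direct links and `I` lines are indirect links with the two AS fields in positions 2 and 3; other record types and comments are skipped. The function names are hypothetical, the sample lines use documentation-range ASNs rather than real data, and the format assumption should be checked against the dataset's README.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

def parse_as_links(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (link_type, from_as, to_as) for each D (direct) or I (indirect) record.

    ASSUMPTION: the first whitespace-separated field is the record type.
    AS fields are kept as strings because they may encode AS sets.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] in ("D", "I"):
            yield fields[0], fields[1], fields[2]

def build_adjacency(edges: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Treat every link as undirected and record each AS's neighbours."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for _link_type, a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

# Hypothetical sample lines (documentation ASNs, not real CAIDA data).
sample = [
    "# comments and other record types are ignored",
    "T 1546300800 1546387200",
    "D 64496 64497 1",
    "I 64497 64498 2 3",
]
edges = list(parse_as_links(sample))
adjacency = build_adjacency(edges)
```

Keeping the link type in each tuple lets a later step decide whether indirect links should be drawn at all.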

Current Status of data sources

  1. OK - Still appears to be updated and in expected format.
  2. WORK REQUIRED - The legacy DBs have been removed from the site. The new DBs claim to be less precise with locations, use a different format than in previous years, and are split into separate IPv4 and IPv6 data sets.
  3. OLD (not sure where to get new data from) - Has not been updated since 2006 (this was already noted in the original pass of the data pipeline).

This means we will need to work out how to change our Python scripts so they pull the required data from the new DB formats and still give us usable locations.
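One possible shape for that change is sketched below. It assumes the new DBs are CSV block files with the column names we believe MaxMind's GeoLite2 ASN and City block files use (`network`, `autonomous_system_number`, `latitude`, `longitude`); verify these against the actual downloads. We don't know how the original scripts picked a single location per ASN, so averaging the coordinates of overlapping city blocks is an illustrative choice, and the nested loop is far too slow for full-size files (a sorted or interval-indexed join would be needed there).

```python
import csv
import io
import ipaddress
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: GeoLite2-style CSV columns; not verified against our pipeline.

def load_asn_networks(rows: Iterable[str]) -> Dict[int, List[ipaddress.IPv4Network]]:
    """Map each ASN to the IPv4 networks assigned to it (IPv6 rows are skipped)."""
    networks: Dict[int, List[ipaddress.IPv4Network]] = defaultdict(list)
    for row in csv.DictReader(rows):
        net = ipaddress.ip_network(row["network"])
        if net.version == 4:
            networks[int(row["autonomous_system_number"])].append(net)
    return networks

def load_city_points(rows: Iterable[str]) -> List[Tuple[ipaddress.IPv4Network, float, float]]:
    """Collect (network, lat, lng) for city blocks that actually have coordinates."""
    points = []
    for row in csv.DictReader(rows):
        if row["latitude"] and row["longitude"]:
            net = ipaddress.ip_network(row["network"])
            points.append((net, float(row["latitude"]), float(row["longitude"])))
    return points

def locate_asns(asn_networks, city_points) -> Dict[int, Tuple[float, float]]:
    """Average the coordinates of every city block overlapping an ASN's networks."""
    locations = {}
    for asn, nets in asn_networks.items():
        hits = [(lat, lng) for city_net, lat, lng in city_points
                if any(city_net.overlaps(n) for n in nets)]
        if hits:
            locations[asn] = (round(mean(h[0] for h in hits), 4),
                              round(mean(h[1] for h in hits), 4))
    return locations

# Hypothetical in-memory sample; real files would be opened with open(path, newline="").
asn_csv = io.StringIO(
    "network,autonomous_system_number,autonomous_system_organization\n"
    "10.0.0.0/16,64500,Example Org\n"
)
city_csv = io.StringIO(
    "network,geoname_id,latitude,longitude\n"
    "10.0.1.0/24,1,43.0,-79.0\n"
    "10.0.2.0/24,2,45.0,-73.0\n"
    "192.0.2.0/24,3,10.0,10.0\n"
)
locations = locate_asns(load_asn_networks(asn_csv), load_city_points(city_csv))
print(locations)  # {64500: (44.0, -76.0)}
```

Naively averaging latitudes and longitudes misbehaves for ASNs spread across continents or the antimeridian, so a "most common city" rule may be a better fit for the map.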

ssawchenko commented 5 years ago

We may also be able to look into https://github.com/steamclock/internetmap/issues/15, since it appears we may now have access to some IPv6 location data sets.

With that in mind, the update for this issue will focus on IPv4 data.

ssawchenko commented 5 years ago

So I've spent the day converting the Python scripts we previously used to scrape location data for IP ranges and associate it with ASN nodes. On the surface my refactor seems to have worked, and I am now writing some scripts to compare last year's data with this year's, to see if there are any glaring differences. So far my rudimentary tests show:

  ASNs processed: 56561
  Existing ASNs: 44982
  Location Changed: 30829
  Location Unchanged: 14153
  New ASNs: 11579
  Removed ASNs: 4757
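A comparison like this can be sketched as set operations over two `ASN -> [org, lat, lng]` maps. The record shape mirrors the records printed further down this thread; the function name is hypothetical, and "ASNs processed" is read here as the number of ASNs in this year's data. That reading matches the figures above: 44982 existing + 11579 new = 56561, and 30829 changed + 14153 unchanged = 44982.

```python
from typing import Dict, List, Union

# Hypothetical record shape: [org_name, lat, lng]
Record = List[Union[str, float]]

def compare_years(old: Dict[int, Record], new: Dict[int, Record]) -> Dict[str, int]:
    """Summarize how this year's ASN -> location map differs from last year's."""
    existing = old.keys() & new.keys()  # ASNs present in both years
    # Exact (lat, lng) comparison; a distance tolerance may be more useful in practice.
    changed = {asn for asn in existing if old[asn][1:] != new[asn][1:]}
    return {
        "ASNs processed": len(new),
        "Existing ASNs": len(existing),
        "Location Changed": len(changed),
        "Location Unchanged": len(existing) - len(changed),
        "New ASNs": len(new.keys() - old.keys()),
        "Removed ASNs": len(old.keys() - new.keys()),
    }

# Hypothetical sample data using documentation-range ASNs.
last_year = {64496: ["Org A", 1.0, 2.0], 64497: ["Org B", 3.0, 4.0], 64498: ["Org C", 5.0, 6.0]}
this_year = {64496: ["Org A", 1.0, 2.0], 64497: ["Org B", 3.5, 4.0], 64499: ["Org D", 7.0, 8.0]}
for label, count in compare_years(last_year, this_year).items():
    print(f"{label}: {count}")
```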

11k new ASNs seemed like a lot, so I started digging around in the data to make sure that at least the Cogeco Peer 1 data was available and looked correct. Of the 4 Cogeco Peer 1 nodes we had last year, 3 seemed mostly OK, with minor lat/lng changes:

Old: 13768 : ['Cogeco Peer 1', 33.748, -84.3858]
New: 13768 : ['Cogeco Peer 1', 30.1459, -81.5739]

Old: 21548 : ['Cogeco Peer 1', 45.5049, -73.7142]
New: 21548 : ['Cogeco Peer 1', 45.5302, -73.5831]

Old: 23498 : ['Cogeco Peer 1', 43.6564, -79.386]
New: 23498 : ['Cogeco Peer 1', 43.6655, -79.4204]
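To put a number on "minor", the great-circle (haversine) distance between old and new coordinates can be computed. This is a standard formula, not something taken from our scripts; the 6371 km mean Earth radius is an approximation, and the coordinate pairs are the three Cogeco Peer 1 records above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# (asn, old_lat, old_lng, new_lat, new_lng) from the records above
moves = [
    (13768, 33.748, -84.3858, 30.1459, -81.5739),
    (21548, 45.5049, -73.7142, 45.5302, -73.5831),
    (23498, 43.6564, -79.386, 43.6655, -79.4204),
]
for asn, lat1, lng1, lat2, lng2 in moves:
    print(f"{asn}: moved {haversine_km(lat1, lng1, lat2, lng2):.1f} km")
```

By this measure the 21548 and 23498 moves are roughly 10 km and 3 km, while 13768 moves on the order of 480 km, so a distance threshold might be a more reliable "changed" test than exact equality.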

However, one has apparently been removed from the underlying data set:

ALERT: New Cogeco ASN removed -> 30370 : ['Cogeco Peer 1', 43.7301, -79.3935]
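A check along these lines can be sketched as a filter over last year's map for a watched organization name. This is not the script that produced the alert above; the function name and alert wording are hypothetical, and the sample data is the four Cogeco Peer 1 records from this thread.

```python
from typing import Dict, List, Union

Record = List[Union[str, float]]  # hypothetical shape: [org_name, lat, lng]

def removed_watched_asns(old: Dict[int, Record], new: Dict[int, Record], org_name: str) -> List[int]:
    """ASNs owned by org_name last year that are missing from this year's data."""
    return sorted(asn for asn, record in old.items() if record[0] == org_name and asn not in new)

last_year = {
    13768: ["Cogeco Peer 1", 33.748, -84.3858],
    21548: ["Cogeco Peer 1", 45.5049, -73.7142],
    23498: ["Cogeco Peer 1", 43.6564, -79.386],
    30370: ["Cogeco Peer 1", 43.7301, -79.3935],
}
this_year = {
    13768: ["Cogeco Peer 1", 30.1459, -81.5739],
    21548: ["Cogeco Peer 1", 45.5302, -73.5831],
    23498: ["Cogeco Peer 1", 43.6655, -79.4204],
}
for asn in removed_watched_asns(last_year, this_year, "Cogeco Peer 1"):
    print(f"ALERT: watched ASN removed -> {asn} : {last_year[asn]}")
```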

Follow-up: We are trying to determine whether 30370 is still a valid ASN. Some ASN lookup tools report it and others do not, so we may want to check with the client to see if this ASN is still valid.

Ultra Tools shows a record: https://www.ultratools.com/tools/asnInfoResult?domainName=AS30370&as_sfid=AAAAAAVM6i_UtcOTorGwtpoV7Nt7EkTooHhNgCihivCFftKdKaKj8LwCwF2QI7ukTzYO4totvXTlig9WVJCFTsQedxstWUMDiwPO0OdpSnvFt-s4USQ-Q8p9Or-MKz1dzadnLRk%3D&as_fid=da3d3b767435b4541f5dea08f0fb43af7c89d568

MXToolbox does not have a record: https://mxtoolbox.com/SuperTool.aspx?action=asn%3a30370&run=toolpage