m-lab / mlab-vis-pipeline

M-Lab Visualization Dataflow pipelines for transforming ndt.all into the needed aggregation tables in bigtable.

client asn lists need to be thresholded #3

Closed vlandham closed 8 years ago

vlandham commented 8 years ago

Currently the API returns an unusably large number of client ASNs for most locations.

Also, there appear to be many ISPs with no data whatsoever.

Example for naus from the API:

{
  "meta": {
    "client_asn_name": "Sinister Networks",
    "client_asn_number": "AS10255",
    "client_continent": "North America",
    "client_continent_code": "NA",
    "client_country": "United States",
    "client_country_code": "US",
    "location_key": "naus",
    "type": "country"
  }
},
{
  "meta": {
    "client_asn_name": "Meganet Communications",
    "client_asn_number": "AS10271",
    "client_continent": "North America",
    "client_continent_code": "NA",
    "client_country": "United States",
    "client_country_code": "US",
    "location_key": "naus",
    "type": "country"
  }
},

These entries have no data values and no last_year_count. Not sure why they are in the data.

We should threshold the SQL to only include ASNs with either:

We discussed this with M-Lab long ago, and it was indicated that this filtering (the top N one specifically) was a sound way to approach things.

I don't think we need to pick more than one of the above for the list to go down significantly.
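
For illustration, here is a minimal sketch of the top N idea applied to API-style records rather than in the SQL itself. It is not the pipeline's actual code. It assumes each record may carry a "data" object holding last_year_count next to "meta" (that shape is a guess based on the fields mentioned above), the function name top_n_asns is hypothetical, and the ASN numbers and counts in the sample are invented.

```python
from typing import Any


def last_year_count(record: dict[str, Any]) -> int:
    """Return the record's last_year_count, or 0 when it is missing.

    Assumption: the count lives under a "data" key next to "meta".
    """
    return (record.get("data") or {}).get("last_year_count") or 0


def top_n_asns(records: list[dict[str, Any]], n: int) -> list[dict[str, Any]]:
    """Drop ASNs with no tests, then keep the n with the highest counts."""
    with_data = [r for r in records if last_year_count(r) > 0]
    return sorted(with_data, key=last_year_count, reverse=True)[:n]


# Invented sample data: two ASNs with no data, like the entries in the
# example above, plus three ASNs with made-up counts.
records = [
    {"meta": {"client_asn_number": "AS10255", "location_key": "naus"}},
    {"meta": {"client_asn_number": "AS10271", "location_key": "naus"}},
    {"meta": {"client_asn_number": "AS64500"}, "data": {"last_year_count": 1200}},
    {"meta": {"client_asn_number": "AS64501"}, "data": {"last_year_count": 5000}},
    {"meta": {"client_asn_number": "AS64502"}, "data": {"last_year_count": 30}},
]

kept = [r["meta"]["client_asn_number"] for r in top_n_asns(records, 2)]
print(kept)
```

The same cut could presumably be made in the query itself, by ordering on the count and limiting to N per location, so the empty ASNs never reach Bigtable.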