Closed by jimaek 1 year ago
20-30ms
just in case. We could also run tests against commercial IPs, Amazon/Google DCs for instance.
Quick note: We need to replace Digital Element with something else, potentially db-ip. In 90% of issues the fault lies with DE.
Maybe instead of measuring latency, we could rely on routing information. Make a request to a Cloudflare-hosted website and get the location it hit from the headers. Resolve the probe and POP locations to coordinates (there are APIs, we use one here), then check it's among the top N (5?) closest CF locations (list is easy to get and distance based on coordinates is a simple calculation). If the list of CF locations is based on https://www.cloudflarestatus.com/ it can also consider that some are temporarily rerouted.
Cloudflare is using maxmind as far as I know on their geoip endpoint, so it's nothing special. If you mean using the location of the POP, that's even worse because their routing is all over the place and randomly changes. Poland was going to Frankfurt for months without any status updates until I manually reported it. And it happens all the time with all regions.
Yes, I meant routing directly. Putting that idea aside, what you want is probably this: https://arxiv.org/pdf/2004.07836.pdf (not necessarily this exact algorithm but the idea). Assuming we have maybe 200 static points with known correct locations, we can have them pinged, do some math, and have a probably good enough estimate of the new probe's location.
The 200 (not sure how many would be really needed for good results) static points can not only be public cloud services but also probes in our network. E.g. the API could have a list of those we run ourselves and have 100% correct location. Then those probes could be used to locate the other ones.
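As a rough illustration of the "ping known points and do some math" idea, here is a minimal sketch. It is not the algorithm from the linked paper; it swaps in a much simpler technique, an inverse-RTT weighted centroid of the k lowest-latency landmarks. The function name, the value of `k`, and the sample measurements are all assumptions made for the example.

```python
import math

def estimate_location(measurements, k=3):
    """Estimate (lat, lon) from RTTs to landmarks with known coordinates.

    measurements: list of (lat, lon, rtt_ms) tuples.
    The k landmarks with the lowest RTT are averaged on the unit sphere,
    each weighted by 1/rtt so that closer landmarks dominate.
    """
    nearest = sorted(measurements, key=lambda m: m[2])[:k]
    x = y = z = 0.0
    for lat, lon, rtt in nearest:
        w = 1.0 / max(rtt, 0.1)  # clamp to avoid division by zero
        phi, lam = math.radians(lat), math.radians(lon)
        x += w * math.cos(phi) * math.cos(lam)
        y += w * math.cos(phi) * math.sin(lam)
        z += w * math.sin(phi)
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

# Made-up measurements from a probe that is actually near Warsaw.
samples = [
    (52.23, 21.01, 2.0),     # Warsaw landmark
    (52.52, 13.40, 10.0),    # Berlin landmark
    (50.08, 14.44, 12.0),    # Prague landmark
    (40.71, -74.00, 110.0),  # New York landmark
]
print(estimate_location(samples))
```

The estimate can never fall outside the region spanned by the chosen landmarks, so its accuracy depends directly on how dense the landmarks are around the probe.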
I thought of that, I even want to make a public tool that uses our network for that. But it will only work in Europe and USA. Good luck guessing the location in Africa when all we have is 10 probes in South Africa :)
Maybe in a few years when we have thousands of probes everywhere
It shouldn't matter that much. It's true that the closer the probes, the better the results, but it should be good enough for this purpose even from a larger distance.
Why not? If someone is from Nigeria and we have probes in Egypt and South Africa, there is no way we can guess the country, not to mention the city, which is also very important info for us. And what do we do if the algorithm says Nigeria but geoip says Cameroon?
Also in most cases country is correctly provided by the DBs, maybe not all 3 but at least 1 will be correct. The issues always come when we're talking about cities.
I suppose your expectations are very different from what you outlined in the issue. The first post here talks about mixed up continents so I was aiming to identify a continent and roughly a country. For two neighboring countries, it won't impact the tests much anyway if we identify them incorrectly.
That was an example. I guess this solution could be used as a failover when all 3 DBs have different results, and then choose the DB that agrees with this test. But at that point maybe it would make sense to just block that probe. Without accurate geo information many tests could become useless, e.g. if I am debugging the routing of my CDN in Brazil, I don't want traceroutes from Mexico being reported as Brazil; it would only confuse the user.
Example list of probes that are all detected as Los Angeles while ipinfo seems more accurate https://gist.github.com/jimaek/b3fcd57908cb15272dd2a375a4872f1f
But even ipinfo is wrong in many cases.
Another idea: geoip based on traceroute hops
Example
root@ansible2:~# tracepath 171.22.117.64
1?: [LOCALHOST] pmtu 1500
1: _gateway 0.250ms
1: _gateway 0.517ms
2: 10.193.33.129 0.936ms
3: no reply
4: 10.193.0.4 0.915ms
5: ??? 0.987ms
6: ??? 1.059ms
7: ae59.bar4.Warsaw1.Level3.net 1.042ms asymm 8
8: ae2.3601.edge2.Phoenix1.level3.net 154.058ms asymm 18
9: xe5-2-5.bcr2.phx1.us.bb.symantec.net 153.391ms asymm 18
10: border5.ae2-bbnet2.phx010.pnap.net 151.237ms asymm 18
11: dedipath-62.edge1.phx.pnap.net 152.460ms asymm 18
12: 69.25.116.199 153.214ms asymm 14
13: 171.22.117.64 151.653ms reached
phx = Phoenix, USA
tracepath hetzner.com
1?: [LOCALHOST] 0.044ms pmtu 1500
1: _gateway 0.956ms
1: _gateway 0.515ms
2: no reply
3: no reply
4: 2001:bc8:1c00:1::e 2.588ms
5: 2001:bc8:1c00:1::6 1.377ms
6: ae62.bar3.Warsaw1.Level3.net 0.957ms asymm 7
7: no reply
8: AS33891-NET.edge3.Berlin1.Level3.net 11.210ms
9: ae8-2080.nbg20.core-backbone.com 21.959ms asymm 12
10: ae2-2015.nbg60.core-backbone.com 24.023ms asymm 12
11: ae1-2014.nbg40.core-backbone.com 24.395ms asymm 12
12: 2a01:4a0:1338:1ae::2 22.455ms asymm 8
13: core11.nbg1.hetzner.com 26.577ms asymm 9
14: ex9k2.dc1.nbg1.hetzner.com 24.246ms asymm 10
15: no reply
16: no reply
17: no reply
nbg1 contains nbg, which stands for Nuremberg (strictly a provider location code; the airport's IATA code is NUE) and is the correct location for the probe
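The hostname matching shown in these traces could be sketched like this. It is a minimal sketch under stated assumptions: `LOCATION_CODES` is a tiny hypothetical lookup table built from codes visible in the traces, whereas a real one would need a full dataset of airport and provider location codes, and the function names are made up for the example.

```python
import re

# Hypothetical, deliberately tiny lookup table (code -> city).
LOCATION_CODES = {
    "waw": "Warsaw",
    "phx": "Phoenix",
    "nbg": "Nuremberg",
    "ham": "Hamburg",
    "ams": "Amsterdam",
}

def codes_in_hostname(hostname):
    """Return known location codes found in a hostname, in order.

    The hostname is split on non-alphanumeric characters and trailing
    digits are stripped from each token, so 'nbg1' and 'phx010' match.
    """
    found = []
    for token in re.split(r"[^a-z0-9]+", hostname.lower()):
        base = re.sub(r"\d+$", "", token)
        if base in LOCATION_CODES:
            found.append(base)
    return found

def last_hop_location(hops):
    """Walk the hops from the destination backwards and return the city
    for the first hop whose hostname contains a known code, else None."""
    for host in reversed(hops):
        codes = codes_in_hostname(host)
        if codes:
            return LOCATION_CODES[codes[-1]]
    return None

hops = [
    "ae59.bar4.Warsaw1.Level3.net",
    "ae2.3601.edge2.Phoenix1.level3.net",
    "xe5-2-5.bcr2.phx1.us.bb.symantec.net",
    "border5.ae2-bbnet2.phx010.pnap.net",
    "dedipath-62.edge1.phx.pnap.net",
    "69.25.116.199",
    "171.22.117.64",
]
print(last_hop_location(hops))  # Phoenix
```

Exact token matching keeps false positives down, but three-letter codes can still collide with unrelated hostname parts, so a match is better treated as a hint than as proof.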
But there are also edge-cases where the airport is a small town and in theory we need to figure out the closest major city. e.g.
tracepath 91.196.223.248
1?: [LOCALHOST] pmtu 1500
1: _gateway 0.681ms
1: _gateway 0.503ms
2: 10.193.33.129 0.688ms
3: no reply
4: 10.193.0.2 1.101ms
5: ??? 0.889ms
6: ??? 0.994ms
7: no reply
8: be2486.ccr21.waw01.atlas.cogentco.com 1.468ms
9: be2484.ccr42.ham01.atlas.cogentco.com 13.799ms
10: be2815.ccr41.ams03.atlas.cogentco.com 21.241ms
11: be12488.ccr42.lon13.atlas.cogentco.com 112.421ms asymm 17
12: be2490.ccr42.jfk02.atlas.cogentco.com 116.660ms asymm 17
13: be2889.ccr21.cle04.atlas.cogentco.com 116.054ms asymm 15
14: be2718.ccr42.ord01.atlas.cogentco.com 114.562ms
15: be2831.ccr21.mci01.atlas.cogentco.com 130.559ms asymm 16
16: be3035.ccr21.den01.atlas.cogentco.com 141.534ms asymm 17
17: be3038.ccr32.slc01.atlas.cogentco.com 145.863ms
18: be2085.ccr21.sea02.atlas.cogentco.com 167.937ms
19: be2895.rcr21.sea03.atlas.cogentco.com 168.930ms
20: Internap-Network-Services.demarc.cogentco.com 170.209ms
21: border2.ae2-bbnet2.sef.pnap.net 169.585ms asymm 22
22: dedipath-64.edge2.sef003.pnap.net 168.898ms asymm 23
23: 69.25.117.196 168.671ms asymm 24
24: 91.196.223.248 168.063ms reached
Hops 21 and 22 contain sef, which is the Sebring airport in Florida. We could maybe leave it at that, but I think it would make more sense to assign the probe to Tampa or Orlando.
Similar project: https://www.caida.org/catalog/papers/2021_learning_extract_geographic_information/learning_extract_geographic_information.pdf
Even with 3 IP data sources I get lots of wrong locations with datacenter IPs. Right now, while I control all of the probes, I can either remove the wrong probes completely or make a request to maxmind and ipinfo to update their data. But soon we will have 0 control over them, and if we detect an Australian IP as American it will end up heavily influencing the results, with people not understanding why their USA to USA test is so slow.
The problem is that even if a DB gets an IP's location right, we can't know it without a human review. So the current 2vs1 logic is still the best option.
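For reference, the 2vs1 logic amounts to a majority vote across the three sources. This is a minimal sketch, not the project's actual code: the function name, the provider keys, and the simple lowercase normalization are assumptions.

```python
from collections import Counter

def resolve_city(results):
    """Pick the city that at least two sources agree on.

    results: dict mapping provider name -> city string (or None).
    Returns the normalized city name, or None when no two sources agree.
    """
    counts = Counter(city.strip().lower() for city in results.values() if city)
    if not counts:
        return None
    city, votes = counts.most_common(1)[0]
    return city if votes >= 2 else None

print(resolve_city({"maxmind": "Los Angeles", "ipinfo": "Phoenix", "third_db": "Los Angeles"}))
print(resolve_city({"maxmind": "Los Angeles", "ipinfo": "Phoenix", "third_db": "Warsaw"}))
```

A `None` result is the case discussed earlier in the thread, where an independent measurement could break the tie or the probe could simply be blocked.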
So I am thinking how we could complement the existing system. Some ideas:
To me, none of the above ideas are great. So let's keep this issue open until we can come up with something better.