jsdelivr / globalping

A global network of probes to run network tests like ping, traceroute and DNS resolve
https://globalping.io

IP Geo v2 #69

Closed jimaek closed 1 year ago

jimaek commented 2 years ago

Even with 3 IP data sources I get lots of wrong locations with datacenter IPs. Right now I control all of the probes, so I can either remove the wrong ones completely or ask MaxMind and ipinfo to update their data. But soon we will have zero control over them, and if we detect an Australian IP as American it will heavily skew the results, with people not understanding why their USA-to-USA test is so slow.

The problem is that even if a DB gets an IP's geo right, we can't know that without a human review. So the current 2-vs-1 logic (go with the location that two of the three DBs agree on) is still the best option.
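For reference, the 2-vs-1 rule can be sketched roughly like this. It is only an illustration, not the project's actual code: the function name `pick_city` and the source keys are made up for the example, and city names are compared case-insensitively as a simplifying assumption.

```python
from collections import Counter
from typing import Optional


def pick_city(results: dict[str, str]) -> Optional[str]:
    """Return the city that at least two sources agree on, else None.

    `results` maps a source name (hypothetical keys) to the city it reported.
    """
    if not results:
        return None
    # Normalize so "Dallas" and "dallas " count as the same answer.
    counts = Counter(city.strip().lower() for city in results.values())
    city, votes = counts.most_common(1)[0]
    return city if votes >= 2 else None


lookup = {"maxmind": "Dallas", "ipinfo": "dallas", "digitalelement": "Los Angeles"}
print(pick_city(lookup))
```

When all three sources disagree the function returns None, which is exactly the case the rest of this issue is trying to handle.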

So I am thinking how we could complement the existing system. Some ideas:

  1. What Patryk said. Run some latency test. E.g. if our DB logic says the IP is in Dallas, then have the probe ping an IP address we 100% know is located in Dallas (e.g. AWS endpoints). If the latency is higher than 5-10ms, that means the DB was wrong. But this has lots of potential pitfalls. What if we don't have a static endpoint anywhere close to the probe? How do we decide the exact number of ms to use as the threshold? And what do we do after the test fails? Sounds too complicated and unreliable.
  2. Manual override rules. It's more of a hack than a solution. But basically a config file in GitHub where we could write IPs or IP ranges and the exact location as we see fit. This would guarantee accuracy but it's not scalable. Also, we could make the corrections only after someone reports a mistake, so if nobody reports anything we won't be able to fix anything. It also doesn't work long term: a correct fix now could be wrong in 6 months when the IPs get moved to a different datacenter.
  3. User data. We could allow the users to pass env vars that correct their reported IP geo, but then it becomes a question of trust. It opens us up to abuse and malicious/troll activity.

To me none of the above ideas are great. So let's keep this issue open until we can come up with something better.
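To make idea 2 concrete, here is a minimal sketch of what an override lookup could look like. The rule format, the `OVERRIDES` list and `find_override` are all hypothetical, and the ranges are reserved documentation prefixes rather than real probe IPs. When several rules match, the most specific prefix wins.

```python
import ipaddress
from typing import Optional

# Hypothetical override rules: (CIDR range, location we want to force).
OVERRIDES = [
    ("192.0.2.0/24", {"city": "Dallas", "country": "US"}),
    ("198.51.100.0/24", {"city": "Dallas", "country": "US"}),
    ("198.51.100.7/32", {"city": "Sydney", "country": "AU"}),
]


def find_override(ip: str) -> Optional[dict]:
    """Return the forced location for `ip`, or None if no rule matches."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for cidr, location in OVERRIDES:
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, location))
    if not matches:
        return None
    # Longest prefix = most specific rule.
    return max(matches, key=lambda match: match[0])[1]


print(find_override("198.51.100.7"))
```

This does nothing about the staleness problem: a rule stays in effect until someone removes it, so each entry would probably need a review date.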

patrykcieszkowski commented 2 years ago

  1. We don't necessarily need an IP address nearby, just one in the general area, for instance the same city or state. We could extend the threshold to 20-30ms just in case. We could also run tests against commercial IPs, Amazon/Google DCs for instance.
  2. We could combine this with the first solution.
  3. True. But on the other hand, we already trust users with the probe version they report.

jimaek commented 2 years ago

  1. Well yes, that's what I meant. But that's not doable. E.g. if someone connects from Asia, South America or Africa, the chances of us having a reliable testing endpoint nearby are almost 0. And then how do you decide that 5-10ms is enough?
  2. Yeah, it's combinable with anything, but I am afraid that long-term it will do more harm than good.
  3. Yes, but the version is not that critical and gives the troll nothing. Faking locations would negatively affect the service in a serious way, so there is more motivation to do it.
jimaek commented 2 years ago

Quick note: We need to replace Digital Element with something else, potentially db-ip. In 90% of the issues the fault lies with DE.

MartinKolarik commented 2 years ago

Maybe instead of measuring latency, we could rely on routing information. Make a request to a Cloudflare-hosted website and read the location of the POP it hit from the response headers. Resolve the probe and POP locations to coordinates (there are APIs, we use one here), then check that the POP is among the top N (5?) CF locations closest to the probe (the list is easy to get, and distance based on coordinates is a simple calculation). If the list of CF locations is based on https://www.cloudflarestatus.com/, it can also take into account that some are temporarily rerouted.
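A rough sketch of the "top N closest locations" check, assuming we already have coordinates for the probe (from the geoip DBs) and for each POP. The `POPS` table is a hypothetical sample with approximate city coordinates, not the real Cloudflare list, and `pop_is_plausible` is an invented name.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical sample of POP codes with approximate (lat, lon) coordinates.
POPS = {
    "WAW": (52.23, 21.01),
    "FRA": (50.11, 8.68),
    "AMS": (52.37, 4.90),
    "LHR": (51.51, -0.13),
    "JFK": (40.71, -74.01),
    "SYD": (-33.87, 151.21),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pop_is_plausible(probe_coords: tuple[float, float], pop_code: str, n: int = 5) -> bool:
    """True if the POP the probe hit is among the n POPs closest to the claimed location."""
    ranked = sorted(POPS, key=lambda code: haversine_km(probe_coords, POPS[code]))
    return pop_code in ranked[:n]


claimed = (52.23, 21.01)  # geoip says the probe is in Warsaw
print(pop_is_plausible(claimed, "FRA", n=3), pop_is_plausible(claimed, "SYD", n=3))
```

A larger N tolerates more rerouting but also lets more wrong locations through, so the value would need tuning per region.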

jimaek commented 2 years ago

As far as I know, Cloudflare uses MaxMind on their geoip endpoint, so it's nothing special. If you mean using the location of the POP, that's even worse, because their routing is all over the place and changes randomly. Poland was going to Frankfurt for months without any status updates until I manually reported it. And it happens all the time with all regions.

MartinKolarik commented 2 years ago

Yes, I meant routing directly. Putting that idea aside, what you want is probably this: https://arxiv.org/pdf/2004.07836.pdf (not necessarily this exact algorithm but the idea). Assuming we have maybe 200 static points with known correct locations, we can have them pinged, do some math, and have a probably good enough estimate of the new probe's location.

The 200 (not sure how many would be really needed for good results) static points can not only be public cloud services but also probes in our network. E.g. the API could have a list of those we run ourselves and have 100% correct location. Then those probes could be used to locate the other ones.
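To illustrate the general idea (this is not the algorithm from the linked paper, just a simple speed-of-light bound): light in fiber covers roughly 200 km per millisecond, so an RTT of t ms to a landmark means the probe is at most about 100·t km away from it. Any geoip candidate that violates that bound for some landmark can be ruled out. The landmark coordinates, RTT values and function names below are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

# ~200 km/ms in fiber, halved because the RTT covers both directions.
KM_PER_RTT_MS = 100.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def feasible_candidates(candidates: dict, measurements: list) -> list:
    """Keep the sources whose claimed location is physically reachable.

    candidates:   {source name: (lat, lon) it claims for the probe}
    measurements: [((lat, lon) of a landmark, RTT in ms from probe to it)]
    """
    return [
        source
        for source, coords in candidates.items()
        if all(haversine_km(coords, landmark) <= rtt_ms * KM_PER_RTT_MS
               for landmark, rtt_ms in measurements)
    ]


# Hypothetical landmarks (Frankfurt, Amsterdam) and measured RTTs.
measurements = [((50.11, 8.68), 12.0), ((52.37, 4.90), 15.0)]
candidates = {
    "maxmind": (52.23, 21.01),    # Warsaw
    "ipinfo": (33.45, -112.07),   # Phoenix
    "db-ip": (-33.87, 151.21),    # Sydney
}
print(feasible_candidates(candidates, measurements))
```

The bound only rules locations out, it never confirms one. Real paths are slower than the theoretical limit, so with distant landmarks the feasible area can easily span several countries.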

jimaek commented 2 years ago

I thought of that, I even want to make a public tool that uses our network for that. But it will only work in Europe and USA. Good luck guessing the location in Africa when all we have is 10 probes in South Africa :)

Maybe in a few years when we have thousands of probes everywhere

MartinKolarik commented 2 years ago

It shouldn't matter that much. It's true that the closer the probes, the better the results, but it should be good enough for this purpose even from a larger distance.

jimaek commented 2 years ago

Why not? If someone is from Nigeria and we have probes in Egypt and South Africa, there is no way we can guess the country, let alone the city, which is also very important info for us. And what do we do if the algorithm says Nigeria but geoip says Cameroon?

Also in most cases country is correctly provided by the DBs, maybe not all 3 but at least 1 will be correct. The issues always come when we're talking about cities.

MartinKolarik commented 2 years ago

I suppose your expectations are very different from what you outlined in the issue. The first post here talks about mixed up continents so I was aiming to identify a continent and roughly a country. For two neighboring countries, it won't impact the tests much anyway if we identify them incorrectly.

jimaek commented 2 years ago

That was an example. I guess this solution could be used as a failover when all 3 DBs return different results, and we would then choose the DB that agrees with this test. But at that point maybe it would make sense to just block that probe. Without accurate geo information many tests could become useless. E.g. if I am debugging the routing of my CDN in Brazil, I don't want traceroutes from Mexico being reported as Brazil; it would only confuse the user.

jimaek commented 2 years ago

Example list of probes that are all detected as Los Angeles, while ipinfo seems more accurate: https://gist.github.com/jimaek/b3fcd57908cb15272dd2a375a4872f1f

But even then it's wrong in many cases.

jimaek commented 2 years ago

Another idea: Traceroute hops based geoip

  1. The API or a micro-service will run a traceroute towards an IP. In our case, a probe whose location we don't know.
  2. In most cases the traceroute hostnames contain the airport code of the place where each router is located.
  3. By parsing the last 3-4 hostnames and looking for airport codes in them, we could in theory reliably guess the location of the probe.

Example

root@ansible2:~# tracepath 171.22.117.64
 1?: [LOCALHOST]                      pmtu 1500
 1:  _gateway                                              0.250ms
 1:  _gateway                                              0.517ms
 2:  10.193.33.129                                         0.936ms
 3:  no reply
 4:  10.193.0.4                                            0.915ms
 5:  ???                                                   0.987ms
 6:  ???                                                   1.059ms
 7:  ae59.bar4.Warsaw1.Level3.net                          1.042ms asymm  8
 8:  ae2.3601.edge2.Phoenix1.level3.net                  154.058ms asymm 18
 9:  xe5-2-5.bcr2.phx1.us.bb.symantec.net                153.391ms asymm 18
10:  border5.ae2-bbnet2.phx010.pnap.net                  151.237ms asymm 18
11:  dedipath-62.edge1.phx.pnap.net                      152.460ms asymm 18
12:  69.25.116.199                                       153.214ms asymm 14
13:  171.22.117.64                                       151.653ms reached

phx = Phoenix, USA

tracepath hetzner.com
 1?: [LOCALHOST]                        0.044ms pmtu 1500
 1:  _gateway                                              0.956ms
 1:  _gateway                                              0.515ms
 2:  no reply
 3:  no reply
 4:  2001:bc8:1c00:1::e                                    2.588ms
 5:  2001:bc8:1c00:1::6                                    1.377ms
 6:  ae62.bar3.Warsaw1.Level3.net                          0.957ms asymm  7
 7:  no reply
 8:  AS33891-NET.edge3.Berlin1.Level3.net                 11.210ms
 9:  ae8-2080.nbg20.core-backbone.com                     21.959ms asymm 12
10:  ae2-2015.nbg60.core-backbone.com                     24.023ms asymm 12
11:  ae1-2014.nbg40.core-backbone.com                     24.395ms asymm 12
12:  2a01:4a0:1338:1ae::2                                 22.455ms asymm  8
13:  core11.nbg1.hetzner.com                              26.577ms asymm  9
14:  ex9k2.dc1.nbg1.hetzner.com                           24.246ms asymm 10
15:  no reply
16:  no reply
17:  no reply

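nbg1 contains nbg, which stands for Nuremberg, the correct location for the probe. Strictly speaking it is the operator's abbreviation rather than the airport code: the IATA code for Nuremberg airport is NUE.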

But there are also edge cases where the airport is in a small town and in theory we need to figure out the closest major city, e.g.:

 tracepath 91.196.223.248
 1?: [LOCALHOST]                      pmtu 1500
 1:  _gateway                                              0.681ms
 1:  _gateway                                              0.503ms
 2:  10.193.33.129                                         0.688ms
 3:  no reply
 4:  10.193.0.2                                            1.101ms
 5:  ???                                                   0.889ms
 6:  ???                                                   0.994ms
 7:  no reply
 8:  be2486.ccr21.waw01.atlas.cogentco.com                 1.468ms
 9:  be2484.ccr42.ham01.atlas.cogentco.com                13.799ms
10:  be2815.ccr41.ams03.atlas.cogentco.com                21.241ms
11:  be12488.ccr42.lon13.atlas.cogentco.com              112.421ms asymm 17
12:  be2490.ccr42.jfk02.atlas.cogentco.com               116.660ms asymm 17
13:  be2889.ccr21.cle04.atlas.cogentco.com               116.054ms asymm 15
14:  be2718.ccr42.ord01.atlas.cogentco.com               114.562ms
15:  be2831.ccr21.mci01.atlas.cogentco.com               130.559ms asymm 16
16:  be3035.ccr21.den01.atlas.cogentco.com               141.534ms asymm 17
17:  be3038.ccr32.slc01.atlas.cogentco.com               145.863ms
18:  be2085.ccr21.sea02.atlas.cogentco.com               167.937ms
19:  be2895.rcr21.sea03.atlas.cogentco.com               168.930ms
20:  Internap-Network-Services.demarc.cogentco.com       170.209ms
21:  border2.ae2-bbnet2.sef.pnap.net                     169.585ms asymm 22
22:  dedipath-64.edge2.sef003.pnap.net                   168.898ms asymm 23
23:  69.25.117.196                                       168.671ms asymm 24
24:  91.196.223.248                                      168.063ms reached

Hops 21 and 22 contain sef, which is the code of the Sebring airport in Florida. We could maybe leave it at that, but I think it would make more sense to assign the probe to Tampa or Orlando.
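A minimal sketch of the parsing step, run on hostnames from the traces above. The `LOCATION_TOKENS` table is a hypothetical stand-in: a real one would need the full IATA list plus operator-specific abbreviations. The function names are invented as well.

```python
import ipaddress
import re
from collections import Counter
from typing import Optional

# Hypothetical lookup table: hostname token -> city.
LOCATION_TOKENS = {
    "phx": "Phoenix",
    "nbg": "Nuremberg",
    "sef": "Sebring",
    "sea": "Seattle",
    "waw": "Warsaw",
}


def is_hostname(hop: str) -> bool:
    """True for DNS names; False for bare IPs, "no reply", "???" and similar."""
    try:
        ipaddress.ip_address(hop)
        return False
    except ValueError:
        return "." in hop


def tokens_in(hostname: str) -> list[str]:
    """Known location tokens in a hostname, e.g. "phx010" -> "phx", "nbg1" -> "nbg"."""
    labels = re.split(r"[.\-]", hostname.lower())
    stripped = (re.sub(r"\d+$", "", label) for label in labels)
    return [token for token in stripped if token in LOCATION_TOKENS]


def guess_location(hops: list[str], last_n: int = 4) -> Optional[str]:
    """Most frequent known token among the last `last_n` named hops."""
    named = [hop for hop in hops if is_hostname(hop)][-last_n:]
    # Walk from the destination backwards so ties favor the hop nearest the target.
    counts = Counter(token for hop in reversed(named) for token in tokens_in(hop))
    if not counts:
        return None
    return LOCATION_TOKENS[max(counts, key=counts.get)]


trace = [
    "ae2.3601.edge2.Phoenix1.level3.net",
    "xe5-2-5.bcr2.phx1.us.bb.symantec.net",
    "border5.ae2-bbnet2.phx010.pnap.net",
    "dedipath-62.edge1.phx.pnap.net",
    "69.25.116.199",
    "171.22.117.64",
]
print(guess_location(trace))
```

Tokens alone can mislead. In the last trace the hops right before sef are in Seattle (sea02, sea03) at nearly the same latency, so sef there may be a provider's site label rather than the Sebring airport code. Cross-checking the guess against hop latency would catch such cases.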

Similar project: https://www.caida.org/catalog/papers/2021_learning_extract_geographic_information/learning_extract_geographic_information.pdf