kiwix / container-images


Mirrorbrain is not reporting IP location appropriately #251

Open benoit74 opened 8 months ago

benoit74 commented 8 months ago

Mirrorbrain GeoIP database is based on MaxMind GeoIP (Legacy, deprecated since May 2022).

This database is now so outdated that some IPs are no longer assigned to the correct location.

Some cases from the IPs I have access to:

List of best mirrors for IP address 45.83.230.1, located in an unknown country.
List of best mirrors for IP address 104.28.211.188, located in United States (US).
List of best mirrors for IP address 135.181.181.97, located in Canada (CA).

While in fact we should have:

The impact is obviously significant for the last two cases, though I agree they may be corner cases: these are server IPs, which are probably more likely to be reassigned across regions than end-user IP ranges.

kelson42 commented 8 months ago

@benoit74 How complex would it be to move to a GeoIP2 database?

benoit74 commented 8 months ago

@kelson42 The work is already mostly done, as documented in the draft PR.

kelson42 commented 1 month ago

@benoit74 @rgaudin I wonder how this issue will impact our mirrors-qa project?

rgaudin commented 1 month ago

> @benoit74 @rgaudin I wonder how this issue will impact our mirrors-qa project?

It won't, as mirrors-qa is not using mirrorbrain.

kelson42 commented 1 month ago

Let me be more precise: if GeoIP significantly fails to properly identify client locations, then I wonder to what extent it will bias the results.

rgaudin commented 1 month ago

I understand that this issue arose because mirrorbrain (ours at least) is using an out-of-date database.

There will be zero dependency on mirrorbrain in mirrors-qa. We'll just request the list of mirrors from mirrors.html, and we could get that from any mirror.

mirrors-qa will need GeoIP though (not strictly mandatory, but we decided to use it). It will rely on an up-to-date DB or a service, so there is no link to mirrorbrain.

Eventually, this mirrorbrain-specific issue will need to be fixed as there is no point in doing mirrors-qa to optimize geo-allocation if the load-balancer is not able to guess locations from IPs… That said, @benoit74 mentions that the issue is most likely limited to server IPs so the actual impact may be limited.
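
For illustration only, here is a minimal sketch of how a stale IP-range table can produce both the "unknown country" and the wrong-country results reported above. It is not mirrorbrain's or MaxMind's code. The networks are RFC 5737 documentation ranges, the country codes are invented, and the names `STALE_DB`, `FRESH_DB`, `lookup` and `disagreements` are hypothetical.

```python
import ipaddress

# Hypothetical range tables. The networks are documentation-only ranges
# (RFC 5737) and the country codes are made up for the example.
STALE_DB = {
    "198.51.100.0/24": "CA",   # range later reassigned, stale entry kept
}
FRESH_DB = {
    "198.51.100.0/24": "FI",   # same range, updated assignment
    "203.0.113.0/24": "DE",    # range missing from the stale table
}


def lookup(db, ip):
    """Return the country of the most specific network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, country in db.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, country)
    return best[1] if best else None


def disagreements(stale, fresh, ips):
    """List (ip, stale_country, fresh_country) where the two tables differ."""
    result = []
    for ip in ips:
        old, new = lookup(stale, ip), lookup(fresh, ip)
        if old != new:
            result.append((ip, old, new))
    return result
```

Running `disagreements(STALE_DB, FRESH_DB, ips)` over a sample of known client IPs would give a rough idea of how many lookups are affected, which is the question raised above about bias in the results.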