shlinkio / shlink

The definitive self-hosted URL shortener
https://shlink.io
MIT License

Improve GeoLite2 file downloads when using RoadRunner #2124

Open · acelaya opened this issue 6 months ago

acelaya commented 6 months ago

An issue was recently reported in which Shlink attempted to download the GeoLite2 db file on every visit (https://github.com/shlinkio/shlink/issues/2114). The root cause was never determined, and the problem eventually went away.

A similar issue with the same symptom was fixed some time ago, although that one was caused by a bug in Shlink (https://github.com/shlinkio/shlink/issues/2021).

These issues highlight that the current approach to automatically downloading and updating the GeoLite2 db file is brittle, and that it would be good to revisit it.

Current approach and context

When Shlink started using GeoLite2, it provided a command line tool that checks whether the database is up to date and downloads a new copy otherwise. It was up to users to schedule the execution of this command as they saw fit.

This is still the recommended approach for those serving Shlink with a classic web server (nginx/apache + php-fpm, or similar).

For convenience, and because background jobs became available when Shlink added support for swoole/openswoole (and later RoadRunner), Shlink introduced a mechanism that checks, every time a visit happens, whether the GeoLite2 db needs to be updated, and downloads a new one if the file's build date metadata indicates it is old enough.

This presents some problems, though. If the existing db is too old and the download fails for whatever reason (a bug in Shlink, incorrect write permissions, a download timeout, an error while extracting the file, etc.), Shlink will try to download a new file on every subsequent visit, which can lead to a large number of download attempts.

This is even worse with MaxMind's recent API limit changes, which only allow 30 downloads per day per API key. Once Shlink reaches that limit, the result is email notifications and logs flooded with errors.
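To illustrate why a failing download turns into one attempt per visit, here is a minimal Python sketch of the metadata-based check described above. It is not Shlink's code (Shlink is written in PHP): the function names, the 35-day threshold and the simulated visits (all at the same instant, for simplicity) are hypothetical. The key point is that a failed download never changes the file's build date, so every later visit still sees an outdated db.

```python
# Hypothetical threshold: the issue only says "old enough", so 35 days is an
# illustrative value, not necessarily the one Shlink uses.
MAX_DB_AGE_SECONDS = 35 * 24 * 60 * 60


def db_is_outdated(build_epoch: int, now: int) -> bool:
    """Metadata-based check: the db is stale once its build date is too old."""
    return now - build_epoch > MAX_DB_AGE_SECONDS


def simulate_visits(build_epoch: int, now: int, visits: int, download) -> int:
    """Run the per-visit check `visits` times and count the download attempts.

    `download` returns the new file's build epoch on success, or None on failure.
    """
    attempts = 0
    for _ in range(visits):
        if db_is_outdated(build_epoch, now):
            attempts += 1
            new_epoch = download()
            if new_epoch is not None:
                # Only a successful download refreshes the build date.
                build_epoch = new_epoch
    return attempts


def failing_download():
    # Stands in for any failure: write permissions, timeout, extraction error...
    return None


if __name__ == "__main__":
    now = 1_700_000_000
    forty_days_old = now - 40 * 24 * 60 * 60
    # Failing download: one attempt per visit, so 30 visits exhaust a 30/day quota.
    print(simulate_visits(forty_days_old, now, 100, failing_download))  # 100
    # Successful download: the fresh build date stops further attempts.
    print(simulate_visits(forty_days_old, now, 100, lambda: now))  # 1
```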

Ideal scenario

In an ideal world, Shlink would try to update the GeoLite2 db only every N days, based not on the file metadata but on a fixed schedule relative to the last attempt. If an error occurs, Shlink should schedule another attempt a bit later, capping the number of attempts per day to avoid hitting API limits.
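The scheduling logic described above can be sketched in Python as follows, assuming a single process that keeps this state in memory. The class name, the 7-day interval, the one-hour retry delay and the cap of 5 attempts per day are all hypothetical choices, not values from Shlink. Persisting this state and wiring it into RoadRunner jobs is the custom part the issue refers to, and is not shown.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60


@dataclass
class GeoLiteUpdateSchedule:
    """Hypothetical scheduler state; names and defaults are illustrative only."""
    interval_days: int = 7              # "every N days"
    retry_delay_seconds: int = 60 * 60  # wait one hour after a failed attempt
    max_attempts_per_day: int = 5       # stays well below MaxMind's 30/day limit
    next_attempt_at: float = 0.0        # epoch seconds; 0 means "as soon as possible"
    attempts_day: Optional[int] = None  # UTC day number (epoch // DAY) of the counter
    attempts_today: int = 0

    def _attempts_used(self, now: float) -> int:
        # The counter only applies to the day it was recorded on.
        return self.attempts_today if int(now // DAY) == self.attempts_day else 0

    def should_attempt(self, now: float) -> bool:
        return (now >= self.next_attempt_at
                and self._attempts_used(now) < self.max_attempts_per_day)

    def record_attempt(self, now: float, success: bool) -> None:
        day = int(now // DAY)
        if day != self.attempts_day:
            self.attempts_day = day
            self.attempts_today = 0
        self.attempts_today += 1

        if success:
            # Fixed schedule relative to this attempt, independent of file metadata.
            self.next_attempt_at = now + self.interval_days * DAY
        elif self.attempts_today >= self.max_attempts_per_day:
            # Daily cap reached: wait until the start of the next UTC day.
            self.next_attempt_at = (day + 1) * DAY
        else:
            # Failure below the cap: retry a bit later.
            self.next_attempt_at = now + self.retry_delay_seconds


def count_attempts(schedule: GeoLiteUpdateSchedule, visit_times, download_succeeds: bool) -> int:
    """Check the schedule on every visit and count how many downloads would run."""
    attempts = 0
    for now in visit_times:
        if schedule.should_attempt(now):
            attempts += 1
            schedule.record_attempt(now, download_succeeds)
    return attempts


if __name__ == "__main__":
    start = 1000 * DAY
    one_visit_per_minute = [start + i * 60 for i in range(24 * 60)]
    print(count_attempts(GeoLiteUpdateSchedule(), one_visit_per_minute, False))  # 5
    print(count_attempts(GeoLiteUpdateSchedule(), one_visit_per_minute, True))   # 1
```

Simulating one visit per minute for a full day with downloads that always fail yields only 5 attempts instead of one per visit. When downloads succeed, there is a single attempt and the next one is scheduled 7 days later, whatever the file metadata says.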

This is tricky, though, as RoadRunner's jobs system doesn't provide this capability out of the box, so it would require a custom implementation.

RoadRunner's job queues docs https://docs.roadrunner.dev/queues-and-jobs/overview-queues

fmunim commented 5 months ago

Is there a way to manually download the database and place it somewhere Shlink will pick it up?

MattBlissett commented 4 days ago

If you use the official Docker image, a workaround for this issue is to mount the GeoIP database from the host, and use a systemd timer or cron job on the host to update it:

docker run --restart always --name shlink -p 8080:8080 [environment] -v /var/lib/GeoIP/GeoLite2-City.mmdb:/etc/shlink/data/GeoLite2-City.mmdb shlinkio/shlink:stable

As far as I can see, Shlink still geolocates visits without the licence key, so long as the database is present.