matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

GeoIP2 updater might try to download new file before it exists #18427

Open Findus23 opened 2 years ago

Findus23 commented 2 years ago

Reported in https://forum.matomo.org/t/geoip2autoupdater-failed-to-unzip-the-downloaded-file-is-not-a-valid-geolocation-database/43925 (and also seen in my own cron job).

Expected Behavior

If a GeoIP2 update job runs right at the start of a month, dbip-city-lite-2021-12.mmdb.gz may not yet exist on the db-ip servers, and the update fails.

Current Behavior

ERROR [2021-12-01 02:03:30] 895774 /var/www/matomo/plugins/GeoIp2/GeoIP2AutoUpdater.php(189): GeoIP2AutoUpdater: failed to unzip '/var/www/matomo/tmp/latest/DBIP-City.mmdb.gz.download' after downloading 'https://download.db-ip.com/free/dbip-city-lite-2021-12.mmdb.gz': The downloaded file is not a valid geolocation database. Please re-check the URL or download the file manually. [Query: , CLI mode: 1]

Possible Solution

If a 404 is returned, Matomo could fetch the previous month's file again or reschedule the job for a few hours later.
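A minimal sketch of the "fall back to the previous month" idea, written in Python rather than Matomo's PHP. It assumes the URL pattern seen in the log messages in this issue. The function names (candidate_urls, urllib_fetch, download_with_fallback) are hypothetical and do not come from Matomo. The fetch function is injectable and returns None on a 404, so the fallback logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from datetime import date
from typing import Callable, Optional

# URL pattern as it appears in the log messages quoted in this issue.
DBIP_URL = "https://download.db-ip.com/free/dbip-city-lite-{y:04d}-{m:02d}.mmdb.gz"


def candidate_urls(today: date) -> list:
    """Current month's file first, then the previous month's as a fallback."""
    if today.month == 1:
        prev_y, prev_m = today.year - 1, 12
    else:
        prev_y, prev_m = today.year, today.month - 1
    return [DBIP_URL.format(y=today.year, m=today.month),
            DBIP_URL.format(y=prev_y, m=prev_m)]


def urllib_fetch(url: str) -> Optional[bytes]:
    """Return the response body, or None if the server answers with a 404."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.read()
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None
        raise  # other HTTP errors are not "file not published yet"


def download_with_fallback(today: date,
                           fetch: Callable[[str], Optional[bytes]] = urllib_fetch):
    """Return (url, body) for the first candidate that exists, else None."""
    for url in candidate_urls(today):
        body = fetch(url)
        if body is not None:
            return url, body
    return None
```

Re-downloading last month's file gains little if it is already installed. The real value is that the job ends cleanly instead of failing every few minutes. The alternative suggested above is to reschedule the job.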

tassoman commented 1 year ago

Last night, our job kept trying to unzip the file until 5 AM GMT+1. Maybe we could simply move the monthly job to the 2nd of each month? 🤔

PowerKiKi commented 8 months ago

Same here: https://download.db-ip.com/free/dbip-city-lite-2023-11.mmdb.gz returns a 404 at the time of writing, though it will probably work in a few hours.

Moving the cron job to the 2nd of the month sounds like a good idea: it would limit these issues without making the data too stale.
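To make the scheduling trade-off concrete, here is a minimal Python sketch of computing the next run on a fixed day of the month. It is not Matomo's scheduler. The function name next_monthly_run and the restriction of the day to 1 through 28 (so the day exists in every month) are assumptions of this sketch.

```python
from datetime import datetime


def next_monthly_run(now: datetime, day: int, hour: int = 0) -> datetime:
    """Next datetime on `day` of a month at `hour`:00, strictly after `now`."""
    if not 1 <= day <= 28:
        raise ValueError("day must be between 1 and 28 in this sketch")
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate > now:
        return candidate  # this month's slot is still ahead of us
    # Otherwise roll over to the same day of the following month.
    if now.month == 12:
        return candidate.replace(year=now.year + 1, month=1)
    return candidate.replace(month=now.month + 1)
```

Shifting `day` from 1 to 2 delays the first attempt by a full day. That gives db-ip time to publish the new file, and the installed data is at most one day staler.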

PowerKiKi commented 8 months ago

I created #21468 as a possible fix for this issue.

sgiehl commented 8 months ago

Reopening, as the PR only changed the day on which the download is attempted. We should still aim to implement proper handling for when the download fails.
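One possible form of "proper handling" is a capped exponential backoff, so a missing file does not trigger an attempt every 15 minutes all day. Below is a minimal Python sketch. The function name next_retry is hypothetical, the 15-minute base comes from the retry interval reported later in this thread, and the 6-hour cap is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def next_retry(last_attempt: datetime, failures: int,
               base: timedelta = timedelta(minutes=15),
               cap: timedelta = timedelta(hours=6)) -> datetime:
    """Time of the next attempt after `failures` consecutive failures (>= 1).

    The delay doubles with each failure, starting at `base` and never
    exceeding `cap`.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Clamp the exponent so huge failure counts cannot overflow timedelta.
    delay = min(base * 2 ** min(failures - 1, 20), cap)
    return last_attempt + delay
```

With these defaults, the attempts after a failure at 02:00 land at 02:15, 02:30, 03:00, 04:00, 06:00, and then every 6 hours.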

tassoman commented 8 months ago

There is no proper way to handle the file simply being missing. I think a notification for administrators in the GUI plus a logged error (which already exists) are enough.
Fortunately, the older GeoIP data doesn't get wiped before the new data has been downloaded.
So nowadays it's not a real problem: it happens only rarely, and only for a few hours.
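The "keep the old data until the new file is good" behaviour can be sketched like this. It is a Python illustration, not Matomo's implementation. It assumes the download is an MMDB file compressed with gzip, and it relies on the MaxMind DB format marking its metadata section with the bytes \xab\xcd\xefMaxMind.com. An HTML 404 page fails the gzip magic check before any file is touched. The function names are hypothetical.

```python
import gzip
import os
import tempfile
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"
# The MaxMind DB format starts its metadata section with this byte sequence.
MMDB_METADATA_MARKER = b"\xab\xcd\xefMaxMind.com"


def extract_mmdb(data: bytes) -> Optional[bytes]:
    """Decompressed MMDB payload, or None if `data` doesn't look like one."""
    if not data.startswith(GZIP_MAGIC):
        return None  # e.g. an HTML error page
    try:
        payload = gzip.decompress(data)
    except (OSError, EOFError):  # BadGzipFile is an OSError subclass
        return None
    return payload if MMDB_METADATA_MARKER in payload else None


def install_if_valid(data: bytes, target_path: str) -> bool:
    """Replace target_path only if `data` passes the check; else keep the old file."""
    payload = extract_mmdb(data)
    if payload is None:
        return False
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
        os.replace(tmp_path, target_path)  # atomic within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

The temporary file is written in the target's own directory, so os.replace stays on one filesystem. Readers then see either the old database or the new one, never a partial file.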

Yannik commented 2 months ago

Same issue today: it tries to download https://download.db-ip.com/free/dbip-city-lite-2024-05.mmdb.gz (which does not exist) every 15 minutes (starting at 2 AM CET; it's now 11:47 CET) and fails with the "failed to unzip" error.

Yannik commented 2 months ago

Note: this is happening with Matomo 5.0.3, which already includes the change from #21468.

Excerpt from the plugins/GeoIp2/GeoIP2AutoUpdater.php file of the Matomo instance:

    // created the scheduledtime instance, also, since GeoIP 2 updates are done on tuesdays,
    // get new DBs on Wednesday. For db-ip, the databases are updated daily, so it doesn't matter exactly
    // when we download a new one.
    switch ($schedulePeriodStr) {
        case self::SCHEDULE_PERIOD_WEEKLY:
            $schedulePeriod = new Weekly();
            $schedulePeriod->setDay(3);
            break;
        case self::SCHEDULE_PERIOD_MONTHLY:
        default:
            $schedulePeriod = new Monthly();
            $schedulePeriod->setDay(3);
            break;
    }

DXXS commented 2 months ago

I'm getting an error message every hour or so here:

ERROR [2024-05-03 01:05:12] 68789 /var/www/matomo/plugins/GeoIp2/GeoIP2AutoUpdater.php(190): GeoIP2AutoUpdater: failed to unzip '/var/www/matomo/tmp/latest/DBIP-City.mmdb.gz.download' after downloading 'https://download.db-ip.com/free/dbip-city-lite-2024-05.mmdb.gz': The downloaded file is not a valid geolocation database. Please re-check the URL or download the file manually. [Query: , CLI mode: 1]
ERROR [2024-05-03 01:05:12] 68789 Scheduler: Error GeoIP2AutoUpdater: failed to unzip '/var/www/matomo/tmp/latest/DBIP-City.mmdb.gz.download' after downloading 'https://download.db-ip.com/free/dbip-city-lite-2024-05.mmdb.gz': The downloaded file is not a valid geolocation database. Please re-check the URL or download the file manually. for task 'Piwik\Plugins\GeoIp2\GeoIP2AutoUpdater.update'

I'm still using Matomo 4.9.1.