mobidata-bw / postgis-gtfs-importer

Imports GTFS data into a PostGIS database, using gtfstidy & gtfs-via-postgres.
European Union Public License 1.2

investigate possible curl-mirror bug (IPL bwgesamt GTFS import) #10

Open derhuerst opened 4 weeks ago

derhuerst commented 4 weeks ago

In >1 IPL deployment, the following error occurred when mirroring and importing the "bwgesamt" GTFS feed:

+ unzip -d /var/gtfs/gtfs /var/gtfs/gtfs.zip
warning [/var/gtfs/gtfs.zip]: 3428352 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 3428352
(attempting to re-compensate)
inflating: /var/gtfs/gtfs/agency.txt
error: not enough memory for bomb detection

A bug in curl-mirror seems likely.

However, it could also be that the "bwgesamt" zip file isn't being updated atomically, in which case a curl-mirror run may have "caught" the update while it was in progress. @NVBWSeifert can you rule out this cause?
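To tell a damaged download apart from an unzip problem, the mirrored file could be checked before extraction. The following is only a minimal sketch in Python, not part of postgis-gtfs-importer or curl-mirror; the function name `zip_looks_intact` is hypothetical. Python's `zipfile` module tolerates data prepended to an archive, so the sketch additionally checks the leading signature bytes to approximate unzip's "extra bytes at beginning" warning.

```python
import zipfile
import zlib


def zip_looks_intact(path):
    """Return True if `path` starts like a zip file and all members pass their CRC check.

    Hypothetical helper for illustration. A non-empty zip archive normally
    starts with the local file header signature b"PK\\x03\\x04".
    """
    try:
        with open(path, "rb") as f:
            if f.read(4) != b"PK\x03\x04":
                return False  # leading bytes are not a local file header
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first bad name, or None
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        return False
```

A download script could call this on the freshly mirrored file and keep the previous copy if it returns False.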

NVBWSeifert commented 3 weeks ago

Which bwgesamt.zip do you use: the one at the URL beginning with nvbw.de/... or the one beginning with gtfs....?

NVBWSeifert commented 3 weeks ago

I downloaded the file from both sources on an Ubuntu VM with curl and then unzipped it. I got no warnings or errors during extraction. On the Ubuntu VM, I have UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP. Details of the zip files:

from gtfs...: 1514273092 bytes, sha256sum hash: 6a69492a95da568caf545f631f1abf6310c771b98049ac18941c1fa9dd9a8e8b bwgesamt.zip

from nvbw.de: 610190170 bytes, sha256sum hash: 98327cd4867630494b5a4c0f7a9ec02931f37ba5adc57cc109536701436f39f5

I don't know why there is such a large difference in file size. I will regenerate both files today; please download again after 14:00 on 30.10.2024.
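For comparing downloads across machines, the byte size and SHA-256 hash quoted above can be reproduced without `sha256sum`. This is a small sketch in Python; the function name `describe_file` is made up for illustration.

```python
import hashlib
import os


def describe_file(path, chunk_size=1024 * 1024):
    """Return (size in bytes, SHA-256 hex digest) of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read in chunks so that a >1 GB archive does not need to fit into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Calling `describe_file("bwgesamt.zip")` on each downloaded copy should yield values that can be compared directly with the ones listed in this comment.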

NVBWSeifert commented 3 weeks ago

I have generated a new bwgesamt.zip file, accessible at the standard URL https://www.nvbw.de/fileadmin/user_upload/service/open_data/fahrplandaten_mit_liniennetz/bwgesamt.zip. Note that for every zip file, a corresponding MD5 hash file is generated. It has the suffix .md5, so for bwgesamt.zip the URL is https://www.nvbw.de/fileadmin/user_upload/service/open_data/fahrplandaten_mit_liniennetz/bwgesamt.zip.md5

Please check whether the new zip file is usable.
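A download script could use the .md5 file to detect a corrupted or half-updated archive. The sketch below, in Python, assumes both files have already been downloaded and that the .md5 file follows the common `md5sum` layout, with the hex digest as its first whitespace-separated token; the actual format of the NVBW hash file is not described here, and the function names are hypothetical.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(zip_path, md5_path):
    """Compare `zip_path` against the digest stored in `md5_path`.

    Assumption: the digest is the first whitespace-separated token of the
    hash file, as written by `md5sum`.
    """
    with open(md5_path, "r", encoding="ascii", errors="replace") as f:
        tokens = f.read().split()
    if not tokens:
        return False  # empty hash file, nothing to compare against
    return md5_of(zip_path) == tokens[0].lower()
```

Since the zip and its hash file are fetched in two separate requests, a mismatch can also mean that one of them was replaced in between, so retrying once before reporting an error may be reasonable.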

derhuerst commented 6 days ago

We're using that one:

I have generated a new bwgesamt.zip File, accessible with the standard url https://www.nvbw.de/fileadmin/user_upload/service/open_data/fahrplandaten_mit_liniennetz/bwgesamt.zip.

Although it's rather unlikely that this is the cause of the bug described here, I'd like to rule out possible causes: Do you upload that file atomically (as in uploaded as e.g. bwgesamt.zip.tmp, then renamed to bwgesamt.zip)? Do you usually upload during Europe/Berlin office hours? Or could it be that the download script happens to "catch" an in-progress upload?
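For reference, the upload-then-rename pattern mentioned above can be sketched as follows in Python. This only illustrates the idea on a local filesystem; how NVBW actually publishes the file is not known here, and the function name `publish_atomically` is hypothetical.

```python
import os
import shutil
import tempfile


def publish_atomically(src, dest):
    """Copy `src` to `dest` so that readers never observe a partially written file.

    The data is first written to a temporary file in the destination
    directory, then moved into place with os.replace(), which is atomic
    when both paths are on the same filesystem.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, dest)
    except BaseException:
        # remove the temporary file if anything went wrong before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, a concurrent download sees either the complete old file or the complete new one, never a mix of both.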

Note, that for every zip-files, a corresponding MD5-Hashfile will be generated. It has a suffix .md5 […]

Thanks, that will help identify the actual bug in the next runs.

I have added debug output to ipl-orchestration's patched/custom download script in https://github.com/mobidata-bw/ipl-orchestration/commit/65d42196a1fc7f42a713d209dcee77112f4e7279.