Open heffergm opened 8 years ago
Since this ticket we've completely revamped the Geonames importer. This looks like a transient issue related to an invalid download file. If it happens again I will take another look.
Hello, two years later: this issue has happened again, and it is an issue in the unzip utility we use. Fortunately there is a replacement.
This does not necessarily appear to be an issue with the `unzip` NPM package. Rather, something about the files that are downloaded to disk leaves them slightly corrupt, in a way the `unzip` command line program handles fine but the `unzip` NPM package does not. It's unclear whether our downloader is causing the corruption or the Geonames server distributes the files in this corrupted state.
This issue is still occurring in our builds, despite the attempts in #154 and https://github.com/pelias/geonames/pull/171 to solve or work around it. This has been happening periodically since the very creation of this repository (it is in fact a dupe of the very first issue in the repo).
We need to consider an alternate unzip method, such as the more robust command line `unzip`, or some other solution.
Does anyone know if there is a workaround for this? I'm currently encountering this issue and I'm unable to complete the import.
Hey @asdfasdafas, we have somewhat of a workaround, but it's not great. Since the problem is (we think) inherent in the zipfile as published by Geonames, we get around it for Mapzen Search by caching old, valid zipfiles.
One possible alternative workaround would be to change our code to avoid using the node.js zip library, and use a standard commandline unzip. This would require some reorganizing of the code in this importer, but if you were interested in taking a look at it I'd be happy to help point you in the right direction. We would gladly accept a PR that does that :)
Ah I probably wouldn't be much help on the node.js code, but would you happen to know where I could download copies of the known-good geonames files?
No worries. This is the one we have cached for Mapzen Search: https://s3.amazonaws.com/pelias-data/geonames/allCountries.zip
Its modification time is Nov 18, 2017 7:05:50 PM GMT-0500, so it's not TOO old.
An update here: as it turns out, there is no correct way to stream a zip file without loading it into memory. This makes sense, as you can't pipe to or from `unzip` on the command line.
We have two options: switch to a library like yauzl, which implements a non-streaming API for reading zip files, or extract zip files after download to expose the underlying text file, which IS stream-able.
My vote is for the second approach, since it would have the added benefit of removing code, whereas adjusting our existing code to use yauzl may be a bit of tedious work.
In either case, https://github.com/pelias/geonames/issues/297 is effectively a prerequisite.
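For illustration, the second option (extract first, then stream the plain text) might look roughly like the sketch below. It assumes the system `unzip` binary is on the PATH. The helper names `extractZip` and `forEachLine` are made up for this example and are not part of the importer, and the file names in the usage comment are assumptions as well.

```javascript
const { execFile } = require('child_process');
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: extract an archive with the system `unzip` binary
// (assumed to be installed). -o overwrites without prompting, -d sets the
// output directory.
function extractZip(zipPath, outDir) {
  return new Promise((resolve, reject) => {
    execFile('unzip', ['-o', zipPath, '-d', outDir], (err) => {
      if (err) {
        reject(err);
      } else {
        resolve(outDir);
      }
    });
  });
}

// Hypothetical helper: read an extracted text file line by line, so the
// whole file is never held in memory. Resolves to the number of lines read.
async function forEachLine(textPath, onLine) {
  const rl = readline.createInterface({
    input: fs.createReadStream(textPath, { encoding: 'utf8' }),
    crlfDelay: Infinity
  });
  let count = 0;
  for await (const line of rl) {
    onLine(line);
    count++;
  }
  return count;
}

// Usage (file names are assumptions):
//   await extractZip('data/allCountries.zip', 'data');
//   await forEachLine('data/allCountries.txt', (line) => {
//     const columns = line.split('\t'); // Geonames dumps are tab-separated
//   });
```

The trade-off is extra disk space for the extracted file, in exchange for dropping the zip parsing code from the import pipeline entirely.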
Update: a possible workaround here is to download the broken Geonames zip file, extract the data with `unzip`, and then re-compress it with `zip`. This seems to create archives that the importer can successfully read.
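A minimal sketch of that extract and re-compress step, driven from Node. It assumes the `unzip` and `zip` command line tools are installed. The `repackZip` function and its `run` parameter are hypothetical names for this example, not something the importer provides.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: extract a problematic archive with the command line
// `unzip`, then re-compress the contents with `zip`.
// `run` defaults to execFileSync and is only a parameter so that the
// command runner can be swapped out (for example in tests).
function repackZip(brokenZip, repairedZip, run = execFileSync) {
  const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'geonames-repack-'));
  try {
    // 1. Extract everything into a scratch directory.
    run('unzip', ['-o', path.resolve(brokenZip), '-d', workDir]);
    // 2. Re-compress from inside the scratch directory so the paths stored
    //    in the new archive are relative. -r recurses into directories.
    run('zip', ['-r', path.resolve(repairedZip), '.'], { cwd: workDir });
  } finally {
    // Remove the scratch directory whether or not the commands succeeded.
    fs.rmSync(workDir, { recursive: true, force: true });
  }
  return repairedZip;
}

// Usage (file names are assumptions):
//   repackZip('allCountries.zip', 'allCountries-repacked.zip');
```

Note that `zip` adds to an archive that already exists, so the destination path should not exist beforehand.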
This occurs sporadically.