osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Wikipedia importance dump not available (HTTP error 403) #2929

Open FaFre opened 1 year ago

FaFre commented 1 year ago

According to the documentation, a Wikipedia importance dump should be available at the following address: https://nominatim.org/data/wikimedia-importance.sql.gz

However, nothing seems to be available there. Is there an error with the web server, or is the documentation outdated? https://nominatim.org/release-docs/latest/admin/Import/#downloading-additional-data

lonvia commented 1 year ago

An inconsiderate user has maxed out traffic allowances on the server. The download will remain disabled until appropriate rate limiting is in place.

FaFre commented 1 year ago

Oh, that's very unfortunate. Is there a mirror available?

mtmail commented 1 year ago

@FaFre I have a copy at https://downloads.opencagedata.com/public/wikimedia-importance.sql.gz (not a mirror, so don't hardcode it in scripts)

lonvia commented 1 year ago

The data is back now. Please check your scripts and restrict the download of extra data to the necessary minimum. The alternative is severe rate limiting on the server and nobody really wants that.
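One way to keep a script's downloads to the minimum is to fetch the file only when it is missing locally or when the last check is older than some interval. The sketch below assumes a hypothetical stamp file that records when the script last checked and a hypothetical weekly interval; neither is a Nominatim convention, and both are placeholders to adapt.

```python
import os
import time
from typing import Optional

# Hypothetical policy: check for a new version at most once a week.
MIN_CHECK_INTERVAL = 7 * 24 * 3600  # seconds


def needs_download(dest: str, stamp: str, now: Optional[float] = None) -> bool:
    """Return True only if dest is missing or the last check is stale."""
    if not os.path.exists(dest):
        return True  # nothing local yet, so one download is unavoidable
    if now is None:
        now = time.time()
    try:
        last_check = os.path.getmtime(stamp)
    except FileNotFoundError:
        return True  # never checked before
    return now - last_check >= MIN_CHECK_INTERVAL


def record_check(stamp: str) -> None:
    """Touch the (hypothetical) stamp file to remember when we last checked."""
    with open(stamp, "a"):
        pass
    os.utime(stamp, None)  # set mtime to the current time
```

A cron job or installer would call `needs_download()` first and only contact the server when it returns True, then call `record_check()` afterwards, so running the script every minute still results in at most one request per interval.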

lonvia commented 1 year ago

Things have not improved. A script is circulating that downloads the 300 MB Wikipedia importance file once every minute using curl. Be advised that curl will be banned from the server by tomorrow.

lonvia commented 1 year ago

On second thought, I'm not really willing to pay for another TB of data traffic overnight. curl is now banned from https://nominatim.org/data, effective immediately.

If you want to use curl to download data, use curl -A to set a custom user agent that identifies your application. If you are a provider of Nominatim installation scripts, make absolutely sure that this user agent can be used to contact you in case you screw up your script.
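For scripts written in Python rather than shell, the same idea as curl -A is to send an explicit User-Agent header. The sketch below uses only the standard library; the application name and contact address in `USER_AGENT` are hypothetical placeholders that you would replace with your own identifying details.

```python
import shutil
import urllib.request

# Hypothetical identifier: name your application and give a real contact
# address so the server operators can reach you if your script misbehaves.
USER_AGENT = "example-nominatim-installer/1.0 (contact: admin@example.org)"
IMPORTANCE_URL = "https://nominatim.org/data/wikimedia-importance.sql.gz"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a GET request that carries an identifying User-Agent header."""
    if not user_agent.strip():
        raise ValueError("an identifying, non-empty user agent is required")
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, dest: str, user_agent: str) -> None:
    """Stream the response body into dest (a single request, no retries)."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Refusing an empty user agent is a deliberate choice here: it makes it harder to ship a script that silently falls back to a generic library default.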

Furthermore, if you regularly check for updates of any data files, make sure to use the -z option or similar to avoid re-downloading a file when there is no new version.
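curl's -z option makes the request conditional on the local file's timestamp, so the server can answer with 304 Not Modified instead of resending the whole file. A rough standard-library equivalent, sketched under the assumption that the server honours `If-Modified-Since` (if it does not, this simply downloads the file again), looks like this. The function names are hypothetical.

```python
import email.utils
import os
import shutil
import urllib.error
import urllib.request


def conditional_headers(dest: str, user_agent: str) -> dict:
    """Headers for a conditional GET based on the local file's mtime."""
    headers = {"User-Agent": user_agent}
    if os.path.exists(dest):
        mtime = os.path.getmtime(dest)
        headers["If-Modified-Since"] = email.utils.formatdate(mtime, usegmt=True)
    return headers


def fetch_if_newer(url: str, dest: str, user_agent: str) -> bool:
    """Download url to dest unless the server reports it unchanged.

    Returns True if a new copy was written, False on 304 Not Modified.
    """
    req = urllib.request.Request(url, headers=conditional_headers(dest, user_agent))
    tmp = dest + ".part"  # write to a temporary file, then swap it in
    try:
        with urllib.request.urlopen(req) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
            last_modified = resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the existing local copy
            return False
        raise
    os.replace(tmp, dest)
    if last_modified:
        # Store the server's timestamp so the next check compares against it.
        ts = email.utils.parsedate_to_datetime(last_modified).timestamp()
        os.utime(dest, (ts, ts))
    return True
```

Copying the server's `Last-Modified` time onto the local file matters: if the local mtime were the download time instead, the comparison would drift away from the server's clock.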