rustprooflabs / pgosm-flex

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
MIT License

Local downloads ignored/overwritten #385

Open jmealo opened 4 months ago

jmealo commented 4 months ago

What version of PgOSM Flex are you using?

latest

Docker image

What did you do exactly?

Tried to download the north-america region.

What did you expect to happen?

I expected it to download in less than half an hour.

What did happen instead?

The download stalled and barely got past 100 MB.

What did you do to try analyzing the problem?

I downloaded the file using aria2c and then started pgosm-flex again. Even though the file was chmod 777, it reports "local file not found, downloading file and .md5".

I suspect that when the .md5 isn't found but the .osm.pbf file is, it simply overwrites the existing file.

Update: Even with both the .md5 and .osm.pbf files in the data directory, chmod 777, it still overwrites them.

Either way, pgosm-flex will redownload everything.

jmealo commented 4 months ago

I'm using --force; I don't know if that's part of the behavior.

rustprooflabs commented 4 months ago

The current behavior doesn't care about files named -latest and will overwrite without doing any checks. In hindsight, probably not ideal!

To use a manually downloaded file, replace latest with yyyy-mm-dd in the file name. Instead of us-latest.osm.pbf it'd be us-2024-05-16.osm.pbf, and the same with the .md5 file. Then, when running docker exec, add --pgosm-date 2024-05-16. With --pgosm-date along with the region details, it should find your available files and skip the download.
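For anyone who wants to script the rename, here is a minimal sketch. The function name `stage_dated_files` is hypothetical and not part of PgOSM Flex, and the `.osm.pbf.md5` suffix is an assumption based on how Geofabrik names its checksum files. It only renames files; it does not touch the file name that may be recorded inside the .md5 file.

```python
from pathlib import Path


def stage_dated_files(data_dir, region, pgosm_date):
    """Rename <region>-latest.osm.pbf and its .md5 to <region>-<pgosm_date>.*.

    Hypothetical helper, not part of PgOSM Flex. Returns the new paths.
    """
    data_dir = Path(data_dir)
    suffixes = (".osm.pbf", ".osm.pbf.md5")  # assumed naming of the two files

    # Check that both sources exist before renaming anything.
    pairs = []
    for suffix in suffixes:
        src = data_dir / f"{region}-latest{suffix}"
        if not src.is_file():
            raise FileNotFoundError(src)
        pairs.append((src, data_dir / f"{region}-{pgosm_date}{suffix}"))

    renamed = []
    for src, dst in pairs:
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

After something like `stage_dated_files("/path/to/data", "us", "2024-05-16")`, the run would then be started with `--pgosm-date 2024-05-16` as described above.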

I thought I had this documented but a quick search didn't bring anything up. I'll work on adding that in soon.

jmealo commented 4 months ago

rustprooflabs commented 3 months ago

@jmealo I think that would work. Your suggestion got me thinking about possible side effects and the only negative impact I could think of was "the osm_date column would lie!" Hence, #388. I think this change could be made after that situation is improved.

I'll try to start working on #388 in the near-ish future. If you have time and want to submit a PR that checks the local file size, as suggested, against the size reported by a HEAD request (presumably via requests), I'd be happy to review/merge.
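A rough sketch of what such a size check could look like. This is not the project's code: the function names are hypothetical, and it uses the standard library's `urllib.request` instead of requests purely to stay dependency-free. It assumes the server reports a `Content-Length` header on HEAD responses.

```python
import urllib.request
from pathlib import Path


def remote_size(url, timeout=30):
    """Return the Content-Length from a HEAD request, or None if not reported."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def download_needed(local_path, expected_size):
    """Decide whether to download, based only on file presence and size.

    Returns True when the local file is missing, when the remote size is
    unknown, or when the sizes differ.
    """
    local_path = Path(local_path)
    if not local_path.is_file():
        return True
    if expected_size is None:
        return True  # cannot confirm a match, so stay conservative
    return local_path.stat().st_size != expected_size
```

The two pieces would be combined as `download_needed(pbf_path, remote_size(pbf_url))`. A size match is a weaker guarantee than a checksum match, so this would complement the .md5 check rather than replace it.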

The geofabrik.py module is the first place that will need adjusting. The logic around what happens when a download isn't needed will also need adjusting. Right now it assumes the correct file is named yyyy-mm-dd and will overwrite the -latest file via this code. Some of that logic looks like I wrote it quickly and never looked back!
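To illustrate the kind of "is a download needed?" decision described here, the following sketch looks for a dated PBF and its .md5 and only accepts them if the checksum matches. It is not taken from geofabrik.py: `usable_local_pbf` and `md5_of` are hypothetical names, and the file naming and the `<hash>  <filename>` layout of the .md5 file are assumptions based on Geofabrik's conventions.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def usable_local_pbf(data_dir, region, pgosm_date):
    """Return the dated PBF path if it exists and matches its .md5, else None.

    None means the caller should fall back to downloading.
    """
    data_dir = Path(data_dir)
    pbf = data_dir / f"{region}-{pgosm_date}.osm.pbf"
    md5_file = data_dir / f"{region}-{pgosm_date}.osm.pbf.md5"
    if not (pbf.is_file() and md5_file.is_file()):
        return None

    # Assumed layout: the hash is the first whitespace-separated field.
    fields = md5_file.read_text().split()
    if not fields:
        return None
    expected = fields[0].lower()
    return pbf if md5_of(pbf) == expected else None
```

Returning `None` rather than raising keeps the fallback to "download as usual" in the caller's hands.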

There may be some other fidgeting required to make this work properly, but that should be a good path forward.