osmcode / pyosmium

Python bindings for libosmium
https://osmcode.org/pyosmium
BSD 2-Clause "Simplified" License
318 stars 65 forks source link

Fix state download when getting some network issues #163

Closed jocelynj closed 3 years ago

jocelynj commented 3 years ago

Hi,

I'm sometime getting network issues to download diff/state file in the middle of an update of an extract. When getting a connection timeout on the state download, all the diffs files already downloaded are lost, and I have to restart the whole download.

This patch fixes this issue, by retrying the download of the state file, which means that we are able to update the extract with the diffs already downloaded.

I have also added a test to cover this testcase.

Thanks!

jocelynj commented 3 years ago

By the way, this network error crashes pyosmium with a message that is not easy to understand:

2021-01-30 17:26:42 DEBUG: Downloaded change 4393070. (1048548 kB available in download buffer)
2021-01-30 17:27:14 DEBUG: Error during diff download. Bailing out.
2021-01-30 17:27:45 DEBUG: Loading state info 4393070 failed with: <urlopen error [Errno 110] Connection timed out>
Traceback (most recent call last):
  File "/usr/bin/pyosmium-up-to-date", line 246, in <module>
    ret = update_from_custom_server(url, seq, ts, options)
  File "/usr/bin/pyosmium-up-to-date", line 125, in update_from_custom_server
    outformat=options.outformat)
  File "/usr/lib/python3/dist-packages/osmium/replication/server.py", line 173, in apply_diffs_to_file
    h.set("osmosis_replication_timestamp", info.timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"))
AttributeError: 'NoneType' object has no attribute 'timestamp'
lonvia commented 3 years ago

So it looks like you are not getting an URLError but an OS error from the underlying OS call. I'm not convinced we should retry on all these errors. This might do more harm than good in many cases. Have you tried simply increasing the timeout? There is an option --socket-timeout for that. The current default is at 60sec.

lonvia commented 3 years ago

Rereading the code, you won't really loose downloaded diffs on a network error. pyosmium will apply all the data it has managed to get and report back that it did some work but more is still available. That's not necessarily ideal for updating planets where applying a diff in multiple steps can be expensive. But for your current situation it means that you can just retry as long as pyosium-up-to-date tells you that there is more data (error code 1).

Your particular issue with the last message is that your code run into another network error while trying to save the situation. We need to fix that.

lonvia commented 3 years ago

Fixed the AttributeError in 2ac1c24d0bf15c95aa0dc836cadad27f3fb36074. I would suggest to leave it at that.

jocelynj commented 3 years ago

Thanks for the fix. It looks better than mine, as it will work even if the network is suddenly completely disconnected.

Do you want me to abandon this pull-request?

lonvia commented 3 years ago

Yes, I'll close this, if all works for you now. I'll do another release tomorrow, so that the fixes are available on pip and Debian backports.