pypsa-meets-earth / earth-osm

Export infrastructure data from OpenStreetMap using Python
https://pypsa-meets-earth.github.io/earth-osm/
MIT License

Add a meaningful error message if download has been incomplete #39

Closed ekatef closed 1 year ago

ekatef commented 1 year ago

@mnm-matin thanks for the amazing package! Usually it works like a charm :)

I have a small usability suggestion. If a download has been incomplete, the error message that is thrown is not very meaningful and can easily alarm a user.

A similar issue has been described here. I also ran into it when running the PyPSA-Earth workflow for Japan. The download appeared to start, but after a while this enigmatic 'OSMPBF.Blob' error message appeared (the full listing is below).

It was fixed by downloading the pbf file manually, although the download wasn't very fast. So I assume the primary cause was a problem with the network connection. However, the message first made me suspect an environment issue rather than a data issue. I wonder if it would be possible to add a check for download completeness and a meaningful error message in case something went wrong. What do you think?
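To illustrate the kind of check I have in mind, here is a minimal sketch. It assumes the expected file size in bytes is known, for example from the HTTP Content-Length header recorded at download time. The names `IncompleteDownloadError` and `check_download_complete` are hypothetical and are not part of earth-osm.

```python
from pathlib import Path


class IncompleteDownloadError(RuntimeError):
    """Raised when a downloaded file does not have the expected size."""


def check_download_complete(path, expected_size):
    """Compare the on-disk size of `path` with `expected_size` (bytes).

    `expected_size` is an assumption of this sketch: it would have to be
    recorded when the download starts, e.g. from the Content-Length header.
    """
    path = Path(path)
    if not path.exists():
        raise IncompleteDownloadError(
            f"{path} does not exist; the download did not finish or the file was removed."
        )
    actual_size = path.stat().st_size
    if actual_size != expected_size:
        # Say what happened and what the user can do about it.
        raise IncompleteDownloadError(
            f"{path} looks incomplete: {actual_size} of {expected_size} bytes on disk. "
            "This may point to a network problem rather than an environment problem. "
            "Delete the file and download it again."
        )
    return True
```

Running a check like this before the extraction starts would replace the protobuf DecodeError with a message that points at the data file.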

Error message

INFO:snakemake.logging:
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/extract.py", line 50, in filter_file_block
    entries.ParseFromString(read_blob(file, ofs, header))
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/osmpbf/file.py", line 54, in read_blob
    blob.ParseFromString(file.read(header.datasize))
google.protobuf.message.DecodeError: Error parsing message with type 'OSMPBF.Blob'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/pypsa-earth/.snakemake/scripts/tmp4ub5jw9s.download_osm_data.py", line 112, in <module>
    eo.get_osm_data(
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/eo.py", line 101, in get_osm_data
    df_feature = process_country(region, primary_name, feature_name, mp, update, data_dir)
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/eo.py", line 38, in process_country
    primary_dict, feature_dict = get_filtered_data(region, primary_name, feature_name, mp, update, data_dir)
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/filter.py", line 111, in get_filtered_data
    primary_dict = run_primary_filter(PBF_inputfile, primary_file, primary_name, mp)
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/filter.py", line 70, in run_primary_filter
    primary_data = filter_pbf(PBF_inputfile, pre_filter, multiprocess)
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/extract.py", line 93, in filter_pbf
    primary_entries = list(file_query(primary_entry_filter, pre_filter)) #list of named  tuples eg. Node(id,tags, lonlat)
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/extract.py", line 66, in query_func
    entry_lists = pool.starmap(
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
google.protobuf.message.DecodeError: Error parsing message with type 'OSMPBF.Blob'

mnm-matin commented 1 year ago

Issue #36: Before the extraction begins, the pbf file should be checked for correctness using its associated hashes, and a useful warning message should appear if the file is corrupted, followed by a re-download of the file....
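A rough sketch of that flow, not the actual implementation: it assumes an MD5 checksum is published alongside the pbf file and has already been fetched as a hex string. `file_md5`, `ensure_valid_pbf` and the `download` callback are hypothetical names used only for illustration.

```python
import hashlib
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_valid_pbf(path, expected_md5, download, max_retries=1):
    """Verify `path` against `expected_md5`; warn and re-download on mismatch.

    `download` is a hypothetical callback that takes the target path and
    (re)writes the file there. It stands in for the real download step.
    """
    path = Path(path)
    expected = expected_md5.strip().lower()
    for attempt in range(max_retries + 1):
        if path.exists() and file_md5(path) == expected:
            return path
        if attempt < max_retries:
            logger.warning(
                "%s is missing or corrupted (MD5 mismatch); downloading it again.", path
            )
            download(path)
    raise ValueError(
        f"{path} still fails its MD5 check after {max_retries} re-download(s)."
    )
```

Calling something like `ensure_valid_pbf` before the filter step would turn a corrupted file into a warning plus a retry, and only raise if the retry also fails.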

ekatef commented 1 year ago

Nice! Thanks a lot for letting me know. I am linking this issue to the troubleshooting post in PyPSA-Earth and closing it as a duplicate.

Actually, introducing a hash check is a very good idea when downloading any dataset... 🙂