pypsa-meets-earth / pypsa-earth

PyPSA-Earth: A flexible Python-based open optimisation model to study energy system futures around the world.
https://pypsa-earth.readthedocs.io/en/latest/

Add a testing procedure to ensure consistency of the supplied inputs #528

Closed ekatef closed 1 year ago

ekatef commented 1 year ago

Describe the feature you'd like to see

The workflow can currently fail ungracefully when the supplied inputs are inconsistent. It may be worth detecting such data flaws and reporting them to the user clearly, and/or proposing a fix.

In particular:

ekatef commented 1 year ago

An example of the error that appears when executing build_renewable_profiles if the provided cutout doesn't match the modeling domain (defined by the countries list):

INFO:snakemake.logging:[Fri Dec  2 11:16:46 2022]
rule build_renewable_profiles:
    input: networks/base.nc, resources/natura.tiff, data/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326.tif, data/gebco/GEBCO_2021_TID.nc, resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, data/hydro_capacities.csv, data/eia_hydro_annual_generation.csv, resources/powerplants.csv, resources/bus_regions/regions_offshore.geojson, cutouts/ear-2013-era5.nc
    output: resources/renewable_profiles/profile_offwind-dc.nc
    log: logs/build_renewable_profile_offwind-dc.log
    jobid: 16
    benchmark: benchmarks/build_renewable_profiles_offwind-dc
    reason: Input files updated by another job: resources/bus_regions/regions_offshore.geojson, resources/powerplants.csv
    wildcards: technology=offwind-dc
    resources: tmpdir=/var/folders/qn/vpndfm21795ckkq89np1ckp40000gn/T, mem_mb=20000
INFO:snakemake.logging:rule build_renewable_profiles:
    input: networks/base.nc, resources/natura.tiff, data/copernicus/PROBAV_LC100_global_v3.0.1_2019-nrt_Discrete-Classification-map_EPSG-4326.tif, data/gebco/GEBCO_2021_TID.nc, resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, data/hydro_capacities.csv, data/eia_hydro_annual_generation.csv, resources/powerplants.csv, resources/bus_regions/regions_offshore.geojson, cutouts/ear-2013-era5.nc
    output: resources/renewable_profiles/profile_offwind-dc.nc
    log: logs/build_renewable_profile_offwind-dc.log
    jobid: 16
    benchmark: benchmarks/build_renewable_profiles_offwind-dc
    reason: Input files updated by another job: resources/bus_regions/regions_offshore.geojson, resources/powerplants.csv
    wildcards: technology=offwind-dc
    resources: tmpdir=/var/folders/qn/vpndfm21795ckkq89np1ckp40000gn/T, mem_mb=20000

INFO:snakemake.logging:
INFO:__main__:correction_factor is set as 0.8855
INFO:__main__:Calculate landuse availabilities...
INFO:__main__:Completed availability calculation (9.25s)
INFO:atlite.convert:Convert and aggregate 'wind'.
[########################################] | 100% Completed | 2.10 s
INFO:atlite.convert:Convert and aggregate 'wind'.
INFO:__main__:Calculating maximal capacity per bus (method 'simple')
INFO:__main__:Calculate average distances.
INFO:__main__:Calculate underwater fraction of connections.
ERROR:shapely.geos:IllegalArgumentException: CGAlgorithmsDD::orientationIndex encountered NaN/Inf numbers
INFO:shapely.geos:Invalid Coordinate at or near point nan nan
Traceback (most recent call last):
  File "~/pypsa-earth/.snakemake/scripts/tmp5lhrw197.build_renewable_profiles.py", line 546, in <module>
    frac = line.intersection(offshore_shape).length / line.length
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/shapely/geometry/base.py", line 695, in intersection
    return geom_factory(self.impl['intersection'](self, other))
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/shapely/topology.py", line 73, in __call__
    self._check_topology(err, this, other)
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/shapely/topology.py", line 38, in _check_topology
    raise TopologicalError(
shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.linestring.LineString object at 0x14566ee30>
[Fri Dec  2 11:17:07 2022]
INFO:snakemake.logging:[Fri Dec  2 11:17:07 2022]
Error in rule build_renewable_profiles:
    jobid: 16
    output: resources/renewable_profiles/profile_offwind-dc.nc
    log: logs/build_renewable_profile_offwind-dc.log (check log file(s) for error message)

ERROR:snakemake.logging:Error in rule build_renewable_profiles:
    jobid: 16
    output: resources/renewable_profiles/profile_offwind-dc.nc
    log: logs/build_renewable_profile_offwind-dc.log (check log file(s) for error message)

RuleException:
CalledProcessError in line 355 of ~/pypsa-earth/Snakefile:
Command 'set -euo pipefail;  /Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/bin/python3.10 ~/pypsa-earth/.snakemake/scripts/tmp5lhrw197.build_renewable_profiles.py' returned non-zero exit status 1.
  File "~/pypsa-earth/Snakefile", line 355, in __rule_build_renewable_profiles
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR:snakemake.logging:RuleException:
CalledProcessError in line 355 of ~/pypsa-earth/Snakefile:
Command 'set -euo pipefail;  /Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/bin/python3.10 ~/pypsa-earth/.snakemake/scripts/tmp5lhrw197.build_renewable_profiles.py' returned non-zero exit status 1.
  File "~/pypsa-earth/Snakefile", line 355, in __rule_build_renewable_profiles
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
WARNING:snakemake.logging:Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
ERROR:snakemake.logging:Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-12-02T111517.715266.snakemake.log
WARNING:snakemake.logging:Complete log: .snakemake/log/2022-12-02T111517.715266.snakemake.log
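
A check of the kind proposed here could compare the extent of the cutout with the extent of the modeling domain before any geometry operation runs, and fail with a readable message instead of a TopologicalError. The following is only a minimal sketch under assumptions: `Bounds`, `uncovered_sides` and `check_cutout_covers_domain` are hypothetical names, not part of PyPSA-Earth, and the bounding boxes are assumed to be supplied by the caller (for instance from the `total_bounds` of the country shapes and from the cutout's own extent). The numbers in the usage lines are made up for illustration.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    """Bounding box as (west, south, east, north), in degrees."""
    west: float
    south: float
    east: float
    north: float


def uncovered_sides(domain: Bounds, cutout: Bounds, tol: float = 0.0) -> list[str]:
    """Return the sides on which the domain extends beyond the cutout."""
    sides = []
    if domain.west < cutout.west - tol:
        sides.append("west")
    if domain.south < cutout.south - tol:
        sides.append("south")
    if domain.east > cutout.east + tol:
        sides.append("east")
    if domain.north > cutout.north + tol:
        sides.append("north")
    return sides


def check_cutout_covers_domain(domain: Bounds, cutout: Bounds) -> None:
    """Raise a readable error if the cutout does not cover the domain."""
    sides = uncovered_sides(domain, cutout)
    if sides:
        raise ValueError(
            f"Cutout {tuple(cutout)} does not cover the modeling domain "
            f"{tuple(domain)}; uncovered side(s): {', '.join(sides)}. "
            "Check that the countries list matches the cutout."
        )


# Illustrative, made-up extents (hypothetical values, not real data).
example_cutout = Bounds(-20.0, -35.0, 55.0, 40.0)
domain_inside = Bounds(2.5, 4.0, 15.0, 14.0)
domain_outside = Bounds(68.0, 6.0, 97.5, 35.5)

check_cutout_covers_domain(domain_inside, example_cutout)  # passes silently
```

Calling `check_cutout_covers_domain(domain_outside, example_cutout)` raises a `ValueError` that names the uncovered side, which is easier to act on than the NaN coordinates reported by shapely above.
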
ekatef commented 1 year ago

It may also make sense to add a check that all the inputs were downloaded properly. It looks like downloaded files are sometimes incomplete, which causes failures later in the workflow.

For example, I got a rasterio error when build_shapes tried to open ind_ppp_2020_UNadj_constrained.tif:

INFO:snakemake.logging:[Sun Dec  4 01:16:13 2022]
rule build_shapes:
    input: data/eez/eez_v11.gpkg
    output: resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, resources/shapes/africa_shape.geojson, resources/shapes/gadm_shapes.geojson
    log: logs/build_shapes.log
    jobid: 12
    reason: Missing output files: resources/shapes/africa_shape.geojson, resources/shapes/country_shapes.geojson, resources/shapes/gadm_shapes.geojson, resources/shapes/offshore_shapes.geojson
    resources: tmpdir=/var/folders/qn/vpndfm21795ckkq89np1ckp40000gn/T, mem_mb=500
INFO:snakemake.logging:rule build_shapes:
    input: data/eez/eez_v11.gpkg
    output: resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, resources/shapes/africa_shape.geojson, resources/shapes/gadm_shapes.geojson
    log: logs/build_shapes.log
    jobid: 12
    reason: Missing output files: resources/shapes/africa_shape.geojson, resources/shapes/country_shapes.geojson, resources/shapes/gadm_shapes.geojson, resources/shapes/offshore_shapes.geojson
    resources: tmpdir=/var/folders/qn/vpndfm21795ckkq89np1ckp40000gn/T, mem_mb=500

INFO:snakemake.logging:
This is the repository path:  ~/pypsa-earth
Had to go 0 folder(s) up.
INFO:__main__:Stage 1 of 4: Create country shapes
WARNING:country_converter.country_converter:Z01 not found in ISO3
WARNING:country_converter.country_converter:Z04 not found in ISO3
WARNING:country_converter.country_converter:Z05 not found in ISO3
WARNING:country_converter.country_converter:Z07 not found in ISO3
WARNING:country_converter.country_converter:Z09 not found in ISO3
INFO:__main__:Stage 2 of 4: Create offshore shapes
INFO:shapely.geos:Hole lies outside shell at or near point 88.285739905000071 24.888173355999982
/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/geopandas/base.py:31: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
INFO:__main__:Stage 4/4: Creation GADM GeoDataFrame
WARNING:country_converter.country_converter:Z07 not found in ISO3
WARNING:country_converter.country_converter:Z04 not found in ISO3
WARNING:country_converter.country_converter:Z09 not found in ISO3
WARNING:country_converter.country_converter:Z01 not found in ISO3
WARNING:country_converter.country_converter:Z05 not found in ISO3
WARNING:country_converter.country_converter:Z09 not found in ISO3
INFO:__main__:Stage 4/4 POP: Add population data to GADM GeoDataFrame
This is the repository path:  ~/pypsa-earth
Had to go 0 folder(s) up.
This is the repository path:  ~/pypsa-earth
Had to go 0 folder(s) up.
This is the repository path:  ~/pypsa-earth
Had to go 0 folder(s) up.
This is the repository path:  ~/pypsa-earth
Had to go 0 folder(s) up.
This is the repository path:  ~/pypsa-earth
Had to go 0 folder(s) up.
Compute population :   0%|                                | 0/1 [00:00<?, ? countries/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 302, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 213, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 217, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: ~/pypsa-earth/data/WorldPop/ind_ppp_2020_UNadj_constrained.tif: TIFFReadDirectory:Failed to read directory at offset 488816214

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "~/pypsa-earth/scripts/build_shapes.py", line 642, in _process_func_pop
    with rasterio.open(WorldPop_inputfile) as src:
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/rasterio/env.py", line 444, in wrapper
    return f(*args, **kwds)
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/rasterio/__init__.py", line 304, in open
    dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 304, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: ~/pypsa-earth/data/WorldPop/ind_ppp_2020_UNadj_constrained.tif: TIFFReadDirectory:Failed to read directory at offset 488816214
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/pypsa-earth/.snakemake/scripts/tmpzktao528.build_shapes.py", line 832, in <module>
    gadm_shapes = gadm(
  File "~/pypsa-earth/.snakemake/scripts/tmpzktao528.build_shapes.py", line 767, in gadm
    add_population_data(
  File "~/pypsa-earth/.snakemake/scripts/tmpzktao528.build_shapes.py", line 726, in add_population_data
    _ = list(
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "~/pypsa-earth/scripts/build_shapes.py", line 642, in _process_func_pop
    with rasterio.open(WorldPop_inputfile) as src:
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/rasterio/env.py", line 444, in wrapper
    return f(*args, **kwds)
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/site-packages/rasterio/__init__.py", line 304, in open
    dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 304, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: ~/pypsa-earth/data/WorldPop/ind_ppp_2020_UNadj_constrained.tif: TIFFReadDirectory:Failed to read directory at offset 488816214
[Sun Dec  4 01:16:29 2022]
INFO:snakemake.logging:[Sun Dec  4 01:16:29 2022]
Error in rule build_shapes:
    jobid: 12
    output: resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, resources/shapes/africa_shape.geojson, resources/shapes/gadm_shapes.geojson
    log: logs/build_shapes.log (check log file(s) for error message)

ERROR:snakemake.logging:Error in rule build_shapes:
    jobid: 12
    output: resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, resources/shapes/africa_shape.geojson, resources/shapes/gadm_shapes.geojson
    log: logs/build_shapes.log (check log file(s) for error message)

RuleException:
CalledProcessError in line 223 of ~/pypsa-earth/Snakefile:
Command 'set -euo pipefail;  /Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/bin/python3.10 ~/pypsa-earth/.snakemake/scripts/tmpzktao528.build_shapes.py' returned non-zero exit status 1.
  File "~/pypsa-earth/Snakefile", line 223, in __rule_build_shapes
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR:snakemake.logging:RuleException:
CalledProcessError in line 223 of ~/pypsa-earth/Snakefile:
Command 'set -euo pipefail;  /Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/bin/python3.10 ~/pypsa-earth/.snakemake/scripts/tmpzktao528.build_shapes.py' returned non-zero exit status 1.
  File "~/pypsa-earth/Snakefile", line 223, in __rule_build_shapes
  File "/Users/ekaterina/opt/miniconda3/envs/pypsa-earth-pdfix/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Removing output files of failed job build_shapes since they might be corrupted:
resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, resources/shapes/africa_shape.geojson
WARNING:snakemake.logging:Removing output files of failed job build_shapes since they might be corrupted:
resources/shapes/country_shapes.geojson, resources/shapes/offshore_shapes.geojson, resources/shapes/africa_shape.geojson
Shutting down, this might take some time.
WARNING:snakemake.logging:Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
ERROR:snakemake.logging:Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-12-04T011611.288280.snakemake.log
WARNING:snakemake.logging:Complete log: .snakemake/log/2022-12-04T011611.288280.snakemake.log

It was resolved by simply replacing the existing copy of ind_ppp_2020_UNadj_constrained.tif.
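
One way to catch an incomplete download early is to compare the file on disk with a known size and/or checksum before the workflow tries to open it. This is a minimal sketch using only the standard library; `verify_download` and `sha256_of` are hypothetical helpers, and the expected size or SHA-256 value is assumed to be available from somewhere (for example the server's Content-Length header or a published checksum), which is not something this issue confirms.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(
    path,
    expected_size: Optional[int] = None,
    expected_sha256: Optional[str] = None,
) -> None:
    """Raise a readable error if the file is missing, truncated or altered."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Input file {path} is missing.")

    actual_size = path.stat().st_size
    if expected_size is not None and actual_size != expected_size:
        raise OSError(
            f"{path} has {actual_size} bytes but {expected_size} were expected; "
            "the download is probably incomplete. Delete the file and fetch it again."
        )

    if expected_sha256 is not None and sha256_of(path) != expected_sha256.lower():
        raise OSError(
            f"{path} does not match the expected SHA-256 checksum; "
            "delete the file and fetch it again."
        )
```

A size comparison is cheap and would already flag a truncated file; the checksum is slower but also catches corruption that leaves the size unchanged.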