This issue tracks the main open issues across the data workflow, to give a summary and provide guidance.
This thread is meant to discuss the structure and architecture of the data workflow, not to solve the specific issues, which will be discussed in separate, independent issues.
Core-related issues by script
retrieve_databundle_light
Script to download data from Google Drive
[x] (optional) Rather than downloading all files in the folders, a selective download may be created to distinguish test cases, the all-Africa case, etc.
[x] (optional) Add progress bar in the downloading process
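A minimal sketch of the selective download idea, assuming each bundle carries a category label. The bundle names, the `category` field and the `select_bundles` helper are hypothetical and not taken from the actual script or its config.

```python
# Hypothetical bundle registry; names, categories and sizes are illustrative only.
BUNDLES = {
    "bundle_tutorial": {"category": "tutorial", "size_mb": 50},
    "bundle_africa_data": {"category": "africa", "size_mb": 4000},
    "bundle_africa_cutouts": {"category": "africa", "size_mb": 12000},
}


def select_bundles(bundles, category):
    """Return the sorted names of the bundles to download for one use case."""
    return sorted(
        name for name, meta in bundles.items() if meta["category"] == category
    )


if __name__ == "__main__":
    # Only the small tutorial bundle is selected for a test run.
    print(select_bundles(BUNDLES, "tutorial"))
```

With a filter like this, a test run would only fetch the small bundle, while the all-Africa case would request the heavy ones.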
build_shapes
Script to create country shapes and populate them with GDP and Population data
[x] Improve the modelling of GDP for the shapes, by improving the resolution of tiff images #56
[x] (optional) Add progress bar in the shape processing #56
[x] (optional) Parallel processing could be added to speed up the calculations #56
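As an illustration of the parallel processing item, the sketch below runs a per-country step concurrently with the standard library. `process_country` is a placeholder, not the function used in build_shapes; for CPU-bound work, `ProcessPoolExecutor` from the same module could be swapped in.

```python
from concurrent.futures import ThreadPoolExecutor


def process_country(code):
    """Placeholder for the per-country work (shape loading, GDP/population lookup)."""
    return code, len(code)  # dummy result standing in for real statistics


def process_all(codes, max_workers=4):
    """Run the per-country step concurrently and collect results by country code."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the country codes.
        return dict(pool.map(process_country, codes))


if __name__ == "__main__":
    print(process_all(["NG", "KE", "ZA"]))
```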
osm_pbf_power_data_extractor
Script to download OpenStreetMap data from Geofabrik
[x] #132
osm_data_cleaning
Script to filter the raw data downloaded from Geofabrik
[x] Output filenames to be picked from snakemake
[x] Verify how to tackle lines without endings (or accept the current formulation)
[x] Add line endings to substations list (activity currently done in base_network) #113
[ ] Verify the definition of underground/under_construction/tag_frequency, which are currently only initialized
[x] Verify how to manage the substation datasets, in line with the future implementation of build_powerplants: #121 build_powerplants loads the raw file downloaded from OSM
[x] Fix None geometry shapes #131
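Two of the cleaning steps above (dropping records without geometry and deriving substation candidates from line endings) can be sketched with plain Python data. The record layout, with `geometry` as a list of coordinate tuples, is an assumption for illustration; the actual script works on geospatial dataframes.

```python
def drop_missing_geometry(records):
    """Keep only the records whose geometry is not None."""
    return [r for r in records if r.get("geometry") is not None]


def line_endpoints(lines):
    """Collect the first and last coordinate of each line, without duplicates."""
    endpoints = []
    for line in lines:
        coords = line["geometry"]
        for point in (coords[0], coords[-1]):
            if point not in endpoints:
                endpoints.append(point)
    return endpoints


if __name__ == "__main__":
    raw = [
        {"id": 1, "geometry": [(0, 0), (1, 1)]},
        {"id": 2, "geometry": None},
        {"id": 3, "geometry": [(1, 1), (2, 0)]},
    ]
    cleaned = drop_missing_geometry(raw)
    print(line_endpoints(cleaned))
```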
osm_built_network
Script to create the network description
[x] Filter out generator lines: sometimes lines linking a generator to the closest substation are included. It may make sense to remove them. However, this is of secondary priority: we have to see how simplify_network acts on such lines, as it may automatically take care of the issue
[ ] Check how to properly tackle buses with/without low voltage
[ ] Implement option parameters in the config and the corresponding methodologies about how to create the network, based on the data
[x] Move the part that adds substations based on the line endings to osm_data_cleaning #121
[x] Need to merge buses corresponding to the same substation_id by voltage. The lines and buses datasets may produce buses whose latitude/longitude coordinates are very close but not identical. That creates problems that are currently fixed in base_network and should be moved to the cleaning phase #121
[x] Improve the merging process of the buses taking into consideration the type of the node AC/DC
[x] Fix None geometry shapes #131
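The bus merging described above (same substation_id and voltage, with AC and DC kept apart) could be sketched as follows. The field names `station_id`, `voltage`, `dc`, `lon` and `lat` are assumptions for the example, and averaging the coordinates is one possible choice, not necessarily the rule implemented in the script.

```python
from collections import defaultdict


def merge_buses(buses):
    """Merge buses sharing station id, voltage and AC/DC type into one bus each."""
    groups = defaultdict(list)
    for bus in buses:
        groups[(bus["station_id"], bus["voltage"], bus["dc"])].append(bus)

    merged = []
    for (station_id, voltage, dc), members in groups.items():
        merged.append(
            {
                "station_id": station_id,
                "voltage": voltage,
                "dc": dc,
                # Place the merged bus at the mean position of its members.
                "lon": sum(b["lon"] for b in members) / len(members),
                "lat": sum(b["lat"] for b in members) / len(members),
            }
        )
    return merged


if __name__ == "__main__":
    buses = [
        {"station_id": 1, "voltage": 220000, "dc": False, "lon": 10.0, "lat": 5.0},
        {"station_id": 1, "voltage": 220000, "dc": False, "lon": 10.0002, "lat": 5.0},
        {"station_id": 1, "voltage": 220000, "dc": True, "lon": 10.0001, "lat": 5.0},
    ]
    for bus in merge_buses(buses):
        print(bus)
```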
base_network
Script to create the first PyPSA network
[ ] The dataset used in this script should be cleaned beforehand: potentially move the cleaning of NaN values etc. to previous scripts, e.g. osm_data_cleaning #131
[ ] Clarify how to manage the "symbol" data in buses (e.g. substation, converter station, etc.)
[ ] Manage under_construction lines
[x] Implement add converters
[x] Implement transformers
[x] Implement links
[ ] Add _remove_unconnected_components (when PyPSA africa is complete), if interested: removes small network structures. We may implement an option for that
[ ] The issue about close buses with the same substation_id is solved by creating fake lines that connect those buses; that should be moved and improved in data cleaning.
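To make the idea of removing small, unconnected network structures concrete, here is a sketch based on a breadth-first search over bus names and line endpoints. It is not the `_remove_unconnected_components` function mentioned above; `drop_small_components` and the `min_size` option are hypothetical names.

```python
from collections import defaultdict, deque


def connected_components(buses, lines):
    """Return the connected components of the network as sets of bus names."""
    adjacency = defaultdict(set)
    for bus0, bus1 in lines:
        adjacency[bus0].add(bus1)
        adjacency[bus1].add(bus0)

    seen, components = set(), []
    for start in buses:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def drop_small_components(buses, lines, min_size):
    """Keep only buses and lines belonging to components with at least min_size buses."""
    keep = set()
    for component in connected_components(buses, lines):
        if len(component) >= min_size:
            keep |= component
    kept_buses = [b for b in buses if b in keep]
    kept_lines = [(a, b) for a, b in lines if a in keep and b in keep]
    return kept_buses, kept_lines


if __name__ == "__main__":
    buses = ["a", "b", "c", "d", "e"]
    lines = [("a", "b"), ("b", "c"), ("d", "e")]
    print(drop_small_components(buses, lines, min_size=3))
```

Exposing `min_size` as a config option would let users decide whether small islands are dropped or kept.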
build_bus_regions
Script to create the bus regions using the Voronoi technique, based on the network structure
[ ] Optionally specify different methodologies to derive the shapes for the analysis. For example, countries with missing shapes (e.g. Somalia) may be filled with external (or GADM) shapes, or the Voronoi cells may be limited in size to avoid creating unrealistically large shapes in countries with few substations, with the missing areas filled with GADM shapes or equivalent.
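The size-limited Voronoi idea can be illustrated with a discrete approximation: each sample point is assigned to its nearest bus, unless the bus is farther than a cutoff, in which case the point is left unassigned and could later be filled with GADM shapes. This is a sketch under simplifying assumptions (planar distances in coordinate units, hypothetical `max_distance` parameter), not the geometry code of the script.

```python
import math


def assign_to_nearest_bus(points, buses, max_distance):
    """Map each point to the nearest bus name, or None if it is beyond max_distance."""
    assignment = {}
    for point in points:
        name, distance = min(
            ((bus, math.dist(point, xy)) for bus, xy in buses.items()),
            key=lambda item: item[1],
        )
        assignment[point] = name if distance <= max_distance else None
    return assignment


if __name__ == "__main__":
    buses = {"b1": (0, 0), "b2": (10, 0)}
    points = [(1, 0), (9, 1), (5, 20)]
    print(assign_to_nearest_bus(points, buses, max_distance=5))
```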
build_cutout
Create cutouts with atlite
The script works fine; it needs a Copernicus registration, so the retrieve data bundle should include preprocessed data to let the user choose whether to go through this step or not.
build_natura_raster
Converts vector data to rasters
The script works fine; no major comments to report
build_renewable_profiles
Script to produce hourly profiles of resources (pv, wind,...)
[x] Cross-check the hydro script that produces inflows for bus stations: the absolute values generated by the methodology need to be verified; additional data may be provided for that. Moreover, the hydro profiles should be generated only for a selected number of stations, where that makes sense (currently all buses are considered)
[x] #145
build_powerplants
Script to integrate the power plants in the PyPSA model.
[x] Script to be adapted based on the PyPSA-Europe model #116
[x] Need for close collaboration and integration with powerplantmatching #116
[x] Finalize the script with the stable version of powerplantmatching, once the fork by @koen-vg is merged into powerplantmatching
add_electricity
Script to add the generation sources and load to the PyPSA model
[x] Modify the script to properly add solar and wind sources; currently an ad-hoc test is applied to add solar only (function attach_wind_and_solar)
[x] Validate the hydro model once powerplants are properly added
[x] Include powerplants: currently, the build_powerplants is missing, thus the corresponding functions are also not activated in the script #116
simplify_network
Script to simplify the modelling of the network: remove dead-ends, create the 380 kV equivalent model
[x] Add links
[ ] Need to verify and adjust the busmap
[ ] Remove unnecessary columns in the dataframes
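One way to approach the busmap verification is a small consistency check: every original bus should be mapped, and every target should exist in the simplified network. The sketch below uses plain dictionaries and lists; `check_busmap` is a hypothetical helper, and the actual busmap in the workflow is a pandas object.

```python
def check_busmap(busmap, original_buses, simplified_buses):
    """Return a list of human-readable problems found in a busmap."""
    problems = []

    # Every bus of the original network needs an entry in the busmap.
    missing = sorted(set(original_buses) - set(busmap))
    if missing:
        problems.append(f"unmapped buses: {missing}")

    # Every target of the busmap must exist in the simplified network.
    dangling = sorted(set(busmap.values()) - set(simplified_buses))
    if dangling:
        problems.append(f"targets not in simplified network: {dangling}")

    return problems


if __name__ == "__main__":
    print(check_busmap({"a": "a", "b": "a"}, ["a", "b", "c"], ["a"]))
```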
cluster_network
Script to cluster the network and reduce the size of the problem
[x] Change bounds parameter to n_bounds
[x] Activate and debug determine_network_topology
[x] Check and validate busmap_by_louvain in busmap_for_country
[x] To properly implement #3, a small test case with at least two countries (ideally neighbouring) needs to be defined: one with a coastline and another one without. To create this test case, the raw input files should be cut down in size, so that the sample files can be kept in GitHub and the download of heavy files is avoided
Documentation
[x] #136
[ ] Need to better document all the scripts, in particular the main functions to use
solve_network
Script to solve the network
add_extra_components
utilities
No comments
Non-code elements
Github
Documentation