pypsa-meets-earth / pypsa-earth

PyPSA-Earth: A flexible Python-based open optimisation model to study energy system futures around the world.
https://pypsa-earth.readthedocs.io/en/latest/

Missing pieces towards global PyPSA-Earth #445

Open davide-f opened 2 years ago

davide-f commented 2 years ago

Towards global PyPSA-Earth

In this issue, we track the major requirements for successfully running the workflow at full global scale. I have been running parts of the model using countries=["Earth"], and below I summarise some findings; this list is to be extended through additional comments.

ekatef commented 1 year ago

@davide-f, thank you for the explanations regarding performance during the discussion. Here are preliminary results from testing the workflow for China (this is a UX test rather than profiling):

1) build_shapes does not take that long, about 12 hours, but that feels rather uncomfortable, especially on a first run, because the script stays silent for the whole 12 hours. (Maybe I'm missing some logs?) Would it make sense to add some progress tracking to the computation loop itself?

https://github.com/pypsa-meets-earth/pypsa-earth/blob/3f2c3c91017f200e45ac66c5c179801b1f113dbb/scripts/build_shapes.py#L728-L729

2) Regarding the parallelization which you mentioned is not yet working for adding the population: could you please clarify a bit?

Is it imap that is not working there

https://github.com/pypsa-meets-earth/pypsa-earth/blob/3f2c3c91017f200e45ac66c5c179801b1f113dbb/scripts/build_shapes.py#L722-L726

or is it this piece, which is not under imap but is nevertheless slow and could be parallelized as well?

https://github.com/pypsa-meets-earth/pypsa-earth/blob/3f2c3c91017f200e45ac66c5c179801b1f113dbb/scripts/build_shapes.py#L728-L729

3) In build_osm_network, the limiting stage at the moment is not (yet) even set_substations_ids or set_lines_ids but fix_overpassing_lines. So far only about half has been processed after roughly 30 hours. [It is probably a good idea to switch this option off for a first quick run... :)] It does not feel so bad, as it is clear that something is going on and it is possible to estimate the finishing time. Still, could the performance of that stage also be taken into consideration when working on performance?
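To illustrate the kind of progress tracking meant in point 1, here is a minimal sketch using only the standard library. The helper name log_progress and its parameters are hypothetical and not part of build_shapes.py; it simply wraps any iterable and logs the count, the elapsed time and a linearly extrapolated remaining time.

```python
import logging
import time

logger = logging.getLogger(__name__)


def log_progress(iterable, total, every=100, label="items"):
    """Yield items from `iterable`, logging progress every `every` items.

    Hypothetical helper for illustration; not part of PyPSA-Earth.
    """
    start = time.monotonic()
    for done, item in enumerate(iterable, start=1):
        if done % every == 0 or done == total:
            elapsed = time.monotonic() - start
            # Extrapolate the remaining time from the average pace so far
            remaining = elapsed / done * (total - done)
            logger.info(
                "%d/%d %s processed (%.0f s elapsed, ~%.0f s remaining)",
                done, total, label, elapsed, remaining,
            )
        yield item


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Toy workload standing in for the per-region computation
    squares = [x * x for x in log_progress(range(10), total=10, every=5)]
```

Since the wrapper accepts any iterable, it could presumably also be placed around the iterator returned by a multiprocessing pool's imap call, so that the log advances as worker results arrive.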

mnm-matin commented 1 year ago

I would like to create a PR that makes set_substations_ids and set_lines_ids more efficient. The only thing holding me back is the lack of input and output dataframes. If someone could provide an input dataframe and the expected output (for the given params), that would be very helpful.

davide-f commented 1 year ago

Great @mnm-matin !

This task is very interesting and I'm very happy to support you. I have some ideas on how to do it, and it would be good to discuss them. The task should also be quite easy. Shall we have a 30-minute chat about it?

I can provide input and output files for any country in the world. I'd recommend starting the debugging with small countries and then testing a large one.

A good large test case could be the US or China; for a small one, maybe Nigeria would do the job. What do you think?

mnm-matin commented 1 year ago

Thanks @davide-f

That sounds great. Happy to have a meeting. The input and output files (perhaps over Discord) would be awesome. For set_substations_ids(buses, distance_crs, tol=2000), the input is the buses dataframe and the output is the buses dataframe with the added columns.

I will keep the PR limited to just set_substations_ids, but the approach should work for line_ids as well.

Large or small countries would be nice for benchmarking. Mainly, I require the input and output files just to make sure I'm getting the right results.
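For illustration only, one generic way to speed up this kind of tolerance-based grouping is to hash points into a grid and merge neighbours with a union-find structure, which avoids comparing every pair of buses. This is an assumption about what an efficient version could look like, not necessarily the approach of the planned PR. The function name group_points_within_tol is hypothetical, and whether chained grouping matches the semantics of set_substations_ids would need to be checked against the reference files.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def group_points_within_tol(points, tol):
    """Return one group label per point.

    Points at most `tol` apart (directly or through a chain of such
    points) share a label. `points` is a list of (x, y) tuples in a
    metric CRS. Hypothetical helper, not the PyPSA-Earth function.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Attach the larger root to the smaller one, so each group
            # is labelled by its lowest point index
            parent[max(ri, rj)] = min(ri, rj)

    # Hash every point into a square grid cell of side `tol`
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / tol), floor(y / tol))].append(idx)

    # Two points within `tol` must lie in the same or in adjacent cells
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        dist = hypot(points[i][0] - points[j][0],
                                     points[i][1] - points[j][1])
                        if dist <= tol:
                            union(i, j)

    return [find(i) for i in range(len(points))]
```

With tol=2000 as in the signature above, only buses in the same or neighbouring 2 km cells are ever compared, so the cost grows roughly with the number of nearby pairs rather than with the square of the number of buses.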

davide-f commented 1 year ago

Here they are :) https://drive.google.com/drive/folders/1YJp7fIrlCIIac2Gm-2ie4w8aLPgNWZZq?usp=drive_link

davide-f commented 1 year ago

To track the needed improvements, these are the current time requirements in hours for the US:

rule                      key
download_osm_data         total_time     0.102822
clean_osm_data            total_time     3.603223
build_shapes              total_time     4.601684
build_bus_regions         total_time     0.324454
build_osm_network         total_time    16.631785
build_demand_profiles     total_time     0.059216
build_powerplants         total_time     1.337166
build_renewable_profiles  total_time     0.637599
base_network              total_time     0.105632
add_electricity           total_time     0.059819
simplify_network          total_time     0.211443
cluster_network           total_time     0.019749
solve_network             total_time     0.110048
total_comp_stats          total_time    30.608660
Name: US, dtype: float64

The PRs on build_osm_network by @mnm-matin can help tackle the major bottleneck. The current PR #650 by @GridGrapher can significantly help reduce the computational time of build_shapes. The next bottleneck to address is clean_osm_data, in particular the function set_countryname_by_shape.
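As a quick cross-check of this ordering, the reported timings can be ranked by their share of the total. The sketch below copies the numbers from the US run above; the variable names are only illustrative.

```python
# Rule timings in hours, copied from the US run reported above
timings_h = {
    "download_osm_data": 0.102822,
    "clean_osm_data": 3.603223,
    "build_shapes": 4.601684,
    "build_bus_regions": 0.324454,
    "build_osm_network": 16.631785,
    "build_demand_profiles": 0.059216,
    "build_powerplants": 1.337166,
    "build_renewable_profiles": 0.637599,
    "base_network": 0.105632,
    "add_electricity": 0.059819,
    "simplify_network": 0.211443,
    "cluster_network": 0.019749,
    "solve_network": 0.110048,
}
total_h = 30.608660  # reported total_comp_stats value

# Sort rules from slowest to fastest
ranked = sorted(timings_h.items(), key=lambda kv: kv[1], reverse=True)

# Print the three slowest rules with their share of the reported total
for rule, hours in ranked[:3]:
    print(f"{rule:<22} {hours:7.2f} h  {hours / total_h:6.1%}")
```

The three slowest rules come out as build_osm_network, build_shapes and clean_osm_data, in that order, with build_osm_network alone accounting for more than half of the reported total.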