Add notebook for make_statistics

javier-cp6 commented 1 year ago

Closes #46, closes pypsa-meets-earth/pypsa-earth#608. Added a notebook that explains how to create statistics on the workflow by running the make_statistics rule.

ekatef commented 1 year ago

Hi @javier-cp6, your contribution is absolutely welcome! It would help a lot to have a demo on make_stats.

Adding to @pz-max's suggestion to supplement the values in the table with units, I think it would be great to play around a bit with the output formats, in particular:

1) keep the number of digits reasonable: currently our stats claim to know the transmission line length with nanometer precision in clean_osm_data.3;

2) find a way to deal with big numbers like those for build_shapes.

What do you think?
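
Both points could be covered by one small formatting helper. The sketch below is only an illustration: format_stat and its decimals and sci_threshold parameters are hypothetical names, not part of make_statistics or the notebook, and the sample values are made up.

```python
def format_stat(value, decimals=2, sci_threshold=1e9):
    """Hypothetical helper: render one statistic in a compact, readable form."""
    # Very large magnitudes (e.g. GDP-like values) switch to scientific notation.
    if abs(value) >= sci_threshold:
        return f"{value:.{decimals}e}"
    # Whole numbers are shown without a fractional part, with thousands separators.
    if float(value).is_integer():
        return f"{int(value):,}"
    # Everything else is rounded to a fixed number of decimals.
    return f"{value:,.{decimals}f}"


line_length = format_stat(12345.678901234)  # made-up line length
substations = format_stat(310)              # made-up count
big_value = format_stat(4.2e11)             # made-up large value
```

A function like this could be applied to the table cell by cell, so the stored values keep their full precision and only the display changes.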

javier-cp6 commented 1 year ago

Sure. I think I can also group the rules.

davide-f commented 1 year ago

Hello Javier :) Many thanks for the PR, and sorry for the delay, but Max and Katia have covered the comments. Personally, I really like the PR and I think a few small updates could improve it. In particular:

Note that stats.csv is a dataframe that has a multi-index as columns. That is important, and I am unsure that it is properly captured in the model. To read it you could use something like read_csv_nafix(str(stats_path), header=[0, 1], index_col=0), using read_csv_nafix from the helpers. This prevents problems with Namibia, whose country code is "NA" and may be interpreted as NaN. If you don't want to import that function, that's OK, but you should use pd.read_csv(..., keep_default_na=False, na_values=["NULL"]).
Agree with Max and Katia that for special inputs (e.g. population, area, etc.), it may be worth adding some cells and some description to improve the quality of the analysis.
Other naming conventions may instead be incorporated into make_statistics itself, e.g. if we prefer using substations-no instead of substations-size [not sure where those comments have moved to].
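
The multi-index reading and the Namibia issue can be tried out with plain pandas. This is a minimal sketch, not the read_csv_nafix implementation: the CSV content, the level names and the numbers are invented for illustration. It also passes keep_default_na=False, because na_values on its own extends pandas' default list of NaN markers (which contains "NA") rather than replacing it.

```python
import io

import pandas as pd

# Hypothetical miniature of stats.csv: two header rows (rule, statistic)
# and country codes in the first column. All numbers are made up.
csv_text = (
    "rule,clean_osm_data,clean_osm_data,build_shapes\n"
    "stat,lines_length,substations-size,area\n"
    "NA,1200.5,40,825615.0\n"
    "NG,9800.25,310,923768.0\n"
)

# Default parsing: "NA" is one of pandas' default NaN markers.
naive = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# Namibia-safe parsing: drop the default markers and list the accepted ones.
safe = pd.read_csv(
    io.StringIO(csv_text),
    header=[0, 1],
    index_col=0,
    keep_default_na=False,
    na_values=["NULL"],
)

print(naive.index.tolist())
print(safe.index.tolist())
print(safe.columns.nlevels)
```

With header=[0, 1] the columns form a two-level index, so a single statistic is addressed by a (rule, statistic) pair such as safe[("clean_osm_data", "lines_length")].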

javier-cp6 commented 1 year ago

Hi @pz-max @ekatef @davide-f ,

Thanks for your hints. So far, I’ve updated the notebook with the following changes:

Replaced read_csv() with the read_csv_nafix() method from the helpers module to read the stats.csv file as a multi-index dataframe and prevent problems in case of a scenario where Namibia is present.
Updated the format of numbers (integers and floats) and kept the transmission line length at nine decimal places.
Added a new index column for units.

I need some help with the units for the stats. The rule does not generate them, so I'm looking in the make_statistics script and the resources folder generated for NG. However, I still need to complete the units for the remaining fields. Once I have completed this task, I will update the naming accordingly and add some cell descriptions.

ekatef commented 1 year ago

Hi @javier-cp6, very nice work! Sorry for the delay in answering: a problem of chained deadlines 🙂

My general feeling is that the output looks much clearer now.

It seems lines_length for some reason still keeps nine decimal places. Could you please check it?

Regarding adding the units, I'd be happy to help. Agree that in some cases it can currently be a bit tricky to dig the dimensions out of the code and docs :) So investigating that and adding units to the outputs is definitely worth the effort. Could you please give an update on which parameters you need help finding units for?

javier-cp6 commented 1 year ago

Hi @ekatef , Thanks for your response. I've updated the notebook, adding more units and descriptions. I've also fixed the lines_length format. I'll keep searching for the rest of the units. Could you please help me with add_electricity stats such as OCGT and CCGT?

ekatef commented 1 year ago

Hi @javier-cp6, super! Formatted in this way, the notebook is very handy. My feeling is that your PR is close to being complete.

Values of CCGT:hydro under add_electricity imply installed capacity in MW. Probably, it could also make sense to explain that CCGT = combined cycle gas turbine, OCGT = open cycle gas turbine. What do you think?

It also looks like the units for lines_capacity, potential and avg_production_pu need checking. Have you managed to find units for them? I do have some ideas, but it's better to find a source to be sure :)

javier-cp6 commented 1 year ago

Hi @ekatef , Thanks! Almost done! Indeed, the installed capacity of generators is in MW; I think it is not necessary to explain CCGT and OCGT. I've also added the units for lines_capacity (MVA), potential (MW), and avg_production_pu (MWh), and incorporated links to the PyPSA documentation explaining those units.
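
One way to attach the units is to keep them in a single lookup and pair each statistic with its unit. The sketch below uses the units exactly as reported in this thread; the UNITS mapping and the with_units helper are hypothetical and are not taken from the notebook.

```python
# Units as reported in this thread; the mapping itself is a hypothetical helper.
UNITS = {
    "lines_capacity": "MVA",
    "potential": "MW",
    "avg_production_pu": "MWh",
    "CCGT": "MW",
    "OCGT": "MW",
    "gdp": "USD",
}


def with_units(stat_names, units=UNITS, missing="-"):
    """Pair each statistic name with its unit, marking unknown units explicitly."""
    return [(name, units.get(name, missing)) for name in stat_names]


# "load" has no confirmed unit yet, so it gets the placeholder.
pairs = with_units(["lines_capacity", "potential", "load"])
print(pairs)
```

The resulting tuples could be passed to pd.MultiIndex.from_tuples to build an index with a units level, and the placeholder makes the still-missing units easy to spot.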

I still need to complete the following. Please let me know, if you have any hints:

Units for load in solve_network.
The value for gdp (USD) in 'build_shapes' seems too big.
Some mean_load values (CPU usage percentage of the total running time) are greater than 100%.

ekatef commented 1 year ago

Hi @javier-cp6, nice to see the progress :) Great idea to incorporate the links into the documentation!

The remaining questions are quite advanced ones. Here are some ideas:

Units for load in solve_network.

I suspect "load" in this context may actually mean load shedding. Could you please check whether that is in fact the case?

The value for gdp (USD) in 'build_shapes' seems too big.

According to the source article for GDP, the values correspond to "constant 2011 international US dollars", while we are using 2020 to extract GDP. Inflation in Nigeria was about 10% annually during 2011-2020, which seems to explain the ~2.5 times difference we have.
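
The compounding behind this estimate is easy to check. The sketch below assumes a flat 10% annual rate, which is the rough figure quoted above rather than official inflation data; compound_factor is a hypothetical helper name.

```python
def compound_factor(annual_rate, years):
    """Cumulative price-level factor after compounding a constant annual rate."""
    return (1.0 + annual_rate) ** years


factor_9y = compound_factor(0.10, 9)    # nine compounding steps, 2011 -> 2020
factor_10y = compound_factor(0.10, 10)  # ten steps, if 2011 itself is counted
print(f"{factor_9y:.2f} {factor_10y:.2f}")  # prints 2.36 2.59
```

Either way of counting the years gives a factor close to the observed ~2.5 times difference.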

Some mean_load values (CPU usage percentage of the total running time) are greater than 100%.

Is it perhaps a consequence of using a multi-core processor?
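
A quick calculation shows how a multi-core run can exceed 100%. The sketch assumes mean_load is CPU time divided by wall-clock running time, as the description quoted above suggests; the exact definition used by the benchmark output should be confirmed, and mean_load_percent is a hypothetical name.

```python
def mean_load_percent(cpu_seconds, wall_seconds):
    """CPU time as a percentage of wall-clock time (assumed definition)."""
    return 100.0 * cpu_seconds / wall_seconds


# One core busy for the whole 60 s run.
single_core = mean_load_percent(cpu_seconds=60, wall_seconds=60)

# Three cores busy for the whole 60 s run: 180 CPU-seconds in 60 s.
three_cores = mean_load_percent(cpu_seconds=180, wall_seconds=60)
print(single_core, three_cores)
```

Under this assumption, values above 100% simply mean that more than one core was busy on average.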

javier-cp6 commented 1 year ago

Hi @ekatef ,

Thanks for your explanations! I have finished updating the units and main descriptions for the stats in the notebook.

This time, I ran the run_all_scenarios rule, which automatically updates the configuration file for the Nigeria test case and runs the make_statistics rule. I think the PR is ready for review.

ekatef commented 1 year ago

@javier-cp6 thanks for your amazing work, @davide-f thank you very much for the great review! Agree that we are close to finalising 🙂

Some comments after playing a bit with this PR:

A generated stats table is a perfect way to get the big picture of the modeling. Really looking forward to having this PR merged!
Davide's comments are crucial for usability: I needed some time to understand which folder is meant as root/pypsa-earth and failed to get run_all_scenarios to work.
Our discussion seems to be a good basis for documenting both make_statistics and run_all_scenarios. An issue has been added for that.

davide-f commented 1 year ago

@javier-cp6 As a suggestion, for setting the parent folder, please use this code:

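# change the current directory to the parent folder that contains pypsa-earth
import os
import sys

if not os.path.isdir("pypsa-earth"):
    os.chdir("../..")
# make the pypsa-earth helper scripts importable
sys.path.append(os.path.join(os.getcwd(), "pypsa-earth", "scripts"))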

This is robust to the name of the parent directory, and it is consistent with the parallel PR that I opened to fix that annoying problem.

javier-cp6 commented 1 year ago

Hi @ekatef , @davide-f ,

Thank you for your review and for pointing out a potential confusion. I have updated the description to make clear that the make_statistics rule can be used at any time, and explained the output format (results/{run_name}/stats.csv). Additionally, I have added a note briefly explaining the run_all_scenarios rule, so either rule can be run to produce the stats.

The given snippet for setting the parent folder is very useful indeed. Please, let me know if there are any additional tasks that need attention.

davide-f commented 1 year ago

Nice @javier-cp6 ! I slightly revised the text and it is ready to merge :) Many thanks, you are officially a contributor! :)

pypsa-meets-earth / documentation

Add notebook for make_statistics #50