Closed GbotemiB closed 8 months ago
Hi @GbotemiB , Can you update the notebook to start from 0/1 execution. (Restart kernel)
Noting the code:
entsoe_twkm = sum(entsoe.lines['s_nom'] * entsoe.lines['length'] * entsoe.lines['num_parallel'])
osm_220 = osm.lines[(osm.lines.v_nom > 220.0) & (osm.lines.dc == False)]
Can you remove in sum(entsoe.lines['s_nom'] * entsoe.lines['length'] * entsoe.lines['num_parallel'])
the entsoe.lines['s_nom']
as explained in the discord communication? Please also change the same calculation mistake for the OSM bit. We want only km
as unit/ not TWkm
. Be aware of the unit fixes that are potentially required.
Hello @pz-max Can you help me clarify what you meant by updating the notebook to start from 0/1 execution.
Sure. For notebooks you can run each cell multiple times (on the left side you see at which cell execution your count is).
If you restart the kernel and delete all outputs and then rerun the full notebook it should be fine and the count should go in the right order.
I have made the changes.
I noticed that the first graph changed. You can look into it
Hello @GbotemiB and thanks for the PR.
As a comment, it would be nice to keep the same snake-case naming style, to be consistent with the other notebooks.
Regarding the essence of the changes, now the diagram looks much closer to the reference data. That is great improvement! Could you please investigate the effect of the introduced changes on the other plots?
I mean, if you have understood that it was a mistake in the initial version you provided, that is usually a good idea to analyse its' nature and reasons, check which effects it had on the results and introduce all the corrections needed. Errors happen, and that is essential to know how to fix and prevent them ( saying from the experience 😄).
Hi @ekatef,
Taking into consideration the comment you provided, the recent changes made to the notebook only affected the comparison that had to do with transmission capacity where s_nom * length of each line * the number of parallel conductors
was introduced. But you can check through to see if something was missed.
I have also adapted the snake-case naming style.
I have made the changes.
I noticed that the first graph changed. You can look into it
The title says "km_per_length" but you mean maybe "length_in_km"? @GbotemiB
Hey @GbotemiB! Thanks for making the changes in the name.
Copying here @pz-max comments from our discussion:
- Your units are sometimes wrong (e.g. there is a 10^6 devision that is not correct)
- The graph titles/lables needs to be fixed
- The log image is not really necessary (suggest to remove it)
- Comparing the attached image to the image before suggest that something goes wrong.. Needs fixes.
It would be great if you could please address all the points. Regarding the log images, they can make sense if you want to zoom into some details (like difference between the countries). But for log distribution by voltage seems to be kind of redundant.
I have looked in you code and have an impression that it could make sense to revise it a bit to improve readability. In particular, you seem to be using a plenty of variables to do the same analysis, which make it difficult to trace all the calculations. That is the reason why an effect of the corrections you done for the first plot doesn't propagate to the whole code. It would be also great to decrease width of your code: PEP8 specifies the maximum line length at 79 symbols.
That is very needed work which requires quite careful treatment. Looking forward for the updated results :)
Thanks Katia. I will refactor the code to effect the changes. I will also work on the plots to make it interactive, this way the difference between the countries will be observable.
On Thu, Sep 21, 2023, 10:10 Ekaterina @.***> wrote:
Hey @GbotemiB https://github.com/GbotemiB! Thanks for making the changes in the name.
Copying here @pz-max https://github.com/pz-max comments from our discussion:
- Your units are sometimes wrong (e.g. there is a 10^6 devision that is not correct)
- The graph titles/lables needs to be fixed
- The log image is not really necessary (suggest to remove it)
- Comparing the attached image to the image before suggest that something goes wrong.. Needs fixes.
It would be great if you could please address all the points. Regarding the log images, they can make sense if you want to zoom into some details (like difference between the countries). But for log distribution by voltage seems to be kind of redundant.
I have looked in you code and have an impression that it could make sense to revise it a bit to improve readability. In particular, you seem to be using a plenty of variables to do the same analysis, which make it difficult to trace all the calculations. That is the reason why an effect of the corrections you done for the first plot doesn't propagate to the whole code. It would be also great to decrease width of your code: PEP8 https://peps.python.org/pep-0008/#maximum-line-length specifies the maximum line length at 79 symbols.
That is very needed work which requires quite careful treatment. Looking forward for the updated results :)
— Reply to this email directly, view it on GitHub https://github.com/pypsa-meets-earth/documentation/pull/64#issuecomment-1729168340, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALUUPPA4UYITIVAQ3R3J2DDX3QAABANCNFSM6AAAAAA45TC23E . You are receiving this because you were mentioned.Message ID: @.***>
Thanks Katia. I will refactor the code to effect the changes. I will also work on the plots to make it interactive, this way the difference between the countries will be observable.
Super! Looking forward for the revisions 🙂
Regarding the plots: feel free to experiment with interactivity, but I'd keep static versions too, as they are very convenient to quickly get a feeling on what is going on.
Hello @GbotemiB! Thanks a lot for making the changes. The code looks much cleaner now. Great work!
As a methodological suggestion, I'd also decrease a number of variables. There is the tidy data concept which actually can make life easier.
In short, that means keeping data in dataframes of standartised structure: variables are in columns, observations are in rows, and values are in cells
. So, instead of creating entsoe_length_in_km
in the first block, you can add an additional column in entsoe_hv
dataframe to keep equivalent length of the lines, and re-use this column further. That would eliminate code duplication and allows to downstream changes in your code easily.
Not sure however, if these changes would have significant effect on the result, as it feels like we have a rather grave error somewhere along the workflow which should be relatively easy to catch :)
@GbotemiB I'd suggest to read and think about data processing in meantime, while now to look into the alternative approaches.
Copying your great summary of the discussion in Discord channel:
AT
and MK
data;My feeling is that the best next step would be to do comparison for raw and clean OSM data. Reference PyPSA-Eur data are available here.
You may use country_shapes.geojson
shape files for AT
and MK
available in shapes
subfolder for each of the countries to filter-out PyPSA-Eur ENTSO data for each of them. Note please that voltage is in V
for PyPSA-Earth and in kV
for PyPSA-Eur.
Thanks for the feedbacks. I will work on the code and get back to you.
Hey @GbotemiB! 🙂 Thanks for making the adjustments.
Just to be sure: have you used
elec.nc
orbase.nc
to make comparison? Don't think it's very important in this case, just for information. Generally, I'd rather keep the original file names using folders to organise them.I think, we need the final iteration. The code and the plots are still looking great, while explanations could be probably elaborated a bit. (Probably, the hardest part of documentation writing...)
Btw, please feel free to resolve approving review comments, but please do not resolve review comments, if they contain any points to be addressed. Otherwise, we risk to miss some under-addressed points 😄
Have added some comments to the introduction and the conclusion. Happy to elaborate, if something is not clear.
Regarding the comparison, I used elec.nc
for both pypsa-earth and pypsa-eur.
@GbotemiB great contribution with AT-MK notebook!
I can't put the review comments into the code as GitHub doesn't allow to attach in-code comments due to a large diff. It would be great to address the following points.
entsoe_ref = gpd.GeoDataFrame(entsoe_ref_df, geometry="geometry", crs="EPSG:4326")
, I'm not quite sure that ENTSO geometry is using EPSG:4326
CRS. That is the global CRS, while PyPSA-Europe is mostly using EPSG:3035
focused on Europe. Could you please double-check? high_voltage_df()
seems to apply pretty the same procedure for "AT" and "MK". It would be great to remove this duplication using a country name as a parameter.[8]
("EPSG:4326"
) seems to be not needed.[9]
(starting from entsoe_ref['voltage'].replace(400000, 380000, inplace=True)
). Generally, the original data should always be kept to avoid distortion during analysis. In our particular case, I think it would be enough to adjust voltage threshold when filtering in high_voltage_df ()
, instead of modifying the voltages in the data.Section 2 Applying Simplification
)Section 2 Applying Simplification
section needs some explanations. My feeling is that our Discord discussion can provide you some hints on that :)Conclusion
, in general. However, some reasoning should be added to explain how does the summary follows from the analysis results.Do not be discouraged by the number of comments ;) You have done amazing work, and now we need to present its' results as clear as we can.
@GbotemiB great contribution with AT-MK notebook!
I can't put the review comments into the code as GitHub doesn't allow to attach in-code comments due to a large diff. It would be great to address the following points.
Code comments
- Looking to the line
entsoe_ref = gpd.GeoDataFrame(entsoe_ref_df, geometry="geometry", crs="EPSG:4326")
, I'm not quite sure that ENTSO geometry is usingEPSG:4326
CRS. That is the global CRS, while PyPSA-Europe is mostly usingEPSG:3035
focused on Europe. Could you please double-check?- Function
high_voltage_df()
seems to apply pretty the same procedure for "AT" and "MK". It would be great to remove this duplication using a country name as a parameter.- A code block
[8]
("EPSG:4326"
) seems to be not needed.- Not sure it's a good idea to replace voltage values in block
[9]
(starting fromentsoe_ref['voltage'].replace(400000, 380000, inplace=True)
). Generally, the original data should always be kept to avoid distortion during analysis. In our particular case, I think it would be enough to adjust voltage threshold when filtering inhigh_voltage_df ()
, instead of modifying the voltages in the data.- Methods chaining can be used to make implementation of simplification more compact (section
Section 2 Applying Simplification
)Presentation comments
- The title can be probably shortened and made more clear. In particular, I'd replace countries names with full names and remove mentions of "raw" and "clean" which may be a bit too much of details for the title. I think something like "OSM vs ENTSO validation: comparison for Austria and Macedonia". Obviously, feel free to improve.
- Introduction should improve objective of the study you done, as well as short presentation of the used data sources.
Section 2 Applying Simplification
section needs some explanations. My feeling is that our Discord discussion can provide you some hints on that :)- Agree with the
Conclusion
, in general. However, some reasoning should be added to explain how does the summary follows from the analysis results.Do not be discouraged by the number of comments ;) You have done amazing work, and now we need to present its' results as clear as we can.
Thanks @ekatef for the amazing comments. Please can you elaborate more on option 5 for code comments.
Also, concerning option 4 in code comments, the reason for replacing the values is because we are assuming that 400kV is 380kV. high_voltage_df()
is used to filter for only voltages greater 220kV.
Hello @davide-f, Just to confirm. What is the formula for line capacity?
Hello @davide-f, Just to confirm. What is the formula for line capacity?
Hey @GbotemiB! I'd use this definition from pypsa-earth code ;)
Hello @GbotemiB :)
Regarding this PR, the description of cells 6-11 of mt at comparison may be slightly improved with some comments or little markdown cells, but they are very good! Congratulations!
Regarding the power calculations, I confirm the formula by Katia that is the official one used in pypsa-earth (and eur). On electrical basis, in a three-phase system, the power is given as s_nom = sqrt(3) v_nom i_nom, where v_nom is the concatenated voltage and i_nom is the nominal phase voltage.
An interesting comparison regarding TWkm would be the following:
get_i_nom_by_voltage
that takes a voltage value and returns the corresponding current. Inside the function, you may have an ordered dictionary where the keys are the voltage levels (e.g. 110, 220, 300) and as values you have the pypsa standard types, e.g. "Al/St 240/40 2-bundle 220.0". then, by looping on the keys in an ordered manner, you may return the i_nom of the first standard type whose voltage key exceeds the v_nom of the line.In this PR, the first approach may be interesting, while the second one is more for understanding if the 4x factor may be due to that problem. What are your thoughts?
P.S. in the base_network file, it may be interesting to highlight the value of df.voltage.value_counts() to check with voltage values are used and their frequency
@davide-f thanks a lot for the brilliant review and in-depth comments :)
@GbotemiB I recognise that work on AT-MK notebook is ongoing, but I can still see great progress. Nice structuring of the intro part! My feeling is that it looks much more clear now. So, absolutely agree with Davide that you have done an amazing work.
Regarding calculations of i_nom
:
i_nom
values you can find in PyPSA line types specification.Just a note to myself. This is what I will use to review the notebook and create a PR on top if required:
git clone https://github.com/pz-max/documentation.git # my fork
cd documentation
git remote add GbotemiB https://github.com/GbotemiB/documentation
git fetch documentation
git checkout -b validation GbotemiB/main
Hi everybody, great analysis @GbotemiB I think the results are close to being conclusive.
crs
to make the lengths ideally match (use the crs of googlemaps & also use a crs that is the physical most correct one)Just looking at the AT MK notebook again, it may be interesting to add the resources/base_network/..._lines file into the comparison alongside the raw and clean dataset. In the build_osm_network some magic happens and some lines are duplicated, some are dropped and so on. Some duplication may occur here as well.
@GbotemiB sorry, have missed to answer your question on chaining in Section 2. You can get a feeling on if looking into methods chaining in pandas cheatseet, and get more details when searching by "pandas mathods chaining".
Regarding transformation of 400kV
to 380kV
, agree with you reasoning. Actually, that is quite closely to the analysis Davide suggests to do (which is a very insightful way to look into the data). However, I think that such transformation should be applied only with a very good reason, while splitting voltage by intervals would be enough for AT-MK analysis.
@GbotemiB sorry, have missed to answer your question on chaining in Section 2. You can get a feeling on if looking into methods chaining in pandas cheatseet, and get more details when searching by "pandas mathods chaining".
Regarding transformation of
400kV
to380kV
, agree with you reasoning. Actually, that is quite closely to the analysis Davide suggests to do (which is a very insightful way to look into the data). However, I think that such transformation should be applied only with a very good reason, while splitting voltage by intervals would be enough for AT-MK analysis.
I think creating a function to put the voltages we need into categories should solve the issue of replacing values.
@pz-max @ekatef
Here is the notebook that was used for the comparison between osm-data and pypsa-eur data. Please kindly review the notebook.