Closed GbotemiB closed 8 months ago
Hello @GbotemiB! Great improvements! It looks like we are converging :)
Adding a couple of comments aimed to make the code a bit easier to read.
I wonder if if_country_in_entsoe( )
checking function may be a bit of overshoot. We know in advance that the requested countries is in ENTSOE data. Agree that it has been a valuable finding with missed countries in ENSTOE data, but this notebook has quite a narrow purpose.
I have an impression that geospatial_plot( )
can be made probably a bit clearer if
# Filter for country code
block and avoid using the global variables (at_country_shape
etc.) inside the function. I see two possible solutions:country code
would be transferred as an argument into geospatial_plot( )
and loading data would be done inside geospatial_plot( )
itself.What is your feeling about that?
_apply_simplication( )
with an underscore? Do you mean that the function is intended for internal use?Regarding the plots, I have the following comments.
The plots look amazing. A really great work!
In Geospatial Visualization
, it looks like raw
and clean
geometries are mostly completely overlaid by base
data, right? It that is the case, I think we may state it it the introduction to the section. Something like: "In this section we compare topology of raw, clean and base datasets for transmission lines. Script clean_osm_data
removes lines outside of the considered region, so only "raw" lines present outside each of the considered countries Inside a country, all three PyPSA-Earth datasets are mostly fully overlapped." Feel free please to improve this draft.
I think, it would be helpful for a reader to add a couple of lines to explain a purpose of Simplified vs Non Simplified
and some kind of a conclusion. For example: "ENTSOE-extracted data have a simplified topology, while OSM-extracted lines reproduce topology of a transmission network in quite a detail. To estimate an effect of this simplification, we have applied a simplification procedure to OSM line geometries, as well. Douglas-Peucker simplification algorithm has been applied for that."
- I wonder if if_country_in_entsoe( ) checking function may be a bit of overshoot. We know in advance that the requested countries is in ENTSOE data. Agree that it has been a valuable finding with missed countries in ENSTOE data, but this notebook has quite a narrow purpose.
This function descibed here is to make it easy to filter through the entsoe for just AT and MK using the geometry cordinates.
Has it been a specific reason to start a name of _apply_simplication( ) with an underscore? Do you mean that the function is intended for internal use?
Actually, no reason in particular. i will rename the variable name 😀.
I have an impression that geospatial_plot( ) can be made probably a bit clearer if
we would remove # Filter for country code block and avoid using the global variables (at_country_shape etc.) inside the function. I see two possible solutions: country code would be transferred as an argument into geospatial_plot( ) and loading data would be done inside geospatial_plot( ) itself. we would transfer as arguments a country shape and both osm dataframes
Sounds great. I shouldnt be having global variables in my functions.
@GbotemiB amazing work and very nice presentation. I think, we need a final clean-up round, and that task will be completed.
Find please the comments bellow.
My feeling is that a block with if_country_in_entsoe()
can be simplified:
entsoe_ref["if_at"] = if_country_in_entsoe(at_country_shape, entsoe_ref)
is not nessesarily more clear as compared with base_data["geometry"].apply(lambda row: row.within(country_df["geometry"][0]))
# Determine if Austria (AT) is in the ENTSO-E data.
entsoe_ref["if_at"] = if_country_in_entsoe(at_country_shape, entsoe_ref)
As you have explained, that is filtering rather than actually definition if the country present in the data
geo_df.loc[geo_df.within(country_df["geometry"][0])]
works?I'd avoid duplicating comments, like currently is the case in this part:
# Clean the Austria (AT) OSM raw data, addressing non-uniform column names and NaN values.
at_osm_raw_lines = preprocess_raw_data(at_osm_raw_lines)
mk_osm_raw_lines = preprocess_raw_data(mk_osm_raw_lines)
My feeling is that it would be more clear to keep both code lines together and have only one comment for both of the lines
3. Note please that the comments in the code should explain **why** or **what for** is the following part needed, not **what** is it doing. On the contrary, names of the functions or methods should indicate what actually they are doing.
In particular, I think that this comment should be revised: `# Applying simplification function to preprocess the dataframe with __apply_simplification func`
4. A suggestions on the naming improvement (that is a really hard part!):
- it would be great to rename `converted_length` into something like `reprojected_length` or `length_3035`;
- it would be greate to make `create_plot` name more specific.
5. A section `Comparison with Barchart` misses explanations.
6. Would be great to specify also a unit for `tolerance` in the dedicated comment `# Define tolerance level for simplification`: is it in meters?
7. Aren't the bar diagrams duplicated in `Comparison with Barchart` and `Section 2 Applying Simplification`?
8. Great that you have added explanations to the Conclusion. To make our statement more credible, I'd probably add a bit more details: instead to declare that OSM data are more accurate, we may explain that OSM data are more detailed and more accurately represent the actual grid topology.
- Aren't the bar diagrams duplicated in Comparison with Barchart and Section 2 Applying Simplification?
The plots are not the same, they look similar, which makes sense because of the effect of simiplication.
if you take a look at the Simplified vs Non-Similified Section
, you will find that the difference is not the large.
This is the difference in kilometers
@ekatef Thank you for the wonderful comments and review. I have pushed my recent commits effecting the changes you suggested. I will be glad to have you review the notebook with the recent commits.
Hi @ekatef, I have refined the conclusion in the notebook. We can start iterating over the notebook from here making it understandable for a larger audience. You can check my recent commits.
Hey @GbotemiB and thanks for the nice conclusion draft!
A couple of comments.
I'd suggest you to write the next iteration of the conclusion, giving it some structure. Making some preliminary plan may be probably helpful for that. Basing on the notebook content and the draft you have created, I'd suggest the following plan.
simplify
method (which utilises Douglas-Peucker algorithm) with a quite satisfactory results: simplified OSM topologies get in fact closer to ENTSO data.Then the main conclusion point would look well-reasoned ;)
Btw, with a fresh look it feels like clarity of the notebook may be slightly improved:
1) under # Create plot using the simplified dataframe
in [26]
cell, that is not clear if the plot shows values for the original or the simplified geometry; in both cases there may be duplication there: do we really need it?;
2) in cells [28]
and [29]
, not sure if it's evident which OSM-deriven values have been taken for comparison. I assume there have been simplified ones, but that does not follow directly from the table itself. Would be great to double-check it and add some explanations to the table.
3) does it probably make sense to add such a percentage difference for non-simplified values, as well?
4) as a side note: please keep in mind that it's better show only meaningful digits when presenting numbers: in the tables, in particular. Set-up of the proper number formatting can be helpful for that.
Hi @ekatef, thanks for the review. Based on the recent comments, I have revamped the conclusion based on the suggestions.
Here is the new conclusion.
This analysis aimed to investigate OSM data against ENTSOE data using Austria and North Macedonia. The following process was carried out during the analysis:
The analysis showed an estimated percentage difference of 11% between osm-data and ENTSOE-data. After applying the Douglas-peucker algorithm, an estimated percentage difference of about 5% was observed in the analysis.
Hello @GbotemiB! Thank you for working on that. My feeling is that we are converging.
Comments on the overall structure of the notebook:
x
and the ENTSOE length x_ensto
: percentage_difference = (x - x_enstso)/x_entso
. But I may be wrong.Regarding the Conclusion: now it reads much smoother! A couple of comments:
In summary
and In conclusion
) in the conclusion. I feel that they may do the text "heavier". But that is very subjective 🙂 I think we are very close to finalise. Great work!
Thanks for the review. Based on the comments.
Hi @ekatef, I have revised the notebook based on your comments. Ready for another revision 🙃
Hello @GbotemiB! Perfect!! :D
Great that you have found a neat way to fix numeric precision issue. And I very much like the result of you work.
The only comment left is an answer to your question regarding the percentage change: I think it should be rather a relative change as compared to ENTSO reference data, as we assume them to be standard de facto. Would be also perfect to add a couple of the relative changes values to the conclusion, like: closely align with ENTSO data: being about x1..x2% for Austria and y1..y2% for Northern Macedonia
(where x1, x2 correspond to the network which has been made as close as possible to ENSTOE data).
After that, I think the notebook is basically ready to merge.
@pz-max do probably you have any comments or recommendations? 🙂
Thank you @ekatef for the review, I will make the final changes as you have suggested.
Thank you @ekatef for the review, I will make the final changes as you have suggested.
Great work! I think, the only point left is the very final polishing of the conclusion, and the PR is ready to merge.
Thanks for the amazing contribution! 😄
Merged 🎉 🎉 🎉 Thanks for the outstanding contribution, @GbotemiB! 🥇 Fantastic work which will be definitely very helpful in future.
Thank you for the amazing support.
This PR contains a validation notebook that explains the discrepancy found here.
Objective
The Objective of this notebook is to validate OSM data against ENTSOE data using resource files for Austria and North Macedonia.
Results
It was confirmed that the reason for the discrepancy was because the
elec.nc
file for OSM that was compared against that of PyPSA-Eur contains voltages that are lower than 220kV that was rebased to 220kV. It is advised that comparison should be done using base_network_csv files since there is no rebase of voltage in the csv files.PS: There is an ongoing conversation about this notebook here