openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License
13 stars 11 forks source link

Add GSP effective capacity support #212

Closed dfulu closed 1 year ago

dfulu commented 1 year ago

Pull Request

Description

Made changes to include both the GSP installed capacity and estimated effective capacity.

The installed capacity is now under the batchkey BatchKey.gsp_installed_capacity_megawatt_power The effective capacity is under the batchkey BatchKey.gsp_effective_capacity_megawatt_power

I chose the above to be explicit about which is which.

Changes include:

Checklist:

dfulu commented 1 year ago

@jacobbieker I updated some of the comments and variable names in the metnet and pseudo irradiance pipelines as I think they were misnamed to be GSP instead of PV. Let me know if I got this wrong.

peterdudfield commented 1 year ago

should we stick with the names in PVLive? ie.

dfulu commented 1 year ago

We could do. Previously the key was BatchKey.gsp_capacity_megawatt_power so that would be a change away from that?

peterdudfield commented 1 year ago

We could do. Previously the key was BatchKey.gsp_capacity_megawatt_power so that would be a change away from that?

@JackKelly @jacobbieker what do you think?

jacobbieker commented 1 year ago

We could do. Previously the key was BatchKey.gsp_capacity_megawatt_power so that would be a change away from that?

@JackKelly @jacobbieker what do you think?

We use _mwp in other places, so could be good. I think we wrote them out to be super explicit what the units were, but I think its probably fine shortening them to that. I would still keep gsp though, I like having it explicit that its GSP capacity, not PV.

dfulu commented 1 year ago

IIRC in PVLIve they are installedcapacity_mwp, i.e. without any underscore separating "installed_ and capacity, and capacity_mwp. So maybe this option would be to rename to gsp_installedcapacity_mwp and gsp_capacity_mwp for the batchkeys?

peterdudfield commented 1 year ago

IIRC in PVLIve they are installedcapacity_mwp, i.e. without any underscore separating "installed_ and capacity, and capacity_mwp. So maybe this option would be to rename to gsp_installedcapacity_mwp and gsp_capacity_mwp for the batchkeys?

I like this

dfulu commented 1 year ago

@peterdudfield @jacobbieker @JackKelly I've done some variable name changes.

The installed batchkeys are gsp_[installed]capacity_mwp, and the variable names in the GSP xarray.Dataset are [installed]capacity_mwp.

This is out of keeping with the PV variable names which has batchkey pv_capacity_watt_power, but maybe that can be cleaner up later?

codecov[bot] commented 1 year ago

Codecov Report

Merging #212 (aeabf2d) into main (d208d0a) will increase coverage by 0.05%. The diff coverage is 76.47%.

:exclamation: Current head aeabf2d differs from pull request most recent head f9ec715. Consider uploading reports for the commit f9ec715 to get more accurate results

@@            Coverage Diff             @@
##             main     #212      +/-   ##
==========================================
+ Coverage   81.34%   81.40%   +0.05%     
==========================================
  Files         128      128              
  Lines        5549     5538      -11     
==========================================
- Hits         4514     4508       -6     
+ Misses       1035     1030       -5     
Impacted Files Coverage Δ
ocf_datapipes/convert/numpy/batch/gsp.py 100.00% <ø> (ø)
ocf_datapipes/load/gsp/gsp.py 86.11% <ø> (-0.38%) :arrow_down:
ocf_datapipes/load/gsp/utils.py 100.00% <ø> (ø)
ocf_datapipes/production/power_perceiver.py 30.43% <ø> (ø)
ocf_datapipes/training/gsp_national.py 100.00% <ø> (ø)
ocf_datapipes/training/gsp_pv_satellite_nwp.py 100.00% <ø> (ø)
ocf_datapipes/training/metnet_national.py 17.27% <0.00%> (ø)
ocf_datapipes/training/metnet_pv_national.py 18.86% <0.00%> (ø)
ocf_datapipes/training/metnet_pv_site.py 80.76% <40.00%> (ø)
ocf_datapipes/training/pseudo_irradience.py 75.00% <40.00%> (ø)
... and 6 more

... and 16 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

JackKelly commented 1 year ago

If I were starting from scratch, I'd probably vote for gsp_nominal_capacity_megawatt_peak and gsp_effective_capacity_megawatt_peak. (Or shorten megawatt_peak to mwp)

My reasoning is that I think that just capacity is super ambiguous in the world of PV (is it nominal AC? Or nominal DC? Or the line capacity? Or max of the AC timeseries? Or something else?!)

I might even argue for including "PV" in the name somewhere (what happens if we also start recording the capacity for wind, EVs, diesel generators, etc.)?

And my personal preference would be to use underscores consistently throughout all our names (i.e. to separate words).

I definitely think we don't need both power and some synonym of megawatt in the name. The Watt is a unit of power. So power is redundant if watt is already in the name :slightly_smiling_face:

But I'm definitely aware that my tastes are probably far more pedantic and explicit than most peoples' tastes!

But, given that we're not starting from scratch, I agree that we should do whatever feels most natural given the existing naming conventions. And I don't feel sufficiently up-to-speed on the existing naming conventions to give an informed opinion!

dfulu commented 1 year ago

@peterdudfield @jacobbieker @peterdudfield

I mostly agree with Jack's comment. I've changed the Batchkeys to gsp_nominal_capacity_mwp and gsp_effective_capacity_mwp and inside the GSP dataset the variable names are now nominal_capacity_mwp and effective_capacity_mwp.

I think we should leave the pv part of the name until we come to that. It should be fairly easy to do a find-replace on these names.