openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting models
https://nowcasting-dataset.readthedocs.io/en/stable/
MIT License

Enable config option for the MW threshold at which to drop GSP PV power #303

Open JackKelly opened 2 years ago

JackKelly commented 2 years ago

TBH, I'm increasingly thinking that we shouldn't drop any GSPs. More GSPs means more training examples :slightly_smiling_face: But it would be nice to be able to configure the threshold_mw in the config YAML.
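
A minimal sketch of how a configurable threshold might look, assuming the GSP section of the config YAML has already been parsed into a plain dict. Only the name `threshold_mw` comes from this issue; `GSPConfig`, `select_gsp_ids`, the default of 0 MW and the example power values are hypothetical and are not taken from the repository's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GSPConfig:
    """Hypothetical config section; only the field name threshold_mw is from the issue."""

    # Assumed default: keep every GSP whose maximum PV power is above zero.
    threshold_mw: float = 0.0


def select_gsp_ids(max_power_mw: Dict[int, float], config: GSPConfig) -> List[int]:
    """Return the IDs of GSPs whose maximum PV power exceeds config.threshold_mw."""
    return [
        gsp_id
        for gsp_id, max_mw in max_power_mw.items()
        if max_mw > config.threshold_mw
    ]


# Stand-in for the dict a YAML loader would return for the GSP section.
raw_config = {"threshold_mw": 20}
config = GSPConfig(**raw_config)

# Made-up per-GSP maximum PV power, in MW.
max_power_mw = {1: 0.0, 2: 15.5, 3: 42.0}
kept = select_gsp_ids(max_power_mw, config)
```

With the threshold read from config, lowering it to keep more GSPs becomes a one-line YAML change rather than a code change.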

jacobbieker commented 2 years ago

Yeah, I'd agree!

JackKelly commented 2 years ago

Just for context, the log seems to suggest that we're only using 4 GSPs at the moment?! (Or is that a typo?)

2021-11-16 15:11:54,289 DEBUG Loading Solar GSP Data from /mnt/storage_b/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/PV/GSP/v2/pv_gsp.zarr from None to None at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/gsp/gsp_data_source.py#L409
2021-11-16 15:13:00,389 DEBUG Dropping 309 GSPs as maximum is not greater 20 MW at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/gsp/gsp_data_source.py#L383
2021-11-16 15:13:00,390 DEBUG Keeping 4 GSPs as maximum is greater 20 MW at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/gsp/gsp_data_source.py#L384
2021-11-16 15:13:00,597 DEBUG There are 313 GSP at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/gsp/gsp_data_source.py#L106
2021-11-16 15:13:00,597 DEBUG Creating nwp DataSource object. at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/manager.py#L70

JackKelly commented 2 years ago

Although, the later code suggests we have 313 GSPs to play with (after thresholding at 20 MW):

2021-11-16 15:26:55,570 DEBUG Getting all gsp in ROI at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/gsp/gsp_data_source.py#L320
2021-11-16 15:26:55,589 DEBUG Found 313 GSP valid data for 2019-08-19 14:40:00 at /home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/gsp/gsp_data_source.py#L363

Maybe we could modify the log messages? I assume the log message "Dropping 309 GSPs" is wrong? :slightly_smiling_face:

peterdudfield commented 2 years ago

Yea, something is not right - I thought I created a bug for this, but I can't find it.

Will need to check, but I don't think the thresholding is doing anything at the moment. Perhaps the best way forward for the moment, is to

JackKelly commented 2 years ago

Ah, so, I think we do need to filter out any GSPs with literally zero PV. This can be hard-coded for now. I agree the config setting can wait until WP2. I think the other GitHub issue is #205
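
As a rough illustration of the hard-coded version, the sketch below drops only GSPs whose maximum PV power is exactly zero and logs counts derived from the same filtered result, so the "dropping" and "keeping" numbers always add up to the total. The names `ZERO_PV_THRESHOLD_MW` and `drop_zero_pv_gsps`, and the dict-of-maxima input, are assumptions for illustration and not the code in `gsp_data_source.py`.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)

# Hard-coded for now; assumed to be replaced by a config option later.
ZERO_PV_THRESHOLD_MW = 0.0


def drop_zero_pv_gsps(max_power_mw: Dict[int, float]) -> Dict[int, float]:
    """Keep only GSPs whose maximum PV power is greater than zero."""
    kept = {
        gsp_id: max_mw
        for gsp_id, max_mw in max_power_mw.items()
        if max_mw > ZERO_PV_THRESHOLD_MW
    }
    # Both counts come from the same filter result, so they sum to the total.
    n_dropped = len(max_power_mw) - len(kept)
    logger.debug(
        "Dropping %d GSPs as maximum is not greater than %s MW",
        n_dropped,
        ZERO_PV_THRESHOLD_MW,
    )
    logger.debug(
        "Keeping %d GSPs as maximum is greater than %s MW",
        len(kept),
        ZERO_PV_THRESHOLD_MW,
    )
    return kept


# Made-up per-GSP maximum PV power, in MW.
example = {1: 0.0, 2: 0.5, 3: 120.0}
remaining = drop_zero_pv_gsps(example)
```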

JackKelly commented 2 years ago

Reopening this for now to remind us to implement a config option for threshold_mw in 2022 :slightly_smiling_face:

Please close again if I've misunderstood!