openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting data
https://nowcasting-dataset.readthedocs.io/en/stable/
MIT License
25 stars 6 forks

Method to drop 'padded' pv systems #369

Open peterdudfield opened 2 years ago

peterdudfield commented 2 years ago

Detailed Description

It would be useful to have a method to drop 'padded' pv systems. These are padded out with zeros so that the dataset can be saved in an efficient way. However, for plotting it's useful to drop the 'padded' systems.

It would also be good to do this for 'gsp' too.

Possible Implementation

In PV (from nowcasting_dataset.data_sources.pv.pv_data_source import PV), add a function that removes any zero values. This would be for 'data', 'pv_systems', 'x_coords' and 'y_coords'.
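A minimal sketch of what such a function could look like, using plain NumPy arrays rather than the real PV class. The function name drop_padded_pv_systems and the array layout (data shaped (n_timesteps, n_systems), the other three arrays shaped (n_systems,)) are assumptions for illustration, not the library's API. It treats a system as padded only when its id and both coordinates are zero, so a real system that generates zero power is kept.

```python
import numpy as np


def drop_padded_pv_systems(data, pv_systems, x_coords, y_coords):
    """Hypothetical helper: drop systems that look like zero padding.

    Assumed layout (not taken from the library):
      data:       (n_timesteps, n_systems) power values
      pv_systems: (n_systems,) system ids
      x_coords:   (n_systems,) x coordinates
      y_coords:   (n_systems,) y coordinates
    """
    data = np.asarray(data)
    pv_systems = np.asarray(pv_systems)
    x_coords = np.asarray(x_coords)
    y_coords = np.asarray(y_coords)

    # A system counts as padded only if its id and both coordinates are zero.
    # The power values are deliberately not used to build the mask.
    is_padded = (pv_systems == 0) & (x_coords == 0) & (y_coords == 0)
    keep = ~is_padded

    return data[:, keep], pv_systems[keep], x_coords[keep], y_coords[keep]


# Example: two real systems (one producing nothing) and one padded slot.
data = np.array([[0.0, 1.5, 0.0], [0.0, 2.0, 0.0]])
pv_systems = np.array([101, 102, 0])
x_coords = np.array([350000.0, 420000.0, 0.0])
y_coords = np.array([180000.0, 210000.0, 0.0])

kept_data, kept_ids, kept_x, kept_y = drop_padded_pv_systems(
    data, pv_systems, x_coords, y_coords
)
```

The same mask-based approach should carry over to 'gsp' if its arrays are padded the same way, which is also an assumption here.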

JackKelly commented 2 years ago

Interesting!

I'm probably remembering wrong but I thought the code padded PV data with NaNs, not zeros?

Is it possible that the zeros are "legitimate"? (e.g. zero power generation at night?)

peterdudfield commented 2 years ago

Yeah, we've got to be careful that the zeros are not true values. This could be done by looking at x_coords, as 0 is not realistic there. Another option is to change them back to NaNs, and then do some filling before passing to ML models.
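A rough sketch of the second option, again with plain NumPy arrays. The helper names padded_zeros_to_nan and fill_nans, the (n_timesteps, n_systems) layout, and the choice of 0.0 as the fill value are all assumptions for illustration. Padded columns are identified by x_coords == 0 and set to NaN, and the NaNs are only filled just before the data goes to a model.

```python
import numpy as np


def padded_zeros_to_nan(data, x_coords):
    """Hypothetical helper: mark padded systems with NaN instead of zero.

    Assumes data is (n_timesteps, n_systems) and x_coords is (n_systems,).
    A system is treated as padded when its x coordinate is exactly zero.
    """
    restored = np.array(data, dtype=float)  # copy, so the input is untouched
    is_padded = np.asarray(x_coords) == 0
    restored[:, is_padded] = np.nan
    return restored


def fill_nans(data, fill_value=0.0):
    """Hypothetical helper: fill NaNs right before passing data to a model."""
    return np.nan_to_num(data, nan=fill_value)


# Example: the first system is real but produces zero power (e.g. at night),
# the last slot is padding.
data = np.array([[0.0, 1.5, 0.0], [0.0, 2.0, 0.0]])
x_coords = np.array([350000.0, 420000.0, 0.0])

restored = padded_zeros_to_nan(data, x_coords)  # NaN only in the padded column
model_input = fill_nans(restored)               # zeros again, for the model
```

With NaNs in place, plotting code can skip the padded systems, while genuine zero readings stay as zeros.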