openclimatefix / uk-pv-national-xg

National PV forecasting using Gradient Boosted Methods.

New features #45

Closed: peterdudfield closed 11 months ago

peterdudfield commented 1 year ago

I'd be interested to hear what people think I should do first. @JackKelly @jacobbieker @dantravers

jacobbieker commented 1 year ago

I would probably start with removing the sde from training, and then probably more lag features? I think XGBoost models don't need the data to be normalized, so not sure that's necessary, although I guess if the units are different between CEDA and live MetOffice it probably makes sense to do that first.

peterdudfield commented 1 year ago

Bonus one is to add mcc and hcc to nwp variables

JackKelly commented 1 year ago

> if the units are different between CEDA and live MetOffice it probably makes sense to do that first

Yeah, it's pretty essential that the data the model sees at inference time is exactly the same as the data it sees at training time :slightly_smiling_face: so I agree that sounds like the priority!

And I agree with @jacobbieker that I don't think XGBoost models require the data to be normalised (because it chops real-valued inputs up into bins).
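A minimal sketch of why rescaling does not matter for a tree split, assuming a single exhaustive regression stump rather than XGBoost's actual histogram-based algorithm. The function `best_split` and all the numbers are made up for illustration. The split is chosen from the ordering of the feature values, so any order-preserving rescaling gives the same partition of the samples:

```python
def best_split(xs, ys):
    """Return the set of sample indices sent left by the best single split.

    Exhaustively tries every boundary between sorted feature values and keeps
    the one with the lowest total squared error around the two side means.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        # A split cannot fall between two identical feature values.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        sse = 0.0
        for side in (left, right):
            mean = sum(ys[i] for i in side) / len(side)
            sse += sum((ys[i] - mean) ** 2 for i in side)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left


# Made-up temperatures in Kelvin and a made-up PV target.
temps_kelvin = [271.0, 275.5, 280.0, 284.5, 289.0, 293.5]
pv_output = [0.10, 0.12, 0.11, 0.45, 0.50, 0.48]

# Two rescalings that preserve the ordering of the values.
temps_celsius = [t - 273.15 for t in temps_kelvin]
lo, hi = min(temps_kelvin), max(temps_kelvin)
temps_minmax = [(t - lo) / (hi - lo) for t in temps_kelvin]

partition_kelvin = best_split(temps_kelvin, pv_output)
partition_celsius = best_split(temps_celsius, pv_output)
partition_minmax = best_split(temps_minmax, pv_output)

# All three versions of the feature split the samples identically.
print(partition_kelvin == partition_celsius == partition_minmax)
```

Note that the learned threshold is still expressed in the training units, so this does not help if the units differ between training and inference data: a threshold learned in Kelvin would be meaningless when applied to Celsius inputs.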

Does the model also get historical NWP data? If not, I think that might help a bit: i.e. if the model gets lagged GSP data for n timesteps in the past, then it might be useful to give the model NWP data for those same timesteps so the model can see the difference between the expected forecast (given the NWP) and what actually happened in the recent past. But maybe the model is already doing that?
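A rough sketch of the idea, not taken from this repo: for each target timestep, pair the lagged GSP values with the NWP values for the same past timesteps. The function name `build_lagged_rows`, the column names, and the sample numbers are all hypothetical.

```python
def build_lagged_rows(gsp, nwp, n_lags):
    """Build one feature row per timestep with lagged GSP and NWP values.

    gsp: observed PV values per timestep (hypothetical series).
    nwp: NWP value valid at each timestep (hypothetical, e.g. irradiance).
    """
    rows = []
    # Start at n_lags so every row has a full history behind it.
    for t in range(n_lags, len(gsp)):
        row = {"nwp_now": nwp[t], "target": gsp[t]}
        for lag in range(1, n_lags + 1):
            row[f"gsp_lag_{lag}"] = gsp[t - lag]
            # NWP for the same past timestep, so the model can compare
            # what the weather data implied with what actually happened.
            row[f"nwp_lag_{lag}"] = nwp[t - lag]
        rows.append(row)
    return rows


# Made-up example series, five timesteps each.
gsp = [0.0, 0.2, 0.5, 0.7, 0.6]
nwp = [10.0, 150.0, 400.0, 520.0, 480.0]

rows = build_lagged_rows(gsp, nwp, n_lags=2)
for row in rows:
    print(row)
```

Each row could then be passed to the model as a flat feature vector, with `target` held out as the label.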

peterdudfield commented 1 year ago

Thanks @JackKelly and @jacobbieker, I re-ordered the list above. Do you think that order is about right?

JackKelly commented 1 year ago

Lgtm!

jacobbieker commented 1 year ago

Looks great!

peterdudfield commented 1 year ago

Thanks. @dantravers, are you happy with this?

dantravers commented 1 year ago

Looks reasonable to me! I'd be curious to see whether "Use historic NWP data, not just forecasts" does well, so could that one be higher? But the order seems sensible. Thanks for asking the open question!