Should we smooth the xgboost model

openclimatefix / open-source-quartz-solar-forecast

Open Source Solar Site Level Forecast

MIT License

Should we smooth the xgboost model #124

Open peterdudfield opened 5 months ago

peterdudfield commented 5 months ago

The current xgboost model's output is quite spiky. This is likely due to the ML model itself.

It might be worth smoothing this. If we do, we probably need to make sure the smoothing is applied before the night-time filter.

You can see this here: [Screenshot 2024-05-30 at 15 31 35]
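To illustrate why the order matters, here is a minimal sketch that smooths first and applies the night-time filter second. The function names, the `is_night` mask, the window size, and the sample values are all hypothetical and are not taken from the repository.

```python
def rolling_mean(values, window=3):
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def apply_night_filter(values, is_night):
    """Set the forecast to zero wherever the (hypothetical) night mask is True."""
    return [0.0 if night else v for v, night in zip(values, is_night)]


# Made-up 15-minute forecast in kW: two night-time steps, then a daytime spike.
raw_kw = [0.4, 0.3, 1.0, 1.2, 3.0, 1.4, 1.5]
is_night = [True, True, False, False, False, False, False]

# Smooth first, then zero the night: night-time values stay exactly zero.
smooth_then_filter = apply_night_filter(rolling_mean(raw_kw), is_night)

# The reverse order lets daytime values leak into the last night-time step.
filter_then_smooth = rolling_mean(apply_night_filter(raw_kw, is_night))
```

In the reverse order, the last night-time step picks up part of the first daytime value and is no longer zero, which is the reason to smooth before the night-time filter.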

Plomo-02 commented 5 months ago

hello, can I be assigned to this issue?

froukje commented 5 months ago

We have been looking into this issue, and we think the predictions are so spiky because we download input data at 15-minute intervals and use it for the predictions. Using hourly data results in much smoother plots; here is an example. I'm not sure smoothing is the appropriate approach here. If we want the plots and results to be less spiky, we can simply use lower-frequency data. The model might not be optimal at this higher frequency, as it was trained on hourly data.

[Screenshot from 2024-05-31 13-27-04]
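As a rough sketch of the difference between the two input grids, the snippet below builds the timestamps at which input data would be requested. The helper `input_timestamps`, the start time, and the 48-hour horizon are assumptions made for illustration; they are not the project's actual code or settings.

```python
from datetime import datetime, timedelta


def input_timestamps(start, horizon_hours, step_minutes):
    """Timestamps for requesting input data (start inclusive, end exclusive)."""
    n_steps = horizon_hours * 60 // step_minutes
    return [start + timedelta(minutes=i * step_minutes) for i in range(n_steps)]


start = datetime(2024, 5, 31, 0, 0)  # illustrative start time

# 15-minute grid: finer than the hourly data the model was trained on.
every_15_min = input_timestamps(start, horizon_hours=48, step_minutes=15)

# Hourly grid: matches the training resolution.
hourly = input_timestamps(start, horizon_hours=48, step_minutes=60)

print(len(every_15_min), len(hourly))  # prints: 192 48
```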

peterdudfield commented 5 months ago

Hi @froukje

If it was trained on hourly data, we should probably use hourly data in inference as well. I would go for that fix first. Are you able to make a PR for this? Thanks

peterdudfield commented 5 months ago

> hello, can I be assigned to this issue?

Thanks @Plomo-02, it's probably best that @froukje has a go at this first.

froukje commented 5 months ago

Yes, sure. No problem.

froukje commented 5 months ago

This issue can be closed. The predictions have been changed to hourly data.

aryanbhosale commented 2 months ago

Could a Kalman filter be applied just before plotting? Even with the 15-minute data, it would smooth the curve. A PID algorithm might also work; it would minimize the large spikes caused by noise in the PV data.
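For reference, a minimal sketch of what such a filter could look like: a scalar Kalman filter with a random-walk state model, in plain Python. The function name, the variance parameters, and the sample values are assumptions chosen for illustration and would need tuning on real forecasts.

```python
def kalman_smooth(measurements, process_var=0.05, measurement_var=1.0):
    """Scalar Kalman filter assuming the true value follows a random walk."""
    estimate = measurements[0]
    error_var = 1.0  # initial uncertainty of the estimate (assumed)
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only the uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Made-up 15-minute forecast in kW with a single spike.
spiky_kw = [1.0, 1.0, 5.0, 1.0, 1.0]
smoothed_kw = kalman_smooth(spiky_kw)
```

A larger `measurement_var` relative to `process_var` damps spikes more strongly, but this forward-only filter also lags behind genuine ramps such as sunrise, so the trade-off would need checking against real data.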