openclimatefix / open-source-quartz-solar-forecast

Open Source Solar Site Level Forecast
MIT License

Challenge: new model #30

Open peterdudfield opened 11 months ago

peterdudfield commented 11 months ago

Can you make a new model and beat the current evaluation metrics?

You need to build a model to forecast PV. The PV dataset is all here, and we also want the model to run like the current model, i.e. pulling NWP data from open-meteo.
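For reference, a request for NWP data from Open-Meteo can be built with only the standard library. This is a minimal sketch, not how the current model fetches its data: it targets Open-Meteo's general forecast endpoint, and the choice of hourly variables (`temperature_2m`, `shortwave_radiation`, `cloud_cover`) and the helper name `build_nwp_request_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Open-Meteo's general forecast endpoint.
OPEN_METEO_FORECAST_URL = "https://api.open-meteo.com/v1/forecast"

# Assumed selection of hourly NWP variables that are usually relevant for PV.
DEFAULT_VARIABLES = ("temperature_2m", "shortwave_radiation", "cloud_cover")


def build_nwp_request_url(latitude: float, longitude: float,
                          variables=DEFAULT_VARIABLES,
                          forecast_days: int = 3) -> str:
    """Return a (hypothetical helper's) Open-Meteo URL for hourly NWP data at one site."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(variables),
        # 3 days comfortably covers a 48-hour forecast horizon.
        "forecast_days": forecast_days,
        "timezone": "UTC",
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{OPEN_METEO_FORECAST_URL}?{urlencode(params, safe=',')}"


url = build_nwp_request_url(51.75, -1.25)
```

Passing the URL to `urllib.request.urlopen` and parsing the body with `json.loads` should give an `hourly` object containing a `time` list plus one list per requested variable.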

We need a model that can forecast 48 hours ahead, in 15-minute intervals. We want it to run live without live PV data, but a good optional extra would be to include PV data.
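The 48-hour, 15-minute requirement fixes how many values each forecast must contain: 48 h / 15 min = 192 steps. The sketch below generates those target timestamps; rounding the forecast time down to a 15-minute boundary and starting one step after it are assumptions, and `floor_to_step` and `forecast_timestamps` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

STEP = timedelta(minutes=15)
HORIZON = timedelta(hours=48)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_step(t: datetime, step: timedelta = STEP) -> datetime:
    """Round a timezone-aware time down to the previous step boundary (e.g. 10:07 -> 10:00)."""
    return t - (t - EPOCH) % step


def forecast_timestamps(init_time: datetime, horizon: timedelta = HORIZON,
                        step: timedelta = STEP) -> list:
    """Target times from one step after the rounded init_time up to init_time + horizon."""
    start = floor_to_step(init_time, step)
    n_steps = horizon // step  # 48 h / 15 min = 192
    return [start + step * i for i in range(1, n_steps + 1)]
```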

This is fairly open-ended in order not to restrict anyone.

peterdudfield commented 8 months ago

A good thing to do first would be to build a general pipeline that takes weather data and joins it with PV data. It might be a case of writing this fresh, or using ocf_datapipes.
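A minimal sketch of such a pipeline step, written fresh with pandas rather than ocf_datapipes: hourly NWP values are upsampled to the 15-minute PV resolution by time-based linear interpolation, then inner-joined with the PV readings. The function `join_nwp_and_pv` and the column names (`shortwave_radiation`, `power_kw`) are hypothetical, and both inputs are assumed to be indexed by UTC timestamps.

```python
import pandas as pd


def join_nwp_and_pv(nwp: pd.DataFrame, pv: pd.DataFrame, freq: str = "15min") -> pd.DataFrame:
    """Join hourly NWP features onto 15-minute PV generation (hypothetical helper).

    Both frames are assumed to have a tz-aware DatetimeIndex; `pv` is assumed
    to hold generation in a `power_kw` column.
    """
    # Upsample NWP to the PV resolution (new rows are NaN), then fill them
    # by linear interpolation in time between the hourly values.
    nwp_15 = nwp.resample(freq).asfreq().interpolate(method="time")
    # Inner join keeps only timestamps where both weather and PV data exist.
    return nwp_15.join(pv, how="inner")


idx_h = pd.date_range("2024-06-01 00:00", periods=3, freq="60min", tz="UTC")
nwp = pd.DataFrame({"shortwave_radiation": [0.0, 100.0, 200.0]}, index=idx_h)
idx_q = pd.date_range("2024-06-01 00:00", periods=9, freq="15min", tz="UTC")
pv = pd.DataFrame({"power_kw": [float(i) for i in range(9)]}, index=idx_q)
joined = join_nwp_and_pv(nwp, pv)
```

Linear interpolation is only the simplest way to turn hourly weather into 15-minute features; the joined frame gives one row per 15-minute timestamp with both weather features and the PV target.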

shreyasudaya commented 8 months ago

Sorry to comment here, but I would like to ask whether this paper is relevant to the related issue: https://www.sciencedirect.com/science/article/pii/S0960148123009035#tbl1

roshnaeem commented 8 months ago

Hello @peterdudfield, I would like to work on this issue. Can you please assign me?

peterdudfield commented 8 months ago

Hi @roshnaeem, if it's OK, I'll leave the assignees as they are so that it encourages lots of people to tackle this issue. Is that OK? Thank you so much for working on this; please write here if you have any questions.

peterdudfield commented 8 months ago

Some general questions:

  1. Where can I read about the psp library that is used in the project?

https://github.com/openclimatefix/pv-site-prediction, but I wouldn't get too stuck into this code. I think it would be better to write something fresh.

  2. From the README and code, I understand that we are using a gradient-boosted trees model, which is called to make predictions through the run_forecast function. Can I see the code for the model?

See above

  3. For the new models we would be adding to the project, should the prediction parameters be the same as for the current model, i.e. PVSite(latitude, longitude, capacity_kwp) and a timestamp?

Yes, but the NWPs are also going to be very important.
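As an illustration of combining the site parameters with NWPs, one input row for a model could be assembled as below. `Site` is a hypothetical stand-in for `PVSite(latitude, longitude, capacity_kwp)`, and the time features and `nwp_` prefix are assumptions, not the current model's feature set.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Site:
    """Hypothetical stand-in for PVSite(latitude, longitude, capacity_kwp)."""
    latitude: float
    longitude: float
    capacity_kwp: float


def build_features(site: Site, timestamp: datetime, nwp: dict) -> dict:
    """Combine site metadata, time features and NWP variables into one feature row."""
    return {
        "latitude": site.latitude,
        "longitude": site.longitude,
        "capacity_kwp": site.capacity_kwp,
        # Time of day and day of year help a model learn the solar cycle.
        "hour": timestamp.hour + timestamp.minute / 60,
        "day_of_year": timestamp.timetuple().tm_yday,
        # NWP values for this timestamp, e.g. {"shortwave_radiation": 450.0}.
        **{f"nwp_{name}": value for name, value in nwp.items()},
    }


row = build_features(Site(51.75, -1.25, 1.25), datetime(2024, 6, 1, 12, 30),
                     {"shortwave_radiation": 450.0, "cloud_cover": 20.0})
```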

  4. How can I check the accuracy and other benchmarks for the current model?

Use the evaluation script: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/scripts/run_evaluation.py
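A new model has to be compared against the current one on the same numbers. As a generic example (not necessarily what `run_evaluation.py` computes), mean absolute error, optionally divided by capacity so that sites of different sizes are comparable, could be computed like this; `mae` and `normalised_mae` are hypothetical helper names.

```python
def mae(predictions, actuals):
    """Mean absolute error between two equal-length sequences (same units, e.g. kW)."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(predictions)


def normalised_mae(predictions, actuals, capacity_kwp):
    """MAE divided by site capacity, so errors at small and large sites are comparable."""
    return mae(predictions, actuals) / capacity_kwp
```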

roshnaeem commented 8 months ago

Sure @peterdudfield, thank you. I am checking the code and will open a PR soon.

roshnaeem commented 8 months ago

Thank you @peterdudfield for your guidance. I went through the pv-site-prediction and ocf-datapipes repositories to understand the basics. I have a couple of questions.

  1. The current model is using the combination of NWP data and PV site data to train the model, right? For the next model, what does "We want it to run live without PV live data" mean?
  2. ocf-datapipes is integrating both types of data, right? Are we using it to provide training data?
  3. Regarding your comment about a good first step, can you explain which part of the data preprocessing I should work on that would make a good PR for the GSoC proposal? I see there are two approaches mentioned. Could you tell me the step-by-step approach to this subtask?

peterdudfield commented 8 months ago

  1. No live PV data means the model can run inference with only NWP data. This is what we have found lots of people want.

  2. Yes, if you want to use it; currently it's not used in the repo.

  3. I'm not sure what you mean by two approaches. Could you clarify? I'm not sure I can give you a step-by-step approach, but I can try to outline things.

roshnaeem commented 8 months ago

> A good thing to do first would be to build a general pipeline that takes weather data and joins it with PV data. It might be a case of writing this fresh, or using ocf_datapipes.

@peterdudfield I was talking about these two approaches you mentioned in this comment.

peterdudfield commented 8 months ago

> > A good thing to do first would be to build a general pipeline that takes weather data and joins it with PV data. It might be a case of writing this fresh, or using ocf_datapipes.
>
> @peterdudfield I was talking about these two approaches you mentioned in this comment.

I'd probably try ocf_datapipes first, and if it doesn't suit, then try writing something fresh.

roshnaeem commented 8 months ago

@peterdudfield, I have a few questions regarding the GSoC proposal.

  1. Would we be using ocf_datapipes as well as building new datapipes for the new model?
  2. Should the current model also work with these datapipes?
  3. If we run inference with only NWP data, would we use a standard capacity for PV systems?

peterdudfield commented 8 months ago

I would leave the current model how it is, but aim to use ocf_datapipes for the new model.

  3. Capacity is a useful feature: under the same NWP conditions, you can get different PV power depending on the capacity.

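To illustrate the point about capacity: if a model predicts a capacity factor (power divided by capacity) from the weather, the same NWP inputs yield different power for different sites. The irradiance-proportional `toy_capacity_factor` below is a hypothetical placeholder for a trained model, not a realistic PV model, and all function names are illustrative.

```python
def to_capacity_factor(power_kw: float, capacity_kwp: float) -> float:
    """Normalise observed generation by installed capacity (useful as a training target)."""
    return power_kw / capacity_kwp


def toy_capacity_factor(shortwave_radiation_w_m2: float) -> float:
    """Hypothetical toy model: capacity factor proportional to irradiance, capped at 1."""
    return min(shortwave_radiation_w_m2 / 1000.0, 1.0)


def predict_power_kw(shortwave_radiation_w_m2: float, capacity_kwp: float) -> float:
    """Same weather gives the same capacity factor; capacity scales it to site power."""
    return toy_capacity_factor(shortwave_radiation_w_m2) * capacity_kwp
```

For example, at 500 W/m² the toy model gives a capacity factor of 0.5, so `predict_power_kw(500.0, 1.25)` returns 0.625 kW and `predict_power_kw(500.0, 4.0)` returns 2.0 kW.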
peterdudfield commented 8 months ago

> Question: How can we predict using only NWP data? We would need capacity and PV site data to get the NWP data. Does live PV data mean that we would be getting PV data in real time and predicting generation in real time?


Yes, it would be good to use PV metadata, like capacity, together with NWP data in the model. Live PV data would also increase the accuracy of the model, but in this repo we've tried to keep it optional. So, first of all, the model should work with NWP data and PV metadata.
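One way to keep live PV data optional is to make the NWP-plus-metadata forecast the default and only adjust it when a recent reading is supplied. The sketch below applies a simple decaying bias correction; the function name, its parameters and the decay scheme are all assumptions, and a trained model could instead take live PV as an input feature.

```python
from typing import Optional, Sequence


def forecast_with_optional_pv(
    nwp_forecast_kw: Sequence[float],
    recent_pv_kw: Optional[float] = None,
    nwp_estimate_now_kw: Optional[float] = None,
    decay: float = 0.9,
) -> list:
    """Return a forecast that works from NWP alone, optionally adjusted by live PV.

    Without live PV the NWP-based forecast is returned unchanged. With it, the
    current error (observed minus NWP estimate) is added to each step, shrinking
    by `decay` per step so the correction fades out over the horizon.
    """
    if recent_pv_kw is None or nwp_estimate_now_kw is None:
        return list(nwp_forecast_kw)
    error = recent_pv_kw - nwp_estimate_now_kw
    # PV generation cannot be negative, so clip corrected values at zero.
    return [max(0.0, value + error * decay ** (step + 1))
            for step, value in enumerate(nwp_forecast_kw)]
```

Called with only `nwp_forecast_kw`, it returns the NWP-based forecast unchanged, which matches the requirement that the model runs without live PV data.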

felipewhitaker commented 8 months ago

Hi,

I am working on #27 and this discussion helped a lot, thanks!

I explored the project and ended up in psp, since it contains the code to train the models. After setting up the environment, I ran its training and evaluation scripts, but I wasn't able to use the result (.pkl) directly as a model (substituting the current default model in forecast_v1 with psp's test_config1 model .pkl). Should it be possible to use psp's model directly?

BayoHabib commented 8 months ago

Hi @peterdudfield, I would like to work on this issue. I'll be available from March 28.

Fofoabdo commented 7 months ago

Can I work on this issue?