openclimatefix / india-forecast-app

Runs wind and PV forecasts for India and saves to database
MIT License

Windnet: Forecast correct values? #35

Closed: peterdudfield closed this 4 months ago

peterdudfield commented 6 months ago

Describe the bug

I'm not totally convinced the forecast is working properly.

Expected behavior

The forecast should follow the generation values.

Additional context

Things done / TODO

jacobbieker commented 6 months ago

The t to t2m renaming might be fine; I'll check that. But the tlc to tcc renaming seems like it might be wrong, as that sets the total low cloud cover to the total cloud cover.

jacobbieker commented 6 months ago

Actually, I believe it's because in the training data the longitude is 74.6399, while in the prod example batch it is 72.6399, which is from me moving it around to align more with where the wind sources are.

jacobbieker commented 6 months ago

The variable renaming might also still be an issue, but it probably has a smaller effect than this. The field of view of the current model is roughly 320 km x 320 km, and a 2-degree shift in longitude is ~222 km at the equator (closer to ~200 km at this latitude), so it moves most of the field of view away from where the model was trained.
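To put numbers on that shift, here is a small Python sketch that converts a longitude shift into kilometres along a circle of latitude and estimates how much of a square field of view still overlaps. It assumes a spherical Earth (radius 6371 km) and the roughly 320 km field of view quoted above; the helper names are illustrative, not part of the app.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def lon_shift_km(delta_lon_deg: float, lat_deg: float) -> float:
    """Distance along a circle of latitude covered by a given longitude shift."""
    return EARTH_RADIUS_KM * math.radians(delta_lon_deg) * math.cos(math.radians(lat_deg))


def overlap_fraction(shift_km: float, fov_km: float = 320.0) -> float:
    """Fraction of a square field of view still covered after shifting it along one axis."""
    return max(0.0, (fov_km - shift_km) / fov_km)


shift_equator = lon_shift_km(2.0, 0.0)
shift_site = lon_shift_km(2.0, 26.4499)  # latitude of the wind site discussed in this thread
print(f"2 deg at the equator: {shift_equator:.1f} km")
print(f"2 deg at 26.45N:      {shift_site:.1f} km")
print(f"field of view still overlapping: {overlap_fraction(shift_site):.0%}")
```

At the equator this gives about 222 km, matching the figure above; at 26.45N it is about 199 km, which leaves only around 38% of a 320 km window overlapping the training location.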

devsjc commented 6 months ago

See https://github.com/openclimatefix/nwp-consumer/commit/6fa7fde9a658abde20064fd14cf6823c771b0e75 for handling the renaming. Updating the version of the consumer in terraform will propagate this.

jacobbieker commented 6 months ago

This is where the location is pulled from in the database: https://github.com/openclimatefix/pv-site-datamodel/blob/1e7c0a2702b5d54197ea62528ffd5450744acf53/pvsite_datamodel/read/site.py#L108, so we might need to change that for Windnet to return a different location from the PV site location?

jacobbieker commented 6 months ago

Here is where the code takes the metadata from the database and returns the lat/lon tuples for the dataloader: https://github.com/openclimatefix/india-forecast-app/blob/8d42a3d95cbeb9b6ffbeeaf8dde6681a2291f9ff/india_forecast_app/models/pvnet/model.py#L174C9-L181

jacobbieker commented 6 months ago

So it should just be a simple matter of updating the database with the new longitude.

peterdudfield commented 6 months ago

See openclimatefix/nwp-consumer@6fa7fde for handling the renaming. Updating the version of the consumer in terraform will propagate this.

Has this been released and deployed?

peterdudfield commented 6 months ago

Actually, I believe it's because in the training data the longitude is 74.6399, while in the prod example batch it is 72.6399, which is from me moving it around to align more with where the wind sources are.

And the latitude is 26.4499?

peterdudfield commented 6 months ago

Does `time-of-day` go in as a feature?

peterdudfield commented 6 months ago

Could the model be trained with a weighting on the first few hours, to help reduce the error at the early forecast horizons?

jacobbieker commented 6 months ago

Actually, I believe it's because in the training data the longitude is 74.6399, while in the prod example batch it is 72.6399, which is from me moving it around to align more with where the wind sources are.

And the latitude is 26.4499?

Yes, that latitude is correct

jacobbieker commented 6 months ago

Does `time-of-day` go in as a feature?

Not explicitly for wind, at least. For solar it does, via the solar elevation and azimuth.
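If an explicit time-of-day input were ever added for wind, one common option (an assumption here, not something the current Windnet does) is to encode the hour as a point on the unit circle, so that 23:00 and 00:00 end up close together. A minimal sketch, with a hypothetical helper name:

```python
import math
from datetime import datetime


def time_of_day_features(ts: datetime) -> tuple[float, float]:
    """Encode the time of day as (sin, cos) so that times either side of midnight are close."""
    hours = ts.hour + ts.minute / 60 + ts.second / 3600
    angle = 2 * math.pi * hours / 24
    return math.sin(angle), math.cos(angle)


# A few times around midnight and one in the morning.
for ts in (datetime(2024, 3, 4, 23, 0), datetime(2024, 3, 5, 0, 0), datetime(2024, 3, 5, 6, 0)):
    print(ts.time(), time_of_day_features(ts))
```

A plain hour-of-day number would put 23:00 and 00:00 at opposite ends of the range; the sine/cosine pair avoids that discontinuity.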

jacobbieker commented 6 months ago

Could the model be trained with a weighting on the first few hours, to help reduce the error at the early forecast horizons?

Yeah, I think it's pretty easy to do that.
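A minimal NumPy sketch of what a horizon-weighted loss could look like. The number of boosted steps and the boost factor are hypothetical parameters, and in real training this weighting would presumably live inside the model's PyTorch loss rather than in NumPy.

```python
import numpy as np


def horizon_weights(n_steps: int, n_boost: int = 4, boost: float = 2.0) -> np.ndarray:
    """Per-step loss weights: the first `n_boost` steps get `boost`, the rest get 1.

    Weights are rescaled to mean 1 so the overall loss scale is unchanged.
    """
    weights = np.ones(n_steps)
    weights[:n_boost] = boost
    return weights / weights.mean()


def weighted_mae(pred: np.ndarray, target: np.ndarray, weights: np.ndarray) -> float:
    """Mean absolute error with each forecast step scaled by its weight.

    `pred` and `target` have shape (batch, n_steps); `weights` has shape (n_steps,)
    and is broadcast across the batch dimension.
    """
    return float((np.abs(pred - target) * weights).mean())


w = horizon_weights(n_steps=8, n_boost=2, boost=3.0)
target = np.zeros((1, 8))
early_miss = target.copy()
early_miss[0, 0] = 1.0  # error only at the first forecast step
late_miss = target.copy()
late_miss[0, -1] = 1.0  # the same error at the last forecast step
print(weighted_mae(early_miss, target, w), weighted_mae(late_miss, target, w))
```

The same unit error costs three times as much at the first step as at the last, so the optimiser is pushed to fit the early horizons more closely.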

peterdudfield commented 6 months ago

I had a quick look at the normalized NWP variables, and 'prate' stood out a bit.

This is after normalization

variable max min
hcc 1.4295 -0.9376
lcc -1.1843 -1.1843
mcc 1.7643 -0.8646
prate 9.3231 -0.3168
sde -0.0887 -0.0887
sr 4.4933 0.6274
t2m 4.6202 -1.3150
tcc 0.7871 -1.8804
u10 0.3539 -1.2574
u100 0.4887 -1.5100
u200 0.3803 -1.6375
v10 0.3623 -1.1152
v100 0.6133 -1.4429
v200 0.4583 -1.5155

@jacobbieker @devsjc would you be able to check the units of prate in the live data and in the backtest?

It would be great to check the units of the backtest and live data in general.
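As a quick sanity check on the table, a sketch like this flags variables whose normalized range looks off: either constant (max equals min, as for lcc and sde) or beyond a magnitude of 4. The threshold of 4 is an arbitrary assumption, not a project setting.

```python
# Normalized (max, min) per variable, as listed in the table above.
NORMALIZED_RANGE = {
    "hcc": (1.4295, -0.9376),
    "lcc": (-1.1843, -1.1843),
    "mcc": (1.7643, -0.8646),
    "prate": (9.3231, -0.3168),
    "sde": (-0.0887, -0.0887),
    "sr": (4.4933, 0.6274),
    "t2m": (4.6202, -1.3150),
    "tcc": (0.7871, -1.8804),
    "u10": (0.3539, -1.2574),
    "u100": (0.4887, -1.5100),
    "u200": (0.3803, -1.6375),
    "v10": (0.3623, -1.1152),
    "v100": (0.6133, -1.4429),
    "v200": (0.4583, -1.5155),
}


def flag_suspicious(ranges: dict[str, tuple[float, float]], abs_limit: float = 4.0) -> dict[str, list[str]]:
    """Return the variables whose normalized range looks wrong, with the reasons."""
    flags: dict[str, list[str]] = {}
    for name, (vmax, vmin) in ranges.items():
        reasons = []
        if vmax == vmin:
            reasons.append("constant (max == min)")
        if max(abs(vmax), abs(vmin)) > abs_limit:
            reasons.append(f"|value| > {abs_limit}")
        if reasons:
            flags[name] = reasons
    return flags


for name, reasons in flag_suspicious(NORMALIZED_RANGE).items():
    print(name, reasons)
```

With these numbers, prate, sr and t2m are flagged for large values and lcc and sde for being constant, which fits the suspicion that units or normalization constants differ between live and backtest data.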

jacobbieker commented 6 months ago

Yep, I can look at some of the training examples. I don't think prate should be that high. If it is, we should probably normalize a few of those variables by different values.

peterdudfield commented 6 months ago

I can see ECMWF_MEAN and ECMWF_STD in ocf_datapipes. They are taken from the UK dataset. Were these used in training the India Windnet, or were different ones used?

peterdudfield commented 6 months ago
jacobbieker commented 6 months ago

Yeah, the same ones are being used right now. I didn't recalculate them for India; I forgot to do that.

peterdudfield commented 6 months ago

Yeah, the same ones are being used right now. I didn't recalculate them for India; I forgot to do that.

That's OK, as long as they are the same for the moment. It might be something for the future, though.
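For when the statistics are recalculated for India, here is a minimal NumPy sketch of computing per-variable means and standard deviations and applying them. The (time, channel, y, x) layout, the function names and the synthetic data are all assumptions standing in for real India NWP data and for whatever form ECMWF_MEAN and ECMWF_STD actually take in ocf_datapipes.

```python
import numpy as np


def channel_stats(nwp: np.ndarray, variables: list[str]) -> tuple[dict[str, float], dict[str, float]]:
    """Per-variable mean and standard deviation over time and space.

    Assumes `nwp` has shape (time, channel, y, x) with channels ordered as `variables`.
    """
    axes = (0, 2, 3)
    means = nwp.mean(axis=axes)
    stds = nwp.std(axis=axes)
    return dict(zip(variables, means.tolist())), dict(zip(variables, stds.tolist()))


def normalize(nwp: np.ndarray, variables: list[str], means: dict[str, float], stds: dict[str, float]) -> np.ndarray:
    """Standardize each channel with the supplied statistics."""
    mu = np.array([means[v] for v in variables])[None, :, None, None]
    sigma = np.array([stds[v] for v in variables])[None, :, None, None]
    # Avoid dividing by zero for channels that are constant over the region.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (nwp - mu) / sigma


# Synthetic stand-in data with very different scales per variable.
rng = np.random.default_rng(0)
variables = ["prate", "t2m", "u10"]
sample = rng.normal(loc=[1e-4, 300.0, 2.0], scale=[5e-5, 8.0, 3.0], size=(50, 10, 10, 3))
sample = sample.transpose(0, 3, 1, 2)  # -> (time, channel, y, x)
means, stds = channel_stats(sample, variables)
normed = normalize(sample, variables, means, stds)
```

After normalization every channel has mean 0 and standard deviation 1 over the region it was computed on, regardless of its original units.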

jacobbieker commented 6 months ago

Backtest lat/lon are in 0.05-degree increments.

peterdudfield commented 6 months ago

Backtest lat/lon are in 0.05-degree increments.

Oh interesting, in the live data they are 0.1. Do you reckon we should just interpolate?

jacobbieker commented 6 months ago

Yeah, that sounds good. The 0.05 grid is already an interpolation, as the native model resolution is only 0.1 degrees.
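The point that the 0.05 grid is "already an interpolation" can be illustrated with a NumPy sketch: linearly interpolating a native 0.1-degree field onto a 0.05-degree grid, then taking every other point, gives back the native values. This assumes the backtest data was produced by linear interpolation, which the thread does not confirm; the grid extents and the helper name are illustrative.

```python
import numpy as np


def interp_linear_2d(field: np.ndarray, src_y: np.ndarray, src_x: np.ndarray,
                     dst_y: np.ndarray, dst_x: np.ndarray) -> np.ndarray:
    """Bilinear interpolation on a regular grid, done as two 1-D passes with np.interp.

    Coordinates must be increasing; `field` has shape (len(src_y), len(src_x)).
    """
    along_x = np.stack([np.interp(dst_x, src_x, row) for row in field])           # (ny, len(dst_x))
    return np.stack([np.interp(dst_y, src_y, col) for col in along_x.T], axis=1)  # (len(dst_y), len(dst_x))


# Illustrative native 0.1-degree grid and a 0.05-degree grid over the same area.
lat_01 = np.round(25.7 + 0.1 * np.arange(16), 2)
lon_01 = np.round(73.9 + 0.1 * np.arange(16), 2)
lat_005 = np.round(25.7 + 0.05 * np.arange(31), 2)
lon_005 = np.round(73.9 + 0.05 * np.arange(31), 2)

native = np.random.default_rng(0).random((lat_01.size, lon_01.size))
fine = interp_linear_2d(native, lat_01, lon_01, lat_005, lon_005)

# Every other 0.05-degree point sits on a native 0.1-degree point,
# so subsampling recovers the native field and the in-between points add no information.
recovered = fine[::2, ::2]
```

Under that assumption, going from 0.05 back to 0.1 by subsampling loses nothing, which is why rescaling the training examples to 0.1 degrees is a reasonable way to match the live data.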

peterdudfield commented 6 months ago

Ah, how come 0.05 is used in training and not the model's native 0.1?

jacobbieker commented 6 months ago

That's how we saved the data to disk, for some reason. @devsjc might have a better idea as to why. But I'm pretty sure the forecast model's native resolution is 0.1 degrees (around 11 km at the equator).

peterdudfield commented 6 months ago

So where does the data change from 0.1 to 0.05? Or is that just how it appears in our backtest data?

jacobbieker commented 6 months ago

It is how it appears in our backtest data

peterdudfield commented 6 months ago

@jacobbieker could you confirm what the max and min lats and lons are in each batch?

peterdudfield commented 6 months ago

This looks like it makes a big difference, from [Screenshot 2024-03-04 at 17 53 21] to [Screenshot 2024-03-04 at 17 53 48].

jacobbieker commented 6 months ago

This looks like it makes a big difference, from [Screenshot 2024-03-04 at 17 53 21] to [Screenshot 2024-03-04 at 17 53 48].

Is that the interpolation?

Here is all the lat/lons:

Latitude: [27.25 27.2  27.15 27.1  27.05 27.   26.95 26.9  26.85 26.8  26.75 26.7
 26.65 26.6  26.55 26.5  26.45 26.4  26.35 26.3  26.25 26.2  26.15 26.1
 26.05 26.   25.95 25.9  25.85 25.8  25.75 25.7 ]
Longitude: [73.85 73.9  73.95 74.   74.05 74.1  74.15 74.2  74.25 74.3  74.35 74.4
 74.45 74.5  74.55 74.6  74.65 74.7  74.75 74.8  74.85 74.9  74.95 75.
 75.05 75.1  75.15 75.2  75.25 75.3  75.35 75.4 ]

Latitude min/max: 25.7/27.25, Longitude min/max: 73.85/75.4
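A small NumPy sketch for answering this kind of question directly from a batch's coordinate arrays. The function name is hypothetical, and the arrays here are rebuilt from the printout above rather than loaded from a real batch.

```python
import numpy as np


def grid_summary(coords: np.ndarray) -> dict:
    """Min, max, centre, spacing and size of a regularly spaced 1-D coordinate array."""
    steps = np.abs(np.diff(coords))
    if not np.allclose(steps, steps[0]):
        raise ValueError("coordinates are not regularly spaced")
    return {
        "min": float(coords.min()),
        "max": float(coords.max()),
        "centre": float((coords.min() + coords.max()) / 2),
        "spacing": float(np.round(steps[0], 6)),
        "size": int(coords.size),
    }


# Rebuild the batch coordinates printed above (latitude descending, longitude ascending).
lat = np.round(27.25 - 0.05 * np.arange(32), 2)
lon = np.round(73.85 + 0.05 * np.arange(32), 2)
print(grid_summary(lat))
print(grid_summary(lon))
```

The centre comes out at about (26.475, 74.625), close to the 26.4499, 74.6399 site location discussed earlier, and the spacing confirms the 0.05-degree grid. Note that 32 points at 0.05 degrees only span about 1.6 degrees.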

peterdudfield commented 6 months ago

thank you

peterdudfield commented 6 months ago

I noticed a big shift in the forecast yesterday.

[Screenshot 2024-03-06 at 07 36 02] [Screenshot 2024-03-06 at 07 35 44]

This was because we got some new NWP files. Something to investigate, I think.

peterdudfield commented 6 months ago

@jacobbieker how much time lag do you add to the ECMWF data when training? It looks like in production we get init runs at 00:00 and 12:00, and they appear about 6 or 7 hours later.

jacobbieker commented 6 months ago

In training I think the delay is 3 hours, the same as for UKV. So that's probably another thing to change, to 7 hours?
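To see why the delay matters, here is a sketch of picking the newest usable init time under a fixed availability delay. The 00:00/12:00 cycles and the 3 vs 7 hour delays come from this thread; treating the times as UTC and the function itself are assumptions, not the ocf_datapipes implementation.

```python
from datetime import datetime, timedelta, timezone


def latest_available_init(now: datetime, delay: timedelta, cycle_hours: tuple[int, ...] = (0, 12)) -> datetime:
    """Newest NWP init time whose data would have arrived by `now`.

    Assumes runs start at `cycle_hours` each day and take a fixed `delay` to
    arrive. Only today's and yesterday's runs are considered, so with the
    default twice-daily cycles `delay` should be under 24 hours.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    candidates = [
        midnight - timedelta(days=days_back) + timedelta(hours=hour)
        for days_back in (0, 1)
        for hour in cycle_hours
    ]
    return max(t for t in candidates if t + delay <= now)


now = datetime(2024, 3, 6, 6, 0, tzinfo=timezone.utc)
print(latest_available_init(now, timedelta(hours=3)))  # with the 3-hour training delay
print(latest_available_init(now, timedelta(hours=7)))  # with the ~7-hour delay seen live
```

At 06:00, a 3-hour delay lets the model use that day's 00:00 run, but with a 7-hour delay only the previous day's 12:00 run is available. A model trained with the shorter delay has therefore seen fresher NWP than it will get in production.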

jacobbieker commented 6 months ago

This https://github.com/openclimatefix/ocf_datapipes/pull/284 will fix the normalization and rescale the training examples to 0.1 degrees if they are finer than that. The training config has also been updated to use a 7-hour delay. Once that PR is merged, I think we can make new batches with most of the issues in this thread fixed.

peterdudfield commented 6 months ago

I've just been looking at the NWP variables. I can see that the solar variables are cumulative. Is this the same in the training data? @dfulu could you check this?


This is a plot of the current NWPs, averaged over lat/lon, showing how they evolve over time.

peterdudfield commented 6 months ago

sr also does this

jacobbieker commented 5 months ago

Ah, then we should probably take the difference between consecutive steps so that each value is per step, as normalization won't work well on cumulative values.
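A minimal NumPy sketch of that differencing, assuming the accumulated variable is laid out with forecast step along the first axis and that accumulation starts from zero at init. `deaccumulate` is a hypothetical helper name.

```python
import numpy as np


def deaccumulate(accumulated: np.ndarray, axis: int = 0) -> np.ndarray:
    """Turn values accumulated since the init time into per-step increments.

    Assumes the accumulation starts from zero at init, so the first step's
    increment is just its accumulated value.
    """
    return np.diff(accumulated, axis=axis, prepend=0.0)


# Accumulated values for two grid points over five forecast steps (step axis first).
accumulated = np.array([
    [0.0, 0.0],
    [10.0, 4.0],
    [25.0, 9.0],
    [25.0, 9.0],
    [40.0, 12.0],
])
per_step = deaccumulate(accumulated)
```

Taking a cumulative sum of the result recovers the original series, so no information is lost. If the forecast steps are not evenly spaced, each increment would also need dividing by its step length to get a comparable rate; that refinement is left out here.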

dfulu commented 5 months ago

For future reference, I checked this for the UK training and production data, and it was the same.

peterdudfield commented 5 months ago

I've also seen very similar behaviour when deploying PVNet.

peterdudfield commented 5 months ago

What I'm struggling with at the moment is that the NWPs say it's windy [Screenshot 2024-03-13 at 08 48 04], but the forecasts are low while real generation is high [Screenshot 2024-03-13 at 08 48 10].

jacobbieker commented 5 months ago

Hmmm, that might be partly because of the normalization: with the very large values for prate, sr, etc. compared to the wind values, the wind values might be getting ignored by the model. I can try training one that only has the wind variables, as they seem to be normalized better in the current dataset. That should force a better correlation between them.

peterdudfield commented 5 months ago

I think that might be a good idea. Which variables will you use?

jacobbieker commented 5 months ago

I was thinking of just the u and v components of the wind, as the other variables are less tied to generation, and u and v seem to be normalized better than the others.
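A sketch of restricting the model input to the u and v channels. It assumes a channel-first array whose channel order follows the normalization table earlier in the thread; the model's real channel order may differ. The `wind_speed` helper is an optional extra showing how the two components combine into a speed, not part of the plan described above.

```python
import numpy as np

# Channel order assumed to follow the normalization table earlier in this thread.
VARIABLES = ["hcc", "lcc", "mcc", "prate", "sde", "sr", "t2m", "tcc",
             "u10", "u100", "u200", "v10", "v100", "v200"]
WIND_VARIABLES = ["u10", "u100", "u200", "v10", "v100", "v200"]


def select_channels(nwp: np.ndarray, variables: list[str], keep: list[str]) -> np.ndarray:
    """Return only the `keep` channels (in that order) from a channel-first array."""
    idx = [variables.index(v) for v in keep]
    return nwp[idx]


def wind_speed(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Horizontal wind speed from its u (eastward) and v (northward) components."""
    return np.hypot(u, v)


# Fake batch: channel c is filled with the value c, on a 4x4 grid.
nwp = np.arange(len(VARIABLES), dtype=float)[:, None, None] * np.ones((1, 4, 4))
wind_only = select_channels(nwp, VARIABLES, WIND_VARIABLES)
```

Selecting by name rather than by hard-coded index keeps the subset correct if the upstream variable list is reordered.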

peterdudfield commented 5 months ago

@jacobbieker do you mind confirming the max and min lat/lon, now that we are using 0.1 for PV?

In production currently they are

jacobbieker commented 5 months ago

The PV ones are nearly the same: they have the same lat min and max, and the lon min is 73.1 and the lon max is 76.2.

peterdudfield commented 5 months ago

I've changed the longitude in the database from 75.639 to 75.65, which now seems to give the correct longitude.

peterdudfield commented 5 months ago

[Screenshot 2024-03-20 at 09 51 32] [Screenshot 2024-03-20 at 09 51 09]

@jacobbieker I was just looking at the NWP data. It looks like lat=27, lon=70, which is where some of the wind farms are, has super important data. Could this be included in the batch and the ML model?

jacobbieker commented 5 months ago

Yes, it is in the latest batches now. New models will be trained soon with it, once https://github.com/openclimatefix/PVNet/pull/165 gets merged

peterdudfield commented 5 months ago

Another piece of evidence that (27, 70) is a useful location that should be included:

[Screenshot 2024-03-22 at 12 37 02] [Screenshot 2024-03-22 at 12 36 43]