Adding nuclear production forecast

pkautio commented 4 months ago

Couple of ideas to improve the forecast:

1) Available transmission capacity between FI-SE1, FI-SE3 and FI-EE has a major impact on prices under certain conditions

This data should be available from ENTSO-E as market messages. With the entsoe-py package:

from entsoe import EntsoePandasClient
import pandas as pd

client = EntsoePandasClient(api_key="")

start = pd.Timestamp('20230101', tz='Europe/Helsinki')
end = pd.Timestamp('20241231', tz='Europe/Helsinki')
country_code = 'FI'  # Finland
country_code_from = 'FI'  # Finland
country_code_to = 'SE_1'  # Finland-Northern Sweden

transit_unavailability = client.query_unavailability_transmission(country_code_from, country_code_to, start=start, end=end, docstatus=None, periodstartupdate=None, periodendupdate=None)

Similarly, query all connections in both directions.
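As a rough sketch of "all connections and directions", the query above could be wrapped in a loop. The helper names `directed_connections` and `fetch_all_unavailability` are made up for illustration, and the neighbour codes `SE_3` and `EE` are assumed to follow the same naming as `SE_1`. The `client` argument is expected to be an `EntsoePandasClient` created as above.

```python
# Hypothetical helpers, not part of entsoe-py or this project.
FINLAND = "FI"
NEIGHBOURS = ["SE_1", "SE_3", "EE"]  # assumed area codes for the three links


def directed_connections(home=FINLAND, neighbours=NEIGHBOURS):
    """Return every (from, to) pair, covering both directions of each link."""
    pairs = []
    for other in neighbours:
        pairs.append((home, other))
        pairs.append((other, home))
    return pairs


def fetch_all_unavailability(client, start, end):
    """Run the transmission unavailability query once per direction.

    Returns a dict keyed by (area_from, area_to).
    """
    results = {}
    for area_from, area_to in directed_connections():
        results[(area_from, area_to)] = client.query_unavailability_transmission(
            area_from, area_to, start=start, end=end
        )
    return results
```

A real version would probably also need to handle directions for which the API returns no messages at all.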

2) Nuclear power capacity forecast

This could be done based on UMM REMIT messages. These should be available from ENTSO-E.

The entsoe-py package provides a ready-made interface to ENTSO-E.

vividfog commented 4 months ago

I updated README.md to tell more about how to add a new input data source to the prediction pipeline.

For these two:

1) Transit lines would be new columns. Update the DB schema, backfill the data, and create a utility function that infers the near-future values when the prediction pipeline runs. Using the "last known" value likely reaps most of the effects, as these inputs don't change as quickly or as often as, say, the weather.
2) Nuclear availability prediction based on UMM messages would be an updated or alternative fingrid.py routine. Refactor the util/fingrid.py module so that, instead of assuming the last-known-good value is the future value, it factors in the effect of UMM REMIT messages. A potential pitfall here is deducing which of the recent realized production numbers already contain the effect of the new message(s), so that we don't add the effect twice! Example: if a message says there's going to be a 1 GW reduction starting "yesterday", how do we know it actually did start yesterday, and that the realized numbers therefore already contain the effect of this message? What if the planned reduction is a bit late and hasn't actually started yet, and the UMM messages are now a stack of messages, akin to a changelog or a commit sequence?
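The "last known value" idea for the transit columns could look roughly like this. It is a minimal sketch: the function name `infer_future` and the list-of-tuples layout are assumptions for illustration, not the project's actual schema or utilities.

```python
from datetime import datetime, timedelta


def infer_future(history, hours_ahead):
    """Repeat the most recent known value for the coming hours.

    history: list of (timestamp, value) tuples sorted by time; value may be None.
    Returns only the new future rows, starting one hour after the last timestamp.
    """
    known = [value for _, value in history if value is not None]
    if not known:
        raise ValueError("no known values to extend from")
    last_timestamp = history[-1][0]
    last_value = known[-1]
    return [
        (last_timestamp + timedelta(hours=h), last_value)
        for h in range(1, hours_ahead + 1)
    ]
```

Called with a few hours of capacity history and `hours_ahead=120`, this would yield five days of rows that simply carry the latest known capacity forward.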

Sanity pre-checks might include:

How big is the effect of the transit lines? No way to know other than by trying. The challenge is that there's not much past data for the model to learn what the price effect may be. Transit lines haven't been down that much.
Assume we already had an accurate, non-fragile way to include UMM messages. What's the effect? The current logic waits for the change to actually happen, and then assumes that things will stay that way until the production numbers change again. In practice this leads to a 6-24 hour period where the predictions are off while the change, down or up, is in progress. But then the predictions self-correct, as the new ground truth becomes part of the input. Is the added complexity worth the potential improvement during these change periods?

This makes me personally a bit conservative about adding these as input factors, but I very much welcome the efforts to hack with these ideas and see how they behave with real data.

Here's how:

https://github.com/vividfog/nordpool-predict-fi?tab=readme-ov-file#adding-a-new-data-source

sjksp commented 4 months ago

Data from Fingrid doesn't show nuclear availability, it shows realized nuclear production.

Realized nuclear production consists of two factors: a) Is the plant technically able to produce? b) Is anyone interested in purchasing the produced power at the seller's desired price? UMM answers a, realized production answers b.

I imagine the model would do better fed with a, and b is quite possibly entirely redundant.

UMMs can overlap, but in general the "worst" UMM overrides any other message regarding the same asset.
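One way to read the "worst UMM overrides" rule is to take the lowest available capacity among all messages active for an asset at a given time. The sketch below assumes a simplified message layout (`asset`, `start`, `end`, `available_mw`) that is invented for illustration; real UMM data has more fields and a different shape.

```python
from datetime import datetime


def available_capacity(messages, asset, at, nominal_mw):
    """Available MW for one asset at time `at`.

    Among overlapping messages for the same asset, the one leaving the
    least capacity wins. With no active message, nominal capacity applies.
    """
    active = [
        m["available_mw"]
        for m in messages
        if m["asset"] == asset and m["start"] <= at < m["end"]
    ]
    return min(active) if active else nominal_mw
```

For example, a partial derating from 10:00 to 20:00 that overlaps a full outage from 12:00 to 14:00 would give zero available capacity at 13:00 and the derated value at 15:00.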

pkautio commented 4 months ago

a) Planned maintenance data is available from ENTSO-E. I prepared Python code to gather this data and convert it to a per-hour forecast time series for the next 5 days. The code is almost ready and can be added to this project soon.

b) Nuclear plants are generally always producing electricity with the exception of corner cases when the spot price is negative. For the forecast it should not matter.

vividfog commented 4 months ago

This sounds great.

How are you handling the edge cases where planned downtime has already started but not in full? That is, the gradient curve versus the realized data. And the same when going up again?

Or if it didn't start on schedule, or it started early? Or the UMM came late because of a sudden failure, with updated UMMs arriving as things clear up? Again, how do you merge that info with the realized data while the gradient is happening, or is early or late?

What kind of real-world impact does it have if the gradient and edge cases are ignored? Would it still result in a better prediction, on average, than a naive extrapolation of the "last known value"? Or a worse prediction? Under what scenarios?

Overall, interesting to see how you resolve the touchpoint between realized vs predicted nuclear MW. Thanks for taking the challenge.

pkautio commented 4 months ago

A forecast is a forecast. It's based on market messages for planned maintenance, and it is an hourly time series covering the next 5 days from the current time.

Realised production is realised production. For the price forecast it does not matter if the planned maintenance started a few hours late, since the price has already been fixed. Of course, this produces incorrect training data for individual hours in future forecasts.

pkautio commented 4 months ago

Added the nuclear forecast script and opened a pull request. You will need an ENTSO-E API key to use the script.

The script fetches Planned Unavailability data for Finnish nuclear plants from the ENTSO-E API and converts it into a capacity forecast time series.
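For readers who have not opened the pull request, the general shape of such a conversion might be as follows. This is not the script from the PR, only a sketch under simplifying assumptions: outages arrive as plain dicts with `start`, `end` and `unavailable_mw`, outages on the same unit do not overlap, and `nominal_total_mw` is whatever total the caller supplies.

```python
from datetime import datetime, timedelta


def capacity_forecast(outages, nominal_total_mw, now, days=5):
    """Build an hourly (timestamp, available MW) series for the next `days` days."""
    # Align to the start of the current hour so the series falls on whole hours.
    first_hour = now.replace(minute=0, second=0, microsecond=0)
    series = []
    for h in range(days * 24):
        hour = first_hour + timedelta(hours=h)
        # Sum the unavailable capacity of every outage active during this hour.
        unavailable = sum(
            o["unavailable_mw"] for o in outages if o["start"] <= hour < o["end"]
        )
        series.append((hour, max(nominal_total_mw - unavailable, 0.0)))
    return series
```

The result has one row per hour, 120 rows for the default five days, which matches the hourly granularity of the other inputs.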

vividfog commented 4 months ago

Thanks a lot. It's a work week again, so the review might be pushed towards the end of the week. I will come back with questions if needed.

vividfog commented 4 months ago

A quick comment. I saw this generates the forecast and it's straightforward. Excellent. If you feel like it, you can include the code that integrates this into the end-to-end forecast. Or I can when I get to it.

It looks like my existing nuclear function could by default call this new function, instead of what it does today, but retain support for the old way for a while. That would enable end-to-end testing with an A/B comparison, with and without unavailability data, for some ML stats.

Or did you already have a view on how you'd like to integrate this into the pipeline? Reading this on mobile currently, sorry if I missed any existing notes on that. @pkautio
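The "new by default, old way still supported" idea could be sketched as a simple switch. The function and parameter names here are hypothetical and do not mirror the actual code in util/fingrid.py; `fetch_entsoe_forecast` stands in for whatever callable produces the ENTSO-E based series.

```python
def nuclear_forecast(hours, last_known_mw, fetch_entsoe_forecast=None, use_entsoe=True):
    """Return one MW value per forecast hour.

    fetch_entsoe_forecast: hypothetical callable taking the number of hours
    and returning that many MW values based on market messages.
    use_entsoe: set to False to force the old last-known-value behaviour.
    """
    if use_entsoe and fetch_entsoe_forecast is not None:
        return list(fetch_entsoe_forecast(hours))
    # Old behaviour: assume the last known production holds for every hour.
    return [last_known_mw] * hours
```

Running the pipeline once with `use_entsoe=True` and once with `use_entsoe=False` on the same period would give the two arms of the A/B comparison.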

vividfog commented 4 months ago

ENTSO-E code is now integrated, README updated, and the next prediction will use market messages in a few hours. Hats off to @pkautio for figuring out the ENTSO-E part 👍 ... and if the forecasts are off the wall tomorrow, that's my fault. At the time of writing this, it all worked.

vividfog commented 4 months ago

Sample run:

python nordpool_predict_fi.py --train --predict
[2024-03-07 23:12:34] Nordpool Predict FI
Training a new model candidate using the data in the database...
* FMI Weather Stations for Wind: ['ws_101673', 'ws_101256', 'ws_101846', 'ws_101267']
* FMI Weather Stations for Temperature: ['t_101786', 't_101118', 't_100968', 't_101339']
→ Feature Importance:
       Feature  Importance
      t_101339    0.211443
     ws_101256    0.181828
      t_100968    0.162449
NuclearPowerMW    0.106445
      t_101786    0.066850
          hour    0.062656
     ws_101673    0.047487
   day_of_week    0.042330
     ws_101846    0.040911
      t_101118    0.034386
         month    0.027271
     ws_101267    0.015943
→ Durbin-Watson autocorrelation test: 2.00
→ ACF values for the first 5 lags:
  Lag 1: 1.0000
  Lag 2: -0.0014
  Lag 3: -0.0237
  Lag 4: -0.0202
  Lag 5: -0.0028
  Lag 6: -0.0080
→ Model trained:
  MAE (vs test set): 1.7806310108575483
  MSE (vs test set): 17.125934255294478
  R² (vs test set): 0.8378969382433199
  MAE (vs 10x500 randoms): 1.2451800318949335
  MSE (vs 10x500 randoms): 15.15771097880135
  R² (vs 10x500 randoms): 0.8548413431241844
→ Model NOT saved to the database but remains available in memory for --prediction.
→ Training done.
Running predictions...
* Fetching wind speed forecast and historical data between 2024-02-29 and 2024-03-12
* Fetching temperature forecast and historical data between 2024-02-29 and 2024-03-12
* Fetching nuclear power production data between 2024-02-29 and 2024-03-12 and inferring missing values
* Fingrid: Fetched 2648 hours, aggregated to 133 hourly averages spanning from 2024-02-29 to 2024-03-05
→ Fingrid: Using last known nuclear power production value: 2764 MW
* ENTSO-E: Fetching nuclear downtime messages...
→ ENTSO-E: Avg: 2772, max: 2772, min: 2772 MW
* Fetching electricity price data between 2024-02-29 and 2024-03-12
→ Days of data coverage (should be 7 back, 5 forward for now):  12
→ Found a newly created in-memory model for predictions
                    Timestamp  PricePredict_cpkWh  ws_101256  ws_101267  ws_101673  ws_101846  t_101118  t_101339  t_101786  t_100968  NuclearPowerMW  Price_cpkWh
0   2024-02-29 23:00:00+00:00            0.186325       14.2       13.1       11.6       11.2      0.49      0.38      1.57      0.80        4249.545       0.0000
1   2024-03-01 00:00:00+00:00            0.268877       13.8       12.4       11.6       11.3      0.31      0.43      1.62      0.67        4228.760       0.0000
2   2024-03-01 01:00:00+00:00            0.239165       13.5       11.4       11.3       11.1      0.55      0.35      1.70      0.58        4228.825       0.0000
3   2024-03-01 02:00:00+00:00            0.338605       13.4       10.2       11.1       10.9      0.60      0.31      1.60      0.34        4229.235       0.0000
4   2024-03-01 03:00:00+00:00            0.729866       13.0       10.0       10.9       10.4      0.62      0.28      1.55      0.01        4228.350       0.0012
..                        ...                 ...        ...        ...        ...        ...       ...       ...       ...       ...             ...          ...
283 2024-03-12 18:00:00+00:00           14.795136        2.3        2.1        4.8        2.0     -4.70     -7.11     -4.98     -5.57        2772.000          NaN
284 2024-03-12 19:00:00+00:00           12.084022        2.2        2.1        4.7        2.1     -5.29     -7.53     -5.28     -6.27        2772.000          NaN
285 2024-03-12 20:00:00+00:00           12.363734        2.0        2.1        4.7        2.3     -5.87     -7.94     -5.59     -6.97        2772.000          NaN
286 2024-03-12 21:00:00+00:00           10.250964        1.9        2.4        4.5        2.5     -4.83     -6.69     -8.43     -4.51        2772.000          NaN
287 2024-03-12 22:00:00+00:00            8.946699        1.8        2.5        4.4        2.6     -5.52     -7.10     -8.61     -5.14        2772.000          NaN

[288 rows x 12 columns]
* Predictions NOT committed to the database (no --commit).

vividfog / nordpool-predict-fi

Adding nuclear production forecast #7