openclimatefix / open-source-quartz-solar-forecast

Open Source Solar Site Level Forecast
MIT License
60 stars, 51 forks

Tryolabs Model is large at 1.13Gb #126

Closed zakwatts closed 2 weeks ago

zakwatts commented 3 months ago

Currently the Tryolabs model download is 1.13 GB.

Downloading model ...
Downloading...
From (original): https://drive.google.com/uc?id=1O34gyQ67rvrP9VFkNaagTDM9IP4iqAjM
From (redirected): https://drive.google.com/uc?id=1O34gyQ67rvrP9VFkNaagTDM9IP4iqAjM&confirm=t&uuid=48065f82-5d7e-49e2-ac5c-095c7a17b40d
To: /home/zak/projects/Open-Source-Quartz-Solar-Forecast/examples/model_10_202405.ubj.zip
100%|██████████| 1.13G/1.13G [00:13<00:00, 81.5MB/s]
Preparing model ...
Loading model ...
Predictions finished.

This seems large, and perhaps it could be optimised to download only what's needed. For reference, the gradient boosted model is around 400 KB.

froukje commented 3 months ago

Yes, we are aware that it is larger than the gradient boosting (GB) model. For now we zip it for storage, but it needs to be unzipped after it is downloaded.

peterdudfield commented 3 months ago

Is there a way to reduce the size? Perhaps more than just the weights are saved?
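One way to check whether the archive holds more than the model file is to list its members with their compressed and uncompressed sizes. The following is only a sketch using Python's standard `zipfile` module; the helper name `summarize_archive` is hypothetical and not part of the project.

```python
import zipfile


def summarize_archive(zip_path):
    """Return (name, compressed_bytes, uncompressed_bytes) for every archive member.

    Hypothetical helper for illustration; it only reads the zip's central
    directory, so it does not unpack anything.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]


# Example call, assuming the archive from the log above is in the working directory:
# for name, packed, unpacked in summarize_archive("model_10_202405.ubj.zip"):
#     print(f"{name}: {packed / 1e6:.1f} MB zipped, {unpacked / 1e6:.1f} MB unzipped")
```

If the listing shows a single `.ubj` member, the size comes from the model itself rather than from extra files in the archive.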

zakwatts commented 3 months ago

[Screenshot 2024-06-05 at 16 21 54]

It does slow down the pytest run in CI, because the model has to be downloaded before the tests can run.

froukje commented 3 months ago

We discussed this issue. We already tried to make the model as small as possible by storing it as .ubj (Universal Binary JSON). We think the reason is simply that the model has more parameters: it is quite large and uses a lot of trees. I understand that the slowdown of the tests is annoying. For the application itself it should not be much of a problem, because the downloading and unzipping only happen the first time you use it.

What do you think about removing the xgboost model from the tests? Or maybe better, finding another solution for the tests? Looking at the tests again, I realized that in test_forecast.py the xgboost model is not called. If we want to include it there, we need to change predications_df_xgb = run_forecast(site=site, ts=ts) to predications_df_xgb = run_forecast(site=site, model="xgb", ts=ts). Let me know if changing the tests is enough for you to resolve this issue.
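The "download and unzip only the first time" behaviour can be sketched as a small cache check. This is not the project's actual code: `ensure_model`, `cache_dir`, and the `fetch_zip` callable are hypothetical names, and the sketch assumes the archive contains a member with the same name as the model file.

```python
import zipfile
from pathlib import Path
from typing import Callable


def ensure_model(cache_dir: Path, model_name: str,
                 fetch_zip: Callable[[Path], None]) -> Path:
    """Return the path to the unzipped model, fetching it only if it is missing.

    fetch_zip is a hypothetical downloader: it receives the target path and
    must write the zip archive there.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    model_path = cache_dir / model_name
    if model_path.exists():
        # Prepared on an earlier run: skip both download and unzip.
        return model_path

    zip_path = cache_dir / (model_name + ".zip")
    fetch_zip(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumes the archive member is named exactly like the model file.
        archive.extract(model_name, path=cache_dir)
    zip_path.unlink()  # the archive is no longer needed once unpacked
    return model_path
```

If a CI setup keeps a directory like `cache_dir` between runs, the tests would pay the download cost only once; whether that is practical for this repository's CI is an open question.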

peterdudfield commented 2 weeks ago

Thanks @froukje, I'll close this for the moment.