openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

PVNet concurrent datapipe #320

Closed: dfulu closed this pull request 1 month ago

dfulu commented 1 month ago

This new datapipe generates batches in which each batch contains 317 samples, one for each of the 317 regional GSPs, all at the same init time t0. This is much faster (at least 10x) than our current method of creating similar batches.

This will allow us to create batches for the summation model and to run backtests much faster.
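To make the batch layout described above concrete, here is a minimal plain-Python sketch. Everything in it (`CountingSource`, `load_window`, `concurrent_batch`, `per_sample_batch`, `naive_national_total`, and the GSP IDs 1 to 317) is hypothetical and is not the ocf_datapipes API. It assumes, purely for illustration, that the expensive step is loading the inputs around a t0. The PR reports the speed-up but does not say where it comes from, so the sketch counts loads rather than reproducing the 10x figure. The last function shows why these batches suit a summation-style model: even a naive national total needs every regional GSP at the same t0. A plain sum stands in here only as the simplest possible combiner; it is not what the summation model computes.

```python
from dataclasses import dataclass

# Illustrative assumption: regional GSP IDs run 1..317 (the PR only gives the count).
REGIONAL_GSP_IDS = list(range(1, 318))


@dataclass
class CountingSource:
    """Hypothetical data source that records how often inputs around a t0 are loaded."""

    loads: int = 0

    def load_window(self, t0: str) -> dict[int, float]:
        # Stand-in for the expensive step: read inputs for every GSP around t0.
        self.loads += 1
        return {gsp_id: float(gsp_id) for gsp_id in REGIONAL_GSP_IDS}


def concurrent_batch(source: CountingSource, t0: str) -> list[dict]:
    """One batch = one sample per regional GSP, all at the same init time t0."""
    window = source.load_window(t0)  # loaded once, shared by all 317 samples
    return [{"t0": t0, "gsp_id": g, "value": window[g]} for g in REGIONAL_GSP_IDS]


def per_sample_batch(source: CountingSource, pairs: list[tuple[str, int]]) -> list[dict]:
    """Baseline: each (t0, gsp_id) sample triggers its own load."""
    batch = []
    for t0, g in pairs:
        window = source.load_window(t0)
        batch.append({"t0": t0, "gsp_id": g, "value": window[g]})
    return batch


def naive_national_total(batch: list[dict]) -> float:
    """Plain sum over regions; only meaningful if every GSP shares a single t0."""
    t0s = {s["t0"] for s in batch}
    gsps = {s["gsp_id"] for s in batch}
    if len(t0s) != 1 or gsps != set(REGIONAL_GSP_IDS):
        raise ValueError("batch must cover all regional GSPs at a single t0")
    return sum(s["value"] for s in batch)
```

For example, `concurrent_batch(CountingSource(), "2023-06-01T12:00")` returns 317 samples after a single load, while building the same 317 samples with `per_sample_batch` performs 317 loads. A batch mixing several t0s is rejected by `naive_national_total`.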

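INCLUDED

  • New pipeline for concurrent batches
  • Minor refactoring and linting

TODO:

  • [x] Bug fixes
  • [x] Tests

TODO in PVNet library after merging

  • Use new pipeline within concurrent batch creation script
  • Use new pipeline within UK backtest script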

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 52.28571% with 167 lines in your changes missing coverage. Please review.

Project coverage is 75.24%. Comparing base (d55139e) to head (ab5b649). Report is 129 commits behind head on main.

Current head ab5b649 differs from the pull request's most recent head e79560b.

Please upload reports for the commit e79560b to get more accurate results.

| File | Patch % | Lines missing coverage |
|---|---|---|
| ocf_datapipes/training/pvnet_all_gsp.py | 0.00% | 137 |
| ocf_datapipes/select/select_spatial_slice.py | 84.25% | 17 |
| ocf_datapipes/select/pick_t0_times.py | 61.11% | 7 |
| ocf_datapipes/load/gsp/utils.py | 77.27% | 5 |
| ocf_datapipes/select/pick_locations.py | 96.42% | 1 |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #320      +/-   ##
==========================================
- Coverage   75.65%   75.24%   -0.42%
==========================================
  Files         126      128       +2
  Lines        5994     6208     +214
==========================================
+ Hits         4535     4671     +136
- Misses       1459     1537      +78
```
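The figures in this report can be cross-checked with a few lines of Python. This is a sketch that assumes Codecov's default of two decimal places rounded down (a per-repository setting that the report itself does not state). Project coverage is hits divided by lines. The per-file missing counts in the table add up to the 167 missing patch lines, and the 52.28571% patch figure then implies about 350 patch lines, 183 of them covered. Those last two numbers are inferred here; the report does not print them.

```python
import math


def round_down_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage with 2 decimals, rounded down (assumed Codecov default)."""
    return math.floor(hits / lines * 10_000) / 100


# Project totals from the coverage diff (base = main, head = #320).
base_hits, base_misses, base_lines = 4535, 1459, 5994
head_hits, head_misses, head_lines = 4671, 1537, 6208

base_pct = round_down_pct(base_hits, base_lines)  # 75.65
head_pct = round_down_pct(head_hits, head_lines)  # 75.24
# Change computed from the unrounded ratios, then rounded down: -0.42
delta = math.floor((head_hits / head_lines - base_hits / base_lines) * 10_000) / 100

# Patch coverage: per-file missing lines from the table.
missing_by_file = {
    "ocf_datapipes/training/pvnet_all_gsp.py": 137,
    "ocf_datapipes/select/select_spatial_slice.py": 17,
    "ocf_datapipes/select/pick_t0_times.py": 7,
    "ocf_datapipes/load/gsp/utils.py": 5,
    "ocf_datapipes/select/pick_locations.py": 1,
}
missing = sum(missing_by_file.values())  # 167
patch_pct = 52.28571
# Inferred: total patch lines such that `missing` lines are (100 - patch_pct)% of them.
patch_lines = round(missing / (1 - patch_pct / 100))  # 350
patch_hits = patch_lines - missing  # 183
```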


peterdudfield commented 1 month ago

This new datapipe is for generating batches, where in each batch there are 317 samples for all of the 317 regional GSPs at the same init time t0. This is much faster (at least 10x faster) than our current method of creating similar batches.

This will allow us to create batches for the summation model and to run backtests much faster.

INCLUDED

  • New pipeline for concurrent batches
  • Minor refactoring and linting

TODO:

  • [x] Bug fixes
  • [x] Tests

TODO in PVNet library after merging

  • Use new pipeline within concurrent batch creation script
  • Use new pipeline within UK backtest script

How much is it worth testing these TODOs before merging? I'm not sure here.

dfulu commented 1 month ago

I've already got it working locally in the concurrent batch creation script in PVNet, so I'm confident enough to merge.