openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

Fill select #199

Closed dfulu closed 1 year ago

dfulu commented 1 year ago

Pull Request

Description

This pull request has two components:

  1. Currently, when the select_time_slice() function is used to slice a time period that extends beyond the bounds of the input data, it selects only the overlapping region.

    For example, if the input data has timestamps [t0, t1, t2, t3] and we ask for the slice (t2, t5), the function returns data with timestamps [t2, t3].

    This pull request adds an optional parameter fill_selection to the function. When it is set to true, the function returns [t2, t3, t4, t5] for the requested slice above, with the data at timestamps [t4, t5] set to NaN.

    This feature could be useful in production so that the inputs to models are always the same shape when data is missing. We can then deal with NaN inputs within other datapipe sections or in the model itself.

  2. A PVNet production pipeline that uses the feature from item 1.

Tests covering both 1 and 2 are also included.
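The difference between the two behaviours in item 1 can be illustrated with a small standalone sketch. This is not the code from this pull request: `select_time_slice_sketch`, its signature, the 30-minute step, and the plain-list data are all hypothetical stand-ins chosen so the example runs with the standard library only. Only the name `fill_selection` and the [t0..t5] scenario are taken from the description above.

```python
from datetime import datetime, timedelta
import math


def select_time_slice_sketch(times, values, start, end, step, fill_selection=False):
    """Hypothetical stand-in for the behaviour described above (not the library code).

    times: sorted list of datetimes; values: list of floats aligned with times.
    Returns (selected_times, selected_values) for the inclusive slice [start, end].
    """
    lookup = dict(zip(times, values))

    if not fill_selection:
        # Existing behaviour: keep only timestamps that overlap the input data.
        kept = [t for t in times if start <= t <= end]
        return kept, [lookup[t] for t in kept]

    # fill_selection=True: emit every requested timestamp, padding gaps with NaN.
    requested = []
    t = start
    while t <= end:
        requested.append(t)
        t += step
    return requested, [lookup.get(t, math.nan) for t in requested]


# Assumed example data: four timestamps t0..t3 at a 30-minute spacing.
t0 = datetime(2023, 1, 1, 0, 0)
step = timedelta(minutes=30)
times = [t0 + i * step for i in range(4)]
values = [0.0, 1.0, 2.0, 3.0]
t2, t5 = times[2], t0 + 5 * step

# Without filling, the slice (t2, t5) is clipped to [t2, t3].
clipped_times, clipped_values = select_time_slice_sketch(times, values, t2, t5, step)

# With filling, the slice covers [t2, t3, t4, t5] and t4, t5 hold NaN.
filled_times, filled_values = select_time_slice_sketch(
    times, values, t2, t5, step, fill_selection=True
)
```

With filling enabled the output length depends only on the requested slice, which is the fixed-shape property the description gives as the motivation for production use.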


codecov[bot] commented 1 year ago

Codecov Report

Merging #199 (a2bf1ce) into main (39ea02b) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #199      +/-   ##
==========================================
+ Coverage   81.07%   81.12%   +0.04%     
==========================================
  Files         128      128              
  Lines        5454     5463       +9     
==========================================
+ Hits         4422     4432      +10     
+ Misses       1032     1031       -1     
| Impacted Files | Coverage Δ |
|---|---|
| ocf_datapipes/select/select_time_slice.py | 100.00% <100.00%> (ø) |
| ocf_datapipes/training/pvnet.py | 79.19% <100.00%> (+0.14%) :arrow_up: |

... and 1 file with indirect coverage changes
