openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

Pseudo-Irradiance Datapipe Updates #185

Closed jacobbieker closed 1 year ago

jacobbieker commented 1 year ago

Pull Request

Description

Makes some changes to the Pseudo-Irradiance datapipes to speed them up.

Fixes #

How Has This Been Tested?

This was run 50 times, each run creating a batch from the example data in this repository. Profiling results are given below.
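For anyone wanting a rough per-stage timing without a full profiler, here is a minimal sketch using only the standard library. It is not the tool that produced the tables in this PR. `TimedStage` and `fake_batches` are hypothetical names made up for illustration, and the toy generator stands in for the real datapipe. Timings are inclusive: a downstream stage's total includes the time spent pulling from its upstream stages.

```python
import time
from collections import defaultdict


class TimedStage:
    """Hypothetical helper: wraps an iterable and accumulates the wall time
    spent producing its items (inclusive of any upstream stages)."""

    totals = defaultdict(float)  # stage name -> seconds spent in next()
    calls = defaultdict(int)     # stage name -> number of items produced

    def __init__(self, name, source):
        self.name = name
        self.source = source

    def __iter__(self):
        it = iter(self.source)
        while True:
            start = time.perf_counter()
            try:
                item = next(it)
            except StopIteration:
                return
            finally:
                # Runs for both successful and exhausted next() calls.
                TimedStage.totals[self.name] += time.perf_counter() - start
            TimedStage.calls[self.name] += 1
            yield item


def fake_batches(n):
    """Stand-in for the real datapipe: yields n small 'batches'."""
    for i in range(n):
        yield [i] * 4


# Two chained stages, run for 50 batches as in the test described above.
source = TimedStage("source", fake_batches(50))
doubled = TimedStage("double", ([x * 2 for x in batch] for batch in source))
batches = list(doubled)

for name, seconds in TimedStage.totals.items():
    print(f"{name:>8}: {seconds * 1e3:.3f} ms over {TimedStage.calls[name]} items")
```

Because the totals are inclusive, the per-stage figures overlap and do not sum to the overall run time.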

After running it 50 times in a row, this is the breakdown. Creating the sun image is by far the largest single cost, at 75.59% of self CPU time:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                 enumerate(DataPipe)#ZipperIterDataPipe         0.13%     569.522ms       222.75%     1011.363s     777.972ms          1300  
                enumerate(DataPipe)#BatcherIterDataPipe         0.00%      16.264ms        99.89%      453.534s        3.024s           150  
            enumerate(DataPipe)#StackXarrayIterDataPipe         0.06%     252.146ms        98.66%      447.931s        8.959s            50  
         enumerate(DataPipe)#CreateSunImageIterDataPipe        75.59%      343.199s        75.59%      343.201s        6.864s            50  
                     enumerate(DataPipe)#_ChildDataPipe         0.07%     340.487ms        24.25%      110.086s      45.869ms          2400  
       enumerate(DataPipe)#ThreadPoolMapperIterDataPipe         9.35%       42.436s        16.60%       75.387s     376.933ms           200  
          enumerate(DataPipe)#CreatePVImageIterDataPipe         1.19%        5.414s         9.39%       42.636s     426.362ms           100  
              enumerate(DataPipe)#NormalizeIterDataPipe         0.83%        3.786s         7.48%       33.960s      75.467ms           450  
enumerate(DataPipe)#SelectSpatialSlicePixelsIterData...         0.82%        3.731s         7.26%       32.951s      82.378ms           400  
        enumerate(DataPipe)#SelectTimeSliceIterDataPipe         0.12%     524.390ms         6.44%       29.251s      83.574ms           350  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 454.026s

The same run without CreateSunImage. StackXarray is, I think, more or less required:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                 enumerate(DataPipe)#ZipperIterDataPipe         0.55%     814.140ms       285.56%      424.116s     326.243ms          1300  
                enumerate(DataPipe)#BatcherIterDataPipe         0.01%      19.418ms        99.55%      147.851s     985.671ms           150  
            enumerate(DataPipe)#StackXarrayIterDataPipe         0.17%     248.443ms        95.31%      141.558s        2.831s            50  
                     enumerate(DataPipe)#_ChildDataPipe         0.30%     439.869ms        93.55%      138.936s      59.122ms          2350  
       enumerate(DataPipe)#ThreadPoolMapperIterDataPipe        42.37%       62.922s        72.75%      108.043s     540.217ms           200  
          enumerate(DataPipe)#CreatePVImageIterDataPipe         4.35%        6.468s        33.95%       50.418s     504.176ms           100  
enumerate(DataPipe)#SelectSpatialSlicePixelsIterData...         3.48%        5.174s        30.38%       45.122s     112.804ms           400  
              enumerate(DataPipe)#NormalizeIterDataPipe         3.33%        4.939s        26.57%       39.466s      87.703ms           450  
                 enumerate(DataPipe)#MapperIterDataPipe        18.72%       27.802s        23.92%       35.527s     177.633ms           200  
        enumerate(DataPipe)#SelectTimeSliceIterDataPipe         0.47%     692.004ms        22.44%       33.328s      95.223ms           350  
enumerate(DataPipe)#AddT0IdxAndSamplePeriodDurationI...         0.07%      98.052ms        18.34%       27.244s      68.110ms           400  
enumerate(DataPipe)#SelectTrainTestTimePeriodsIterDa...         0.11%     158.366ms        14.45%       21.465s     214.653ms           100  
       enumerate(DataPipe)#OpenPVFromNetCDFIterDataPipe        14.33%       21.287s        14.33%       21.287s     212.866ms           100  
           enumerate(DataPipe)#SelectT0TimeIterDataPipe         0.03%      49.333ms         7.38%       10.966s     109.659ms           100  
      enumerate(DataPipe)#SelectTimePeriodsIterDataPipe         0.39%     581.290ms         7.35%       10.917s     109.166ms           100  
enumerate(DataPipe)#SelectOverlappingTimeSliceIterDa...         1.43%        2.123s         6.82%       10.129s     101.292ms           100  
enumerate(DataPipe)#GetContiguousT0TimePeriodsIterDa...         1.32%        1.964s         5.32%        7.898s      19.744ms           400  
          enumerate(DataPipe)#OpenSatelliteIterDataPipe         2.80%        4.164s         2.80%        4.164s      20.822ms           200  
         enumerate(DataPipe)#SelectChannelsIterDataPipe         0.29%     424.060ms         2.51%        3.733s      18.666ms           200  
enumerate(DataPipe)#CreatePVMetadataImageIterDataPip...         2.07%        3.074s         2.07%        3.082s      61.633ms            50  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 148.522s
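To put the two runs side by side, the arithmetic below uses only figures copied from the two tables above (the totals, the 50 batches, and the CreateSunImage self CPU time). It works out to roughly 9.1 s versus 3.0 s per batch, about a 3x difference. The variable names are just for illustration.

```python
# Figures copied from the two profiler tables above.
N_BATCHES = 50
TOTAL_WITH_SUN_S = 454.026     # "Self CPU time total" with CreateSunImage
TOTAL_WITHOUT_SUN_S = 148.522  # "Self CPU time total" without CreateSunImage
SUN_IMAGE_SELF_S = 343.199     # "Self CPU" of CreateSunImageIterDataPipe

per_batch_with = TOTAL_WITH_SUN_S / N_BATCHES
per_batch_without = TOTAL_WITHOUT_SUN_S / N_BATCHES
speedup = TOTAL_WITH_SUN_S / TOTAL_WITHOUT_SUN_S
sun_share = SUN_IMAGE_SELF_S / TOTAL_WITH_SUN_S
saved = TOTAL_WITH_SUN_S - TOTAL_WITHOUT_SUN_S

print(f"per batch, with sun image:    {per_batch_with:.2f} s")
print(f"per batch, without sun image: {per_batch_without:.2f} s")
print(f"ratio of totals:              {speedup:.2f}x")
print(f"sun image share of first run: {sun_share:.1%}")
print(f"difference in totals:         {saved:.3f} s")
```

Note that the difference between the two totals (about 305.5 s) is smaller than the 343.2 s attributed to CreateSunImage in the first run, so the other stages did not cost exactly the same in both runs.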


codecov[bot] commented 1 year ago

Codecov Report

Merging #185 (5c88b4e) into main (c55f5ce) will increase coverage by 4.02%. The diff coverage is 60.00%.

@@            Coverage Diff             @@
##             main     #185      +/-   ##
==========================================
+ Coverage   77.84%   81.86%   +4.02%     
==========================================
  Files         124      124              
  Lines        5097     5102       +5     
==========================================
+ Hits         3968     4177     +209     
+ Misses       1129      925     -204     

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...f_datapipes/transform/xarray/pv/create_pv_image.py | 81.73% <50.00%> (+1.33%) | :arrow_up: |
| ...apipes/transform/xarray/pv/create_pv_meta_image.py | 75.34% <50.00%> (+2.10%) | :arrow_up: |
| ocf_datapipes/training/pseudo_irradience.py | 75.00% <100.00%> (+58.47%) | :arrow_up: |

... and 10 files with indirect coverage changes
