umr-amap / StormR

Funtions to quantify and map the tropical storms and cyclones’ winds characteristics'
https://umr-amap.github.io/StormR/
GNU General Public License v3.0
13 stars 4 forks source link

[New Feature]: Improve computation efficiency #69

Open thomasarsouze opened 9 months ago

thomasarsouze commented 9 months ago

Describe your feature request

spatialBehaviour and temporalBehaviour are kind of slow to run. For exemple, it takes 37.5s on my computer (mac osx 2020, 2 GHz Intel Core i5 four cores) to run the spatialBehaviour function for only 7 storms:

ds <- defStormsDataset()
s <- defStormsList(ds)
s <- defStormsList(ds, loi="New Caledonia")
=== Storms processing ... ===

-> Making buffer: Done
-> Searching storms from 2015 to 2021 ...
   -> Identifying Storms: 9 potential candidates...
-> Gathering storm(s) ... 
  |===================================================================================================================================| 100%

=== DONE with run time 1.182287 sec ===

SUMMARY:
(*) LOI: New Caledonia 
(*) Buffer size: 300 km
(*) Remove Tropical Depressions (< 18 m/s in sshs): yes
(*) Number of storms: 7 
        Name - Tropical season - SSHS - Number of observation within buffer:
        PAM - 2015 - 5 - 5 
        SOLO - 2015 - 0 - 15 
        ULA - 2016 - 4 - 9 
        UESI - 2020 - 1 - 16 
        GRETEL - 2020 - 1 - 11 
        LUCAS - 2021 - 1 - 16 
        NIRAN - 2021 - 5 - 10 

msw <- spatialBehaviour(s)

We want the code to be faster.

Describe the solution you'd like

Three things must be considered to improve efficency when doing the computation of storms behaviour:


  1. We must use a profiling library to analyse the time spent in different functions:
    library(profvis)
    profvis(spatialBehaviour(s))

    First analysis with the exemple above shows that :

    • overall time in this function is 35170ms
    • computational time in spatialBehaviour itself is 5800ms
    • all the remaining time is spent in .External functions, which are tipicaly sf functions.

This means that we should consider means to lower the number of calls to sf functions.

  1. Code in Rcpp the computing functions, including:

    • computeAsymmetry
    • willoughby
    • holland
    • boose
    • computeDirectionBoose
    • computeDirection although as stated above, this is limited to a small fraction of the total computational time
  2. Test the parallel-looping when available:

    • auto detect number of cores
    • implement parallel-looping (keeping order) using foreach library when relevent (and possible)

Describe alternatives you've considered

Additional context

No response

thomasarsouze commented 5 months ago

Some news about Rcpp implementation.

In branch 69-new-feature-improve-computation-efficiency, I have implemented all computational functions.

Here is the benchmark of the implmentations:

##willoughby
[ins] r$> rbenchmark::benchmark(willoughby(r, rmw, msw, lat), 
                                willoughby_cpp(r, rmw, msw, lat))[,1:4]
                              test replications elapsed relative
2 willoughby_cpp(r, rmw, msw, lat)          100   0.042    1.000
1     willoughby(r, rmw, msw, lat)          100   0.271    6.452

##holland
[ins] r$> rbenchmark::benchmark(holland(r, rmw, msw, pc, poci, lat), 
                                holland_cpp(r, rmw, msw, pc, poci, lat))[,1:4]
                                     test replications elapsed relative
2 holland_cpp(r, rmw, msw, pc, poci, lat)          100   0.078    1.000
1     holland(r, rmw, msw, pc, poci, lat)          100   0.251    3.218

##boose
[ins] r$> rbenchmark::benchmark(boose(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat), 
                                boose_cpp(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat))[,1:4]
                                                                    test replications elapsed relative
2 boose_cpp(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat)          100   0.131     1.00
1     boose(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat)          100   0.410     3.13

##computeDirectionBoose
[ins] r$> rbenchmark::benchmark(computeDirectionBoose(x,y,lat, landIntersect),
                                computeDirectionBoose_cpp(x,y,lat, landIntersect))[,1:4]
                                                 test replications elapsed relative
2 computeDirectionBoose_cpp(x, y, lat, landIntersect)          100   0.058    1.000
1     computeDirectionBoose(x, y, lat, landIntersect)          100   0.217    3.741

##computeDirection
[ins] r$> rbenchmark::benchmark(computeDirection(x,y,lat),
                                computeDirection_cpp(x,y,lat))[,1:4]
                             test replications elapsed relative
2 computeDirection_cpp(x, y, lat)          100   0.041    1.000
1     computeDirection(x, y, lat)          100   0.130    3.171

##computeAsymetry
[ins] r$> benchmark(computeAsymmetry("Chen", wind, direction, x, y, vx, vy, vh, r, rmw, lat), 
                    computeAsymmetry_cpp("Chen", wind, x, y, vx, vy, vh, r, rmw, lat))[,1:4]
                                                                      test  replications elapsed relative
2        computeAsymmetry_cpp("Chen", wind, x, y, vx, vy, vh, r, rmw, lat)           100   0.269    1.000
1 computeAsymmetry("Chen", wind, direction, x, y, vx, vy, vh, r, rmw, lat)           100   0.657    2.442

##computeExposure
[ins] r$> benchmark(computeExposure(wind, tempRes, windThreshold), 
                    computeExposure_cpp(wind, tempRes, windThreshold))[,1:4]
                                               test replications elapsed relative
2 computeExposure_cpp(wind, tempRes, windThreshold)          100   0.014    1.000
1     computeExposure(wind, tempRes, windThreshold)          100   0.265   18.929

##computePDI
[ins] r$> rbenchmark::benchmark(computePDI(wind, tempRes), 
                    computePDI_cpp(wind, tempRes))[,1:4]
                           test replications elapsed relative
2 computePDI_cpp(wind, tempRes)          100   0.013    1.000
1     computePDI(wind, tempRes)          100   0.097    7.462

I also added some RcppArmadillo test implementation, but it is systematicaly worth than regular Rcpp, even with multiple if conditions inside loops.

This is a first implementation, I did some tests, but this surely can be improved.

rcpp_tests.R file has all the steps to reproduce the benchmarks.

thomasarsouze commented 5 months ago

@BaptisteDlp branch issue-optim-2 freezes my terminal. Do I have to do anything special with installation ? Also, it would be better to use environment variable that we can set before the computation step to determine how many cores we use.