[New Feature]: Improve computation efficiency

Describe your feature request

spatialBehaviour and temporalBehaviour are kind of slow to run. For exemple, it takes 37.5s on my computer (mac osx 2020, 2 GHz Intel Core i5 four cores) to run the spatialBehaviour function for only 7 storms:

ds <- defStormsDataset()
s <- defStormsList(ds)
s <- defStormsList(ds, loi="New Caledonia")
=== Storms processing ... ===

-> Making buffer: Done
-> Searching storms from 2015 to 2021 ...
   -> Identifying Storms: 9 potential candidates...
-> Gathering storm(s) ... 
  |===================================================================================================================================| 100%

=== DONE with run time 1.182287 sec ===

SUMMARY:
(*) LOI: New Caledonia 
(*) Buffer size: 300 km
(*) Remove Tropical Depressions (< 18 m/s in sshs): yes
(*) Number of storms: 7 
        Name - Tropical season - SSHS - Number of observation within buffer:
        PAM - 2015 - 5 - 5 
        SOLO - 2015 - 0 - 15 
        ULA - 2016 - 4 - 9 
        UESI - 2020 - 1 - 16 
        GRETEL - 2020 - 1 - 11 
        LUCAS - 2021 - 1 - 16 
        NIRAN - 2021 - 5 - 10 

msw <- spatialBehaviour(s)

We want the code to be faster.

Describe the solution you'd like

Three things must be considered to improve efficency when doing the computation of storms behaviour:

analyze bottlenecks and find where spatialBehaviour spends most of its time
implement the computing functions in C++ using Rcpp library
implement parallel looping

We must use a profiling library to analyse the time spent in different functions:
```
library(profvis)
profvis(spatialBehaviour(s))
```
First analysis with the exemple above shows that :
- overall time in this function is 35170ms
- computational time in spatialBehaviour itself is 5800ms
- all the remaining time is spent in .External functions, which are tipicaly sf functions.

This means that we should consider means to lower the number of calls to sf functions.

Code in Rcpp the computing functions, including:
- computeAsymmetry
- willoughby
- holland
- boose
- computeDirectionBoose
- computeDirection although as stated above, this is limited to a small fraction of the total computational time
Test the parallel-looping when available:
- auto detect number of cores
- implement parallel-looping (keeping order) using foreach library when relevent (and possible)

Describe alternatives you've considered

[ ] reorganize code and sf functions calls
[ ] code computing functions in Rcpp
[ ] test parallel-looping using foreach library

Additional context

No response

Some news about Rcpp implementation.

In branch 69-new-feature-improve-computation-efficiency, I have implemented all computational functions.

First we need a R/StormR-package.R file that includes:

#' @useDynLib StormR , .registration=TRUE
#' @importFrom Rcpp evalCpp
NULL

Then we need to document the package so that the NAMESPACE file gets updated:
```
devtools::document()
```
Now we can compile the package and access the Rcpp functions in src directory:
```
devtools::load_all()
```

Here is the benchmark of the implmentations:

##willoughby
[ins] r$> rbenchmark::benchmark(willoughby(r, rmw, msw, lat), 
                                willoughby_cpp(r, rmw, msw, lat))[,1:4]
                              test replications elapsed relative
2 willoughby_cpp(r, rmw, msw, lat)          100   0.042    1.000
1     willoughby(r, rmw, msw, lat)          100   0.271    6.452

##holland
[ins] r$> rbenchmark::benchmark(holland(r, rmw, msw, pc, poci, lat), 
                                holland_cpp(r, rmw, msw, pc, poci, lat))[,1:4]
                                     test replications elapsed relative
2 holland_cpp(r, rmw, msw, pc, poci, lat)          100   0.078    1.000
1     holland(r, rmw, msw, pc, poci, lat)          100   0.251    3.218

##boose
[ins] r$> rbenchmark::benchmark(boose(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat), 
                                boose_cpp(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat))[,1:4]
                                                                    test replications elapsed relative
2 boose_cpp(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat)          100   0.131     1.00
1     boose(r, rmw, msw, pc, poci, x, y, vx, vy, vh, landIntersect, lat)          100   0.410     3.13

##computeDirectionBoose
[ins] r$> rbenchmark::benchmark(computeDirectionBoose(x,y,lat, landIntersect),
                                computeDirectionBoose_cpp(x,y,lat, landIntersect))[,1:4]
                                                 test replications elapsed relative
2 computeDirectionBoose_cpp(x, y, lat, landIntersect)          100   0.058    1.000
1     computeDirectionBoose(x, y, lat, landIntersect)          100   0.217    3.741

##computeDirection
[ins] r$> rbenchmark::benchmark(computeDirection(x,y,lat),
                                computeDirection_cpp(x,y,lat))[,1:4]
                             test replications elapsed relative
2 computeDirection_cpp(x, y, lat)          100   0.041    1.000
1     computeDirection(x, y, lat)          100   0.130    3.171

##computeAsymetry
[ins] r$> benchmark(computeAsymmetry("Chen", wind, direction, x, y, vx, vy, vh, r, rmw, lat), 
                    computeAsymmetry_cpp("Chen", wind, x, y, vx, vy, vh, r, rmw, lat))[,1:4]
                                                                      test  replications elapsed relative
2        computeAsymmetry_cpp("Chen", wind, x, y, vx, vy, vh, r, rmw, lat)           100   0.269    1.000
1 computeAsymmetry("Chen", wind, direction, x, y, vx, vy, vh, r, rmw, lat)           100   0.657    2.442

##computeExposure
[ins] r$> benchmark(computeExposure(wind, tempRes, windThreshold), 
                    computeExposure_cpp(wind, tempRes, windThreshold))[,1:4]
                                               test replications elapsed relative
2 computeExposure_cpp(wind, tempRes, windThreshold)          100   0.014    1.000
1     computeExposure(wind, tempRes, windThreshold)          100   0.265   18.929

##computePDI
[ins] r$> rbenchmark::benchmark(computePDI(wind, tempRes), 
                    computePDI_cpp(wind, tempRes))[,1:4]
                           test replications elapsed relative
2 computePDI_cpp(wind, tempRes)          100   0.013    1.000
1     computePDI(wind, tempRes)          100   0.097    7.462

I also added some RcppArmadillo test implementation, but it is systematicaly worth than regular Rcpp, even with multiple if conditions inside loops.

This is a first implementation, I did some tests, but this surely can be improved.

rcpp_tests.R file has all the steps to reproduce the benchmarks.

@BaptisteDlp branch issue-optim-2 freezes my terminal. Do I have to do anything special with installation ? Also, it would be better to use environment variable that we can set before the computation step to determine how many cores we use.

umr-amap / StormR