rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0

terra takes > 1 sec to load on linux and probably >5 sec on Windows and Mac #1440

Open Jean-Romain opened 5 months ago

Jean-Romain commented 5 months ago

Loading terra, either with library() or through the namespace with terra::, takes more than a second on my machine (Linux), somewhere between 1.3 and 1.7 seconds. This is huge! I guess it takes much longer on Windows and Mac, probably close to 5 seconds.

This is very problematic for code that actually takes milliseconds to run. The first run may take 1.5 s while the second takes 150 ms. On my side, the main problem is that the examples in my package documentation, which are supposed to take about 100 ms, actually take 1.5 seconds on the first run. This is OK for R CMD check on Linux, but R CMD check on Windows and Mac fails because the examples take more than 5 seconds. All because I'm reading a small raster with terra.

Another issue is that it is practically impossible to debug C++ code with valgrind if a terra function is involved in the reproducible example, because under valgrind loading terra takes several minutes.

In a fresh session:

t0 <- Sys.time(); library(terra); Sys.time() - t0
#> terra 1.7.71
#> Time difference of 1.510221 secs
t0 <- Sys.time(); r <- terra::rast(); Sys.time() - t0
#> Time difference of 1.534728 secs

For comparison, dplyr takes 0.004 s to load, Rcpp 0.002 s, ggplot2 0.03 s, and sf 0.3 s (which is huge).

dimfalk commented 3 months ago

@Jean-Romain Hm, I can't confirm that loading {terra} takes this long on Windows by default, at least based on the median time (although terra does seem to be an outlier, with a maximum of approx. 3 s):

mbm <- microbenchmark::microbenchmark(library(dplyr),
                                      library(Rcpp),
                                      library(ggplot2),
                                      library(sf),
                                      library(terra),
                                      times = 1000)

mbm
#> Unit: microseconds
#>              expr   min    lq      mean median    uq       max neval
#>    library(dplyr) 103.6 105.9  297.9007  108.1 110.4  177550.8  1000
#>     library(Rcpp) 103.5 106.2  120.3530  108.2 110.9    5399.9  1000
#>  library(ggplot2) 103.3 106.4  196.4520  108.3 110.5   81636.3  1000
#>       library(sf) 103.4 106.0  490.5340  108.1 110.0  361355.4  1000
#>    library(terra) 103.8 106.3 3115.3979  108.1 110.2 3002136.3  1000

ggplot2::autoplot(mbm)

sessionInfo()
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
#> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.utf8    
#> 
#> time zone: Europe/Berlin
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.0 terra_1.7-71  Rcpp_1.0.12   dplyr_1.1.4   sf_1.0-16    
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4          compiler_4.3.3        tidyselect_1.2.1     
#>  [4] reprex_2.1.0          scales_1.3.0          yaml_2.3.8           
#>  [7] fastmap_1.1.1         R6_2.5.1              generics_0.1.3       
#> [10] microbenchmark_1.4.10 classInt_0.4-10       knitr_1.45           
#> [13] tibble_3.2.1          units_0.8-5           munsell_0.5.0        
#> [16] R.cache_0.16.0        DBI_1.2.2             pillar_1.9.0         
#> [19] R.utils_2.12.3        rlang_1.1.3           utf8_1.2.4           
#> [22] xfun_0.43             fs_1.6.3              cli_3.6.2            
#> [25] withr_3.0.0           magrittr_2.0.3        class_7.3-22         
#> [28] digest_0.6.35         grid_4.3.3            rstudioapi_0.16.0    
#> [31] lifecycle_1.0.4       R.methodsS3_1.8.2     R.oo_1.26.0          
#> [34] vctrs_0.6.5           KernSmooth_2.23-22    proxy_0.4-27         
#> [37] evaluate_0.23         glue_1.7.0            farver_2.1.1         
#> [40] styler_1.10.2         codetools_0.2-20      colorspace_2.1-0     
#> [43] fansi_1.0.6           e1071_1.7-14          rmarkdown_2.26       
#> [46] purrr_1.0.2           pkgconfig_2.0.3       tools_4.3.3          
#> [49] htmltools_0.5.8

Jean-Romain commented 3 months ago

@dimfalk my test was on Linux. I re-ran it for dplyr, ggplot2 and co., and I probably made a mistake in my first message: the timing is closer to 0.1 s than 0.001 s. I probably made the same error as you, loading the libs one after the other. If you load ggplot2 first, the timing for dplyr drops to 0.0001 s. Each lib must be benchmarked only once, in a fresh session.

Anyway, you have the same issue. You can't microbenchmark this 1000 times: only the first run is slow. After that the libs are already loaded and the following repetitions are almost instantaneous. Only the first run matters, which is likely the "max" one. Like me, you have a tenfold difference.
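
For anyone who wants to repeat the measurement, here is a minimal sketch of the "one load per fresh session" idea, assuming the Rscript binary of the running R installation can be launched. The helper name time_fresh_load and its reps argument are made up for illustration; they are not part of terra or microbenchmark. Each repetition starts a new R process, so the package is never already loaded when it is timed.

```r
# Hypothetical helper: time library(pkg) in a brand-new R process, `reps` times.
time_fresh_load <- function(pkg, reps = 5) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # Code run in the child process: print the elapsed load time in seconds.
  expr <- sprintf(
    "cat(system.time(suppressPackageStartupMessages(library(%s)))[['elapsed']])",
    pkg
  )
  vapply(seq_len(reps), function(i) {
    # --vanilla skips profiles and saved workspaces that could preload packages.
    out <- system2(rscript, c("--vanilla", "-e", shQuote(expr)), stdout = TRUE)
    as.numeric(out[length(out)])
  }, numeric(1))
}

# Usage (requires the packages to be installed):
# time_fresh_load("terra")
# time_fresh_load("sf")
```

This only measures the load itself, not the startup time of the R process, and the numbers will still vary with disk caching between repetitions.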

dimfalk commented 3 months ago

@Jean-Romain Oopsie, newbie mistake - my bad! :smirk:

At least it explains why the distributions are similar to this extent...

I benchmarked the following libs manually a few times, using a fresh session:

# terra (in sec): 
# c(3.5, 3.39, 3.72, 3.56, 3.45, 3.59, 3.47, 3.46, 3.51, 3.42)

# sf (in sec):
# c(0.38, 0.54, 0.48, 0.46, 0.46, 0.48, 0.46, 0.51, 0.50, 0.47)
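
A quick summary of the two sets of timings above, as a sketch. The variable names are only illustrative; the values are copied from the comment.

```r
# Fresh-session load times in seconds, copied from the measurements above.
terra_secs <- c(3.5, 3.39, 3.72, 3.56, 3.45, 3.59, 3.47, 3.46, 3.51, 3.42)
sf_secs    <- c(0.38, 0.54, 0.48, 0.46, 0.46, 0.48, 0.46, 0.51, 0.50, 0.47)

median(terra_secs)
#> [1] 3.485
median(sf_secs)
#> [1] 0.475

# How many times slower terra loads than sf on this machine.
ratio <- median(terra_secs) / median(sf_secs)
round(ratio, 1)
#> [1] 7.3
```

On this Windows machine, terra therefore loads roughly seven times slower than sf.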