ropensci / chopin

Computation of Spatial Data by Hierarchical and Objective Partitioning of Inputs for Parallel Processing
https://niehs.github.io/chopin/
Other
12 stars 2 forks source link

Extract function: question about touch parameter #105

Closed Rapsodia86 closed 4 days ago

Rapsodia86 commented 3 weeks ago

Hello! I do have a question about the extract function. In the package, it is the extractextract (https://isciences.gitlab.io/exactextractr/reference/exact_extract.html) that is used. I checked the methods but saw no parameter defining the pixel extraction strategy (I see cell fraction but that is not it). E.g. in the terra package, you can define touches TRUE/FALSE. "If TRUE, values for all cells touched by lines or polygons are extracted, not just those on the line render path, or whose center point is within the polygon. Not relevant for points; and always considered TRUE when weights=TRUE or exact=TRUE." (https://rdrr.io/cran/terra/man/extract.html). Any plans to add terra::extract as an alternative for the extraction which would allow for different pixel "selection" along the borders? Unless it is implemented and I just overlooked it?

Thanks, Monika

sigmafelix commented 2 weeks ago

@Rapsodia86 Hi, thank you for the question. I think your question is relevant to extract_at(). At the function design phase, we considered exactextractr::exact_extract() for the only raster-vector overlay since we preferred that all cells along the line render path are taken into account to calculate the summary statistics and exactextractr::exact_extract() runs faster than terra::extract().

I think your suggestion is a great addition to our development to-do list for the next version of chopin. Please hang tight for the further update.

Best,


Rapsodia86 commented 2 weeks ago

That would be great, so thank you for taking it under consideration🙂

Best, Monika

sigmafelix commented 2 weeks ago

@Rapsodia86 0.9.0 now supports terra mode in extract_at, where terra::extract() is used along with additional arguments including exact, weights, and touches. Could you check if it works okay? It would be great to know if there are any side effects even though I added tests for the update. Thanks!

Rapsodia86 commented 2 weeks ago

I will explore sometime this week and report back! Thank you!:)

Rapsodia86 commented 2 weeks ago

Hi @sigmafelix ! A couple of observations to report back:

1) I just installed chopin, future.mirai, and mirai. And I got an error of missing anticlust and collapse packages when I tried for the first time to run chopin::extract_at. After installing and loading those two packages, it worked. 2) I am running a small tests on the SpatRaster (I changed sources and names while pasting) and polygon SpatVector:

class       : SpatRaster 
dimensions  : 5736, 7591, 2  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 5004000, 5231730, 2443560, 2615640  (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035) 
sources     : EVI2_1.tif  
              EVI2_2.tif  
names       : test_name1, test_name2 

crop_shp_proj
 class       : SpatVector 
 geometry    : polygons 
 dimensions  : 2, 2  (geometries, attributes)
 extent      : 5035756, 5058351, 2559670, 2582382  (xmin, xmax, ymin, ymax)
 coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035) 
 names       :    Id gridcode
 type        : <int>    <int>
 values      :     1        0
                   2        1

Although it shows only two geometries there is ~400000 pixels to extract from each SpatRaster!

3) I compared terra::extract() with chopin::extract() using touches = TRUE and touches = FALSE, and results are the same 👍

4) I wanted to run par_multirasters(filenames = rstack_filenames,y = crop_shp_proj,id = "ID",func="mean",fun_dist = extract_at)) but that gives me an error (I followed https://docs.ropensci.org/chopin/articles/v01_start.html#multiraster-processing):

ext_res_chopin_par <- par_multirasters(filenames = rstack_filenames,y = crop_shp_proj,id = "ID",func="mean",fun_dist = extract_at)
ℹ Input is not a character.
Input is a character. Attempt to read it with terra::rast...Switch terra class to sf...Input is a character. Attempt to read it with terra::rast...Switch terra class to sf...Error in l[[1L]] : subscript out of bounds

where rstack_filenames is a vector with two full directories. It is exactly the same vector I used to create rstack for the earlier tests. If I have done something incorrectly, please advise.

5) I also tested getting only pixel values with func=NULL, so no data reduction/summary: chopin::extract_at(x = rstack,y = crop_shp_proj,id = "ID",func=NULL,terra=TRUE,exact=FALSE, weights=FALSE, touches=FALSE, xy=FALSE)

  Two things: 
  a) It gave me a warning:
    `"Warning message:
    [extract] cannot return a SpatVector because the number of records extracted does not match the number of rows in y (perhaps you need to use a summarizing function"`

  b) It produced a dataframe without ID info, and added xy. 

   test_name1 test_name2       x       y
    1                  -1161       5763 5055495 2582355
    2                   1981       3747 5055525 2582355
    3                   2605       3118 5055165 2582325
    4                   -650       4867 5055465 2582325
    5                   2278       3737 5055495 2582325
    6                   2308       3630 5055525 2582325

  Therefore, I do not know to which polygon those values belong, and I do not need xy. In `terra:extract`() there are those two parameters: `cells=FALSE, xy=FALSE` that control the output. Also, maybe if adding `out_class = "dataframe"` then, there would be no need for the Warning about it, in case the Warning shows up because at some stage the dataframe cannot be converted to SpatVector/ sf as a final output? Finally. as mentioned, missing ID is a problem. 

UPDATE: I see in the source that xy is TRUE and ID = TRUE (https://github.com/ropensci/chopin/commit/2d3903d4663ee803de4f72e8e24b2076cdeaaafb#diff-076ab714c3842747ba2a8888584905f936faae88959e9a98283e44db17961fd4R231); I added xy=FALSE, so would expect those columns are dropped, but no difference.

  Here is terra::extract() output:

      ID test_name1 test_name2       
    1  1                  -1161       5763
    2  1                   1981       3747
    3  1                   2605       3118
    4  1                   -650       4867
    5  1                   2278       3737
    6  1                   2308       3630

Thanks, Monika

sigmafelix commented 2 weeks ago

@Rapsodia86 Thank you for the update. If you could, please provide a reproducible example (preferably a reprex output with the actual data used in the reprex block) to investigate the issue further.

The first issue comes from the soft-dependency of anticlust and collapse to reduce the number of packages that are installed upon the installation of chopin. For the fourth, I encourage you to set .debug = TRUE in par_multirasters such that error messages are recorded in the resultant data.frame. The fifth issue may be related to kernel weighting functions, and I will look at the issue.

Thank you!

Rapsodia86 commented 1 week ago

Ok, here it is:

library(reprex)
library(chopin)
#> Warning: package 'chopin' was built under R version 4.4.2
library(future)
library(future.mirai)
#> Warning: package 'future.mirai' was built under R version 4.4.2
library(mirai)
#> Warning: package 'mirai' was built under R version 4.4.2
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(terra)
#> terra 1.7.81
future::plan(future.mirai::mirai_multisession, workers = 2L)

rstack <- rast("C:/DataMT/scripts/R/chopin_test/rstack_chopin_test.tif")
rstack
#> class       : SpatRaster 
#> dimensions  : 756, 753, 2  (nrow, ncol, nlyr)
#> resolution  : 30, 30  (x, y)
#> extent      : 5035770, 5058360, 2559690, 2582370  (xmin, xmax, ymin, ymax)
#> coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035) 
#> source      : rstack_chopin_test.tif 
#> names       : test_name1, test_name2 
#> min values  :      -7149,        372 
#> max values  :       7992,      10326
rstack_filename <- "C:/DataMT/scripts/R/chopin_test/rstack_chopin_test.tif"

crop_shp <- vect("C:/DataMT/scripts/R/chopin_test/shape_chopin_test.shp")
crop_shp
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 2, 2  (geometries, attributes)
#>  extent      : 5035756, 5058347, 2559679, 2582382  (xmin, xmax, ymin, ymax)
#>  source      : shape_chopin_test.shp
#>  coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035) 
#>  names       :    Id gridcode
#>  type        : <int>    <int>
#>  values      :     1        0
#>                    2        1
ext_res <-terra::extract(rstack,crop_shp,touches=FALSE)
ext_res_chopin <- chopin::extract_at(x = rstack,y = crop_shp,id = "ID",func=NULL,terra=TRUE,exact=FALSE, weights=FALSE, touches = FALSE,xy=FALSE,ID=TRUE)
#> Switch terra class to sf...
#> Warning: [extract] cannot return a SpatVector because the number of records
#> extracted does not match the number of rows in y (perhaps you need to use a
#> summarizing function
all.equal(ext_res[,2:3], ext_res_chopin[,1:2])
#> [1] TRUE

ext_res_chopin_parmulti <- par_multirasters(filenames = rstack_filename,y = crop_shp,id = "ID",func=NULL,fun_dist = extract_at, terra=TRUE,exact=FALSE, weights=FALSE, touches = FALSE,.debug = TRUE)
#> ℹ Input is not a character.
#> Input is a character. Attempt to read it with terra::rast...
#> Switch terra class to sf...
ext_res_chopin_parmulti
#>                                              base_raster
#> 1 C:/DataMT/scripts/R/chopin_test/rstack_chopin_test.tif
#>                                                                                                                                                                                                                                                                                                                                                                       error_message
#> 1 NULL value passed as symbol address .Call(list(name = "CppField__get", address = <pointer: (nil)>, dll = list(name = "Rcpp", path = "C:/Users/monikat/AppData/Local/R/win-library/4.4/Rcpp/libs/x64/Rcpp.dll", dynamicLookup = TRUE, handle = <pointer: (nil)>, info = <pointer: (nil)>, forceSymbols = FALSE), numParameters = 3), <pointer: (nil)>, <pointer: (nil)>, .pointer)
ext_res_chopin_parmulti_mean <- par_multirasters(filenames = rstack_filename,y = crop_shp,id = "ID",func="mean",fun_dist = extract_at, terra=TRUE,exact=FALSE, weights=FALSE, touches = FALSE,.debug = TRUE)
#> ℹ Input is not a character.
#> Input is a character. Attempt to read it with terra::rast...Switch terra class to sf...
ext_res_chopin_parmulti_mean
#>                                              base_raster
#> 1 C:/DataMT/scripts/R/chopin_test/rstack_chopin_test.tif
#>                                                                                                                                                                                                                                                                                                                                                                       error_message
#> 1 NULL value passed as symbol address .Call(list(name = "CppField__get", address = <pointer: (nil)>, dll = list(name = "Rcpp", path = "C:/Users/monikat/AppData/Local/R/win-library/4.4/Rcpp/libs/x64/Rcpp.dll", dynamicLookup = TRUE, handle = <pointer: (nil)>, info = <pointer: (nil)>, forceSymbols = FALSE), numParameters = 3), <pointer: (nil)>, <pointer: (nil)>, .pointer)

chopin_test.zip

Created on 2024-11-15 with reprex v2.1.0

Thanks! Monika

Session information:

R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] terra_1.7-81       dplyr_1.1.4        mirai_1.3.0       
[4] future.mirai_0.2.2 future_1.33.2      chopin_0.9.0      
[7] reprex_2.1.0      

loaded via a namespace (and not attached):
 [1] Matrix_1.7-0        future.apply_1.11.2 compiler_4.4.0     
 [4] tidyselect_1.2.1    Rcpp_1.0.12         collapse_2.0.17    
 [7] parallel_4.4.0      globals_0.16.3      lattice_0.22-6     
[10] R6_2.5.1            generics_0.1.3      igraph_2.0.3       
[13] classInt_0.4-10     nanonext_1.3.0      sf_1.0-16          
[16] tibble_3.2.1        stars_0.6-5         units_0.8-5        
[19] DBI_1.2.3           pillar_1.9.0        rlang_1.1.3        
[22] utf8_1.2.4          RANN_2.6.1          fs_1.6.4           
[25] cli_3.6.2           withr_3.0.0         magrittr_2.0.3     
[28] class_7.3-22        digest_0.6.35       grid_4.4.0         
[31] rstudioapi_0.16.0   lifecycle_1.0.4     vctrs_0.6.5        
[34] KernSmooth_2.23-22  anticlust_0.8.7     proxy_0.4-27       
[37] glue_1.7.0          listenv_0.9.1       codetools_0.2-20   
[40] abind_1.4-5         parallelly_1.37.1   fansi_1.0.6        
[43] e1071_1.7-14        tools_4.4.0         pkgconfig_2.0.3
sigmafelix commented 4 days ago

@Rapsodia86 Thank you for providing the full data. It is very helpful to figure out. For the extract case only, I suggest using par_multirasters with file paths for both filenames (raster) and y (vector).

library(reprex)
library(chopin)
library(future)
library(future.mirai)
library(mirai)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(terra)
#> terra 1.7.83

future::plan(future.mirai::mirai_multisession, workers = 2L)

rstack_filename <- "~/Documents/chopin_test/rstack_chopin_test.tif"
crop_filename <- "~/Documents/chopin_test/shape_chopin_test.shp"

ext_res_chopin_parmulti <-
  par_multirasters(
    filenames = rstack_filename,
    y = crop_filename,
    func = NULL,
    fun_dist = extract,
    ID = TRUE,
    exact = FALSE,
    weights = FALSE,
    touches = FALSE,
    bind = FALSE,
    .debug = TRUE
  )
#> Input is a character. Attempt to read it with terra::rast...
#> ℹ Input is a character. Trying to read with terra.
#> ℹ Your input function at ~/Documents/chopin_test/rstack_chopin_test.tif is dispatched.
head(ext_res_chopin_parmulti)
#>   ID test_name1 test_name2                                    base_raster
#> 1  1       2236       2297 ~/Documents/chopin_test/rstack_chopin_test.tif
#> 2  1       2358       2202 ~/Documents/chopin_test/rstack_chopin_test.tif
#> 3  1       1747       2170 ~/Documents/chopin_test/rstack_chopin_test.tif
#> 4  1       1290       2457 ~/Documents/chopin_test/rstack_chopin_test.tif
#> 5  1       1005       3311 ~/Documents/chopin_test/rstack_chopin_test.tif
#> 6  1       2286       2282 ~/Documents/chopin_test/rstack_chopin_test.tif

Created on 2024-11-19 with reprex v2.1.1

extract_at in chopin was a shortcut for buffer generation and exactextractr::exact_extract() run. Internal functions like .check_character and .check_vector operated well with the original design of extract_at. However, a recent 0.9.0 update to expand extract_at just complicated the class conversion since terra::extract() requires SpatRaster x and SpatVector y for raster-vector overlay task. Thus, I will remove the terra mode in extract_at and recommend users to run terra raster-vector overlay inside any of par_* functions.

I will be happy to discuss more about this issue or others if you've come across. Thank you!

Rapsodia86 commented 4 days ago

Thanks @sigmafelix for all your interest in my suggestion and your work! Yes, indeed, spatvector also needs a filename🤦 Documentation confused me (https://docs.ropensci.org/chopin/articles/v01_start.html#multiraster-processing) where under the y you had the sf object. I should have tested that one before calling it an issue. I think the case is closed now, and I agree that reducing complications and parameters to control functions within a function is a smart move, both for a maintainer and a user.

Best, Monika