isciences / exactextractr

R package for fast and accurate raster zonal statistics
https://isciences.gitlab.io/exactextractr/
274 stars 26 forks source link

NaN values in extraction #21

Open victofs opened 4 years ago

victofs commented 4 years ago

I am trying to use exact_extract function where the shapefile is the one used in the Basic Usage section of the readme file

# Pull municipal boundaries for Brazil
brazil <- st_as_sf(getData('GADM', country='BRA', level=2))

I have the following RasterLayer:

my_raster   <- raster::raster(ncfname)
str(my_raster)
Formal class 'RasterLayer' [package "raster"] with 12 slots
  ..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
  .. .. ..@ name        : chr "xxxx"| __truncated__
  .. .. ..@ datanotation: chr "FLT4S"
  .. .. ..@ byteorder   : chr "little"
  .. .. ..@ nodatavalue : num -32767
  .. .. ..@ NAchanged   : logi FALSE
  .. .. ..@ nbands      : int 6552
  .. .. ..@ bandorder   : chr "BIL"
  .. .. ..@ offset      : int 0
  .. .. ..@ toptobottom : logi FALSE
  .. .. ..@ blockrows   : int 0
  .. .. ..@ blockcols   : int 0
  .. .. ..@ driver      : chr "netcdf"
  .. .. ..@ open        : logi FALSE
  ..@ data    :Formal class '.SingleLayerData' [package "raster"] with 13 slots
  .. .. ..@ values    : logi(0) 
  .. .. ..@ offset    : num 0
  .. .. ..@ gain      : num 1
  .. .. ..@ inmemory  : logi FALSE
  .. .. ..@ fromdisk  : logi TRUE
  .. .. ..@ isfactor  : logi FALSE
  .. .. ..@ attributes: list()
  .. .. ..@ haveminmax: logi FALSE
  .. .. ..@ min       : num Inf
  .. .. ..@ max       : num -Inf
  .. .. ..@ band      : int 112
  .. .. ..@ unit      : chr "m**3 m**-3"
  .. .. ..@ names     : chr "xxxx"
  ..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
  .. .. ..@ type      : chr(0) 
  .. .. ..@ values    : logi(0) 
  .. .. ..@ color     : logi(0) 
  .. .. ..@ names     : logi(0) 
  .. .. ..@ colortable: logi(0) 
  ..@ title   : chr(0) 
  ..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
  .. .. ..@ xmin: num -55
  .. .. ..@ xmax: num -45
  .. .. ..@ ymin: num -17.1
  .. .. ..@ ymax: num -12.9
  ..@ rotated : logi FALSE
  ..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
  .. .. ..@ geotrans: num(0) 
  .. .. ..@ transfun:function ()  
  ..@ ncols   : int 101
  ..@ nrows   : int 41
  ..@ crs     :Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
  ..@ history : list()
  ..@ z       :List of 1
  .. ..$ : chr "2019-01-05 16:06:28"

Then i run the exact_extract function

brazil$mean <- (brazil, my_raster, 'mean')

which returns me NaN values, as shown below

image

Am i doing something wrong importing the raster or this is a known issue in the package?

dbaston commented 4 years ago

Are all of the results NaN, or just some of them? If a polygon doesn't intersect any defined cell in your raster, the mean will be 0/0 = NaN.

aldotapia commented 2 years ago

I can provide some feedback (for this closed and old issue) since I experimented the same today with exactextractr_0.9.0 (same with exactextractr_0.7.2 and exactextractr_0.8.2) installed from conda (r-exactextract). Trying to extract some values from a GLDAS product, I got the following values:

> v <- read_sf('./some/shape.shp')
> r <- rast('./some/raster.tif')
> exact_extract(x = r,
               y = v[8,],
               include_cols = "gauge_id",
               progress = F)
[[1]]
  gauge_id value coverage_fraction
1  1041002     0        0.13134970
2  1041002   NaN        0.45708704
3  1041002   NaN        0.11274476
4  1041002   NaN        0.03988117
5  1041002   NaN        0.13633704
6  1041002   NaN        0.02293241

While in other R installation (outside conda environment with exactextractr_0.9.0) for the same task I got:

[[1]]
  gauge_id   value coverage_fraction
1  1041002 0.08750        0.13134970
2  1041002 0.13000        0.45708704
3  1041002 0.03750        0.11274476
4  1041002 0.04250        0.03988117
5  1041002 0.07000        0.13633704
6  1041002 0.00875        0.02293241

Session info where the task fails:

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS/LAPACK: /Users/aldotapia/miniforge3/envs/hidrocl/lib/libopenblasp-r0.3.20.dylib

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] terra_1.5-21        sf_1.0-7            exactextractr_0.9.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         raster_3.5-21      magrittr_2.0.3     units_0.8-0       
 [5] lattice_0.20-45    rlang_1.0.4        fansi_1.0.3        tools_4.1.3       
 [9] grid_4.1.3         KernSmooth_2.23-20 utf8_1.2.2         cli_3.3.0         
[13] e1071_1.7-11       DBI_1.1.3          class_7.3-20       tibble_3.1.8      
[17] lifecycle_1.0.1    vctrs_0.4.1        codetools_0.2-18   glue_1.6.2        
[21] sp_1.5-0           proxy_0.4-27       compiler_4.1.3     pillar_1.8.1      
[25] classInt_0.4-7     pkgconfig_2.0.3   

Session info where the task runs ok:

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] exactextractr_0.9.0 terra_1.5-42        sf_1.0-7           

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         rstudioapi_0.13    raster_3.5-21      magrittr_2.0.3     units_0.8-0       
 [6] tidyselect_1.1.2   lattice_0.20-45    R6_2.5.1           rlang_1.0.4        fansi_1.0.3       
[11] dplyr_1.0.9        tools_4.2.0        grid_4.2.0         utf8_1.2.2         KernSmooth_2.23-20
[16] cli_3.3.0          e1071_1.7-11       DBI_1.1.3          ellipsis_0.3.2     class_7.3-20      
[21] assertthat_0.2.1   tibble_3.1.7       lifecycle_1.0.1    crayon_1.5.1       purrr_0.3.4       
[26] vctrs_0.4.1        codetools_0.2-18   glue_1.6.2         sp_1.5-0           proxy_0.4-27      
[31] compiler_4.2.0     pillar_1.7.0       generics_0.1.3     classInt_0.4-7     pkgconfig_2.0.3

I tried to update R to 4.2.0 or 4.2.1 in my conda env without success, so far I'm using a scale factor as workaround and works alright for my needs:

exact_extract(x = r*100,
               y = v[8,],
               include_cols = "gauge_id",
               progress = F)
[[1]] 
  gauge_id  value coverage_fraction
1  1041002  8.750        0.13134970
2  1041002 13.000        0.45708704
3  1041002  3.750        0.11274476
4  1041002  4.250        0.03988117
5  1041002  7.000        0.13633704
6  1041002  0.875        0.02293241
aldotapia commented 2 years ago

A bit more on context:

The scale factor inside the R function wasn't enough. I'm processing files in Python and exporting them to R for using the exact_extract computing weighted stats and the percent of valid pixels in the analysis.

So I applied a scale factor while doing the process in Python and saved the file in uint8 data type, everything works great so far. Now the output takes all the valid pixels when using floating point rasters some zones took 0 pixels (NaN output) or 30 - 50%, which is wrong knowing the data.

Any hints or ideas of this behavior?

dbaston commented 2 years ago

Thanks for documenting this. Is my hunch right that any of the following modifications (applied by itself) would "fix" this?

aldotapia commented 2 years ago

Yes, you're right. I suppose it's related with raster's statistics, r*1 made the trick.

I tested it with 6 dates and +400 polygons, everything looks fine so far (with r and with r*1):

output

I need to run it on +7000 dates, so if I face another issue related with this function, I'm gonna provide more feedback.

Thank you!

dbaston commented 2 years ago

r*1 is just a trick to force terra to load everything into memory. It appears that either there is a problem with the way I'm requesting cell values from an out-of-memory raster with terra, or there's a problem within terra itself. Having a reproducible case would help figure out a solution rather than a workaround.

aldotapia commented 2 years ago

Sure, you can access to data in this drive folder

I also saved the result and sessionInfo in workspace.RData. Also, this example is the extraction of one of the +400 vectors used. Some of them have the same issue.

The script used is:

setwd('test_path')

v <- 'HidroCL_boundaries.shp'
r <- 'snow_gldas_A20000102.tif'

result <- exactextractr::exact_extract(x = terra::rast(r),
                                       y = sf::read_sf(v)[8,],
                                       include_cols = "gauge_id",
                                       progress = F)

 session_info <- sessionInfo()
 save.image('~/hidrocl_test/workspace.RData')