r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
553 stars 94 forks source link

Dimensions error when filtering #700

Open dominicroye opened 1 month ago

dominicroye commented 1 month ago

I get a dimension error when I want to filter a single pressure level from my ncdf. I can filter the time dimension without any issues. Data example can be downloaded here.

> hu <- read_stars("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = F)
> hu850 <-  filter(hu, time <= ymd("1950-06-01"))
> hu850 <- filter(hu850, plev == units::set_units(850*100, Pa))
Error in `glubort()`:
! Measure `/home/dominic/tankdatafic/03_CMIP6/EC-EARTH3/HUR/Historical/hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc` needs dimensions [512 x 256 x 7 x 304], not [512 x 256 x 8 x 304]
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error in `glubort()`:
! Measure `/home/dominic/tankdatafic/03_CMIP6/EC-EARTH3/HUR/Historical/hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc` needs dimensions [512 x 256 x 7 x 304], not [512 x 256 x 8 x 304]
---
Backtrace:
    ▆
 1. ├─dplyr::filter(hu850, plev == units::set_units(850 * 100, Pa))
 2. └─stars:::filter.stars(...)
 3.   ├─cubelyr::as.tbl_cube(.data)
 4.   └─stars::as.tbl_cube.stars(.data)
 5.     └─cubelyr::tbl_cube(dims, c(unclass(x)))
 6.       └─cubelyr:::bad_measures(...)
 7.         └─cubelyr:::glubort(fmt_measures(measures), ..., .envir = .envir)
Run rlang::last_trace(drop = FALSE) to see 1 hidden frame.
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ncmeta_0.4.0    furrr_0.3.1     future_1.33.2   tictoc_1.2.1    stars_0.6-7     abind_1.4-5     lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2    
[12] readr_2.1.5     tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0 fs_1.6.4        sf_1.0-16       terra_1.7-78   

loaded via a namespace (and not attached):
 [1] utf8_1.2.4         generics_0.1.3     class_7.3-22       KernSmooth_2.23-24 stringi_1.8.4      listenv_0.9.1      digest_0.6.36      hms_1.1.3          magrittr_2.0.3    
[10] grid_4.4.1         timechange_0.3.0   e1071_1.7-14       DBI_1.2.3          fansi_1.0.6        scales_1.3.0       codetools_0.2-20   RNetCDF_2.9-2      cli_3.6.3         
[19] crayon_1.5.3       rlang_1.1.4        units_0.8-5        parallelly_1.37.1  munsell_0.5.1      withr_3.0.0        tools_4.4.1        parallel_4.4.1     tzdb_0.4.0        
[28] colorspace_2.1-0   globals_0.16.3     vctrs_0.6.5        R6_2.5.1           proxy_0.4-27       lifecycle_1.0.4    classInt_0.4-10    pkgconfig_2.0.3    pillar_1.9.0      
[37] gtable_0.3.5       glue_1.7.0         Rcpp_1.0.13        tidyselect_1.2.1   rstudioapi_0.16.0  compiler_4.4.1
kadyb commented 1 month ago

It seems that applying a filter with only single pressure level works, but when combined with filtering by date, we get this error. When I swapped the order of the filters, I got this error:

Error in make_intervals(x$start[i], x$end[i]) : 
  length(start) > 0 is not TRUE

If you are looking for an alternative approach, filtering with square brackets can be applied this way:

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.POSIXct("1950-06-01 12:00:00", tz = "UTC"))

hu[,,, plev_idx, time_idx]
# stars object with 4 dimensions and 1 attribute
# attribute(s), summary of first 1e+05 cells:
#                             Min.  1st Qu.   Median     Mean  3rd Qu.    Max.
# Relative Humidity [%] 0.03111257 48.86616 70.07285 63.57896 81.61889 124.086
# dimension(s):
#      from  to                  offset   delta  refsys             values x/y
# x       1 512                 -0.3516  0.7031      NA               NULL [x]
# y       1 256                   89.81 -0.7017      NA               NULL [y]
# plev    2   2                      NA      NA udunits [85000,70000) [Pa]
# time    1 152 1950-01-01 12:00:00 UTC  1 days POSIXct               NULL
dominicroye commented 1 month ago

Yes, this solution is working. Thanks!!

dominicroye commented 1 month ago

I just noticed that it works only without using stars_proxy.

hu <- read_stars("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.POSIXct("1950-06-01 12:00:00", tz = "UTC"))

hu <- hu[,,, plev_idx, time_idx, drop = T]
> write_mdim(hu, "test.nc")
Error in names(dim(x[[i]])) <- names(d) : 
  attempt to set an attribute on NULL
edzer commented 2 weeks ago

This seems to now work:

library(stars)
hu <- read_mdim("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.Date("1950-06-01"))

hu <- hu[,,, plev_idx, time_idx, drop = T]
write_mdim(hu, "/tmp/test.nc")
dominicroye commented 2 weeks ago

Yes, it worked. Thanks. But as far as I see, it doesn't work with read_stars()? The thing is that read_mdim() doesn't read multiple files as time series.

edzer commented 2 weeks ago

The thing is that read_mdim() doesn't read multiple files as time series.

Can you give an example where that doesn't work?

dominicroye commented 2 weeks ago

The example you pasted here first (I saw it only in my email) is what I searched for, but as you say, I would need it as a proxy. I tried out y = do.call(c, lapply(file_list, read_mdim, proxy = T)) and it seems to work.

library(stars)                                                                                                    
x = c(                                                                       
"[avhrr-only-v2.19810901.nc](http://avhrr-only-v2.19810901.nc/)",                                                                                 
"[avhrr-only-v2.19810902.nc](http://avhrr-only-v2.19810902.nc/)",                                                 
"[avhrr-only-v2.19810903.nc](http://avhrr-only-v2.19810903.nc/)",                                                              
"[avhrr-only-v2.19810904.nc](http://avhrr-only-v2.19810904.nc/)",                                                 
"[avhrr-only-v2.19810905.nc](http://avhrr-only-v2.19810905.nc/)",                                                                                      
"[avhrr-only-v2.19810906.nc](http://avhrr-only-v2.19810906.nc/)",                                                 
"[avhrr-only-v2.19810907.nc](http://avhrr-only-v2.19810907.nc/)",                                                        
"[avhrr-only-v2.19810908.nc](http://avhrr-only-v2.19810908.nc/)",                                                 
"[avhrr-only-v2.19810909.nc](http://avhrr-only-v2.19810909.nc/)"                                                                              
)                                                                            
# see the second vignette:                                                                                       
# install.packages("starsdata", repos = "http://pebesma.staff.ifgi.de/", type = "source")                 
file_list = system.file(paste0("netcdf/", x), package = "starsdata")         
# (y = read_stars(file_list, quiet = TRUE))                                                                
(y = do.call(c, lapply(file_list, read_mdim)))

Another issue I found now is in st_crop(). As far as I see, it is related to colrow_from_xy() and the step obj[[xy[1]]]$values, which in turn has null values, although lon values ​​are available when I use st_get_dimension_values(hu, "lon").

library(stars)
hu <- read_mdim("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.Date("1950-06-01"))

hu <- hu[,,, plev_idx, time_idx, drop = T] # it seems not to drop the plev dimension?!

hu <- st_set_dimensions(hu, "lon", values = st_get_dimension_values(hu, "lon")-180)
# stars_proxy object with 1 attribute in 1 file(s):
# $hur
# [1] "[...]/hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc"
#
# dimension(s):
#     from  to     offset  delta  refsys                            values x/y
# lon     1 512       -180 0.7031      NA                              NULL [x]
# lat     1 256         NA     NA      NA [-90,-89.11489),...,[89.11489,90) [y]
# plev    2   2         NA     NA udunits                [85000,70000) [Pa]    
# time    1 151 1950-01-01 1 days    Date                              NULL    

st_crop(hu, st_bbox(c(xmin  = -20, ymin = 25, xmax = 20, ymax = 50)))
# Error in as_intervals(ix, add_last = length(ix) == dim(obj)[xy[1]]) : 
#  is.atomic(x) is not TRUE

Thank you for your help.

edzer commented 2 weeks ago

Yes, but read_mdim(file_list) also works, even with proxy=TRUE. I'll look into you other issue.

edzer commented 2 weeks ago
hu <- st_set_dimensions(hu, "lon", values = st_get_dimension_values(hu, "lon")-180)

now generates an error, as dimensions are being reread and overwritten in st_as_stars(). The crop box would then need to be moved to be within 0...360. It breaks because the lat variabe is of type intervals (i.e., a rectilinear grid, where latitude values are irregular). If you would read this file using read_stars (i.e., using the "classic" GDAL interface) then it would reorder the latitude values to be equally spaced, with some warnings:

> hu <- read_stars("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)
Warning messages:
1: In CPL_get_metadata(file, NA_character_, options) :
  GDAL Message 1: Latitude grid not spaced evenly.  Setting projection for grid spacing is within 0.1 degrees threshold.
2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
  GDAL Message 1: Latitude grid not spaced evenly.  Setting projection for grid spacing is within 0.1 degrees threshold.
> st_crop(hu, st_bbox(c(xmin  = 20, ymin = 25, xmax = 40, ymax = 50))) |> st_as_stars() 
stars object with 4 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                             Min.   1st Qu.   Median     Mean  3rd Qu.     Max.
Relative Humidity [%] 0.002519234 0.7560223 19.43109 31.13065 58.62669 114.6512
dimension(s):
     from  to                  offset   delta  refsys
x      29  58                 -0.3516  0.7031      NA
y      57  93                   89.81 -0.7017      NA
plev    1   8                      NA      NA udunits
time    1 365 1950-01-01 12:00:00 UTC  1 days POSIXct
                                       values x/y
x                                        NULL [x]
y                                        NULL [y]
plev [1e+05,85000) [Pa],...,[1000,-3000) [Pa]    
time                                     NULL    
Warning messages:
1: In CPL_get_metadata(file, NA_character_, options) :
  GDAL Message 1: Latitude grid not spaced evenly.  Setting projection for grid spacing is within 0.1 degrees threshold.
2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
  GDAL Message 1: Latitude grid not spaced evenly.  Setting projection for grid spacing is within 0.1 degrees threshold.
dominicroye commented 2 weeks ago

In this case, which is the best way to change from longitude 360 to 180?

I still get the following error when I include the rest of the steps even with read_stars(), but I also have issues with read_mdim().

With read_stars(): Shouldn't be dropped dimension with only one level?

hu <- read_stars("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.Date("1950-06-01"))

hu_sub <- hu[,,, plev_idx, time_idx, drop = T]

st_crop(hu_sub, st_bbox(c(xmin  = -20+180, ymin = 25, xmax = 20+180, ymax = 50))) |> st_as_stars() 
# Error in dim(ret[[i]]) <- dim(new_dim) : 
# dims [product 318459] do not match the length of object [6158280]

st_crop(hu, st_bbox(c(xmin  = -20+180, ymin = 25, xmax = 20+180, ymax = 50))) |> st_as_stars() 
# stars object with 4 dimensions and 1 attribute
# attribute(s), summary of first 1e+05 cells:
#                              Min.   1st Qu.   Median     Mean  3rd Qu.    Max.
# Relative Humidity [%] -0.03696942 0.3956111 22.86977 33.64571 65.41913 112.589
# dimension(s):
#      from  to                  offset   delta  refsys                                   values x/y
# x     229 285                 -0.3516  0.7031      NA                                     NULL [x]
# y      57  93                   89.81 -0.7017      NA                                     NULL [y]
# plev    1   8                      NA      NA udunits [1e+05,85000) [Pa],...,[1000,-3000) [Pa]    
# time    1 365 1950-01-01 12:00:00 UTC  1 days POSIXct                                     NULL    
# Warning messages:
# 1: In CPL_get_metadata(file, NA_character_, options) :
#   GDAL Message 1: Latitude grid not spaced evenly.  Setting projection for grid spacing is within 0.1 degrees threshold.
# 2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
#   GDAL Message 1: Latitude grid not spaced evenly.  Setting projection for grid spacing is within 0.1 degrees threshold.

read_mdim() using 360º lon, both errors are the same with or without the steps

hu <- read_mdim("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.Date("1950-06-01"))

hu_sub <- hu[,,, plev_idx, time_idx, drop = T] 

hu_sub
# stars_proxy object with 1 attribute in 1 file(s):
# $hur
# [1] "[...]/hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc"
# 
# dimension(s):
#     from  to     offset  delta  refsys                            values x/y
# lon     1 512    -0.3516 0.7031      NA                              NULL [x]
# lat     1 256         NA     NA      NA [-90,-89.11489),...,[89.11489,90) [y]
# plev    2   2         NA     NA udunits                [85000,70000) [Pa]    
# time    1 151 1950-01-01 1 days    Date                              NULL    

st_crop(hu_sub, st_bbox(c(xmin  = -20+180, ymin = 25, xmax = 20+180, ymax = 50))) |> st_as_stars() 
# Error in as_intervals(ix, add_last = length(ix) == dim(obj)[xy[1]]) : 
#  is.atomic(x) is not TRUE
st_crop(hu, st_bbox(c(xmin  = -20+180, ymin = 25, xmax = 20+180, ymax = 50))) |> st_as_stars() 
# Error in as_intervals(ix, add_last = length(ix) == dim(obj)[xy[1]]) : 
#  is.atomic(x) is not TRUE
edzer commented 2 weeks ago

With read_stars(): Shouldn't be dropped dimension with only one level?

Arguably no, because you loose information (the value of that single level). You can put an adrop() in the pipeline to have it dropped (probably after reading in memory).

read_mdim() using 360º lon, both errors are the same with or without the steps

Yes, I'll look into either fixing this or making the error message more helpful.

In this case, which is the best way to change from longitude 360 to 180?

Tell the data producers to do so?

edzer commented 2 weeks ago

More seriously,

In this case, which is the best way to change from longitude 360 to 180?

I managed to do so outside R, with cdo,

cdo -sellonlatbox,-180,180,-90,90 hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc out.nc

then

r = read_mdim("out.nc")
plot(adrop(r[,,,1,1]),axes = TRUE)

gives x

But to be honest, it took me quite some time to get cdo installed with NetCDF-4 support; I used

./configure --enable-netcdf4 --enable-zlib --with-netcdf=/usr/ --with-hdf5=/usr/hdf5/
make
sudo make install
dominicroye commented 2 weeks ago

Ok. Thanks. Terra has the rotate() function for this kind of operation. Regarding telling the data producers to do so, climate projections are run with 360º by default as standard.

edzer commented 2 weeks ago

climate projections are run with 360º by default as standard.

Yes, I discussed this with @Nowosad over lunch, and given that these are all discrete global grids, this whole shifting or rotating should not be needed; the reason it's needed is that much of our software stacks still tend to treat geographic coordinates as Cartesian coordinates. I think that needs to change. (Terra will also not preserve the irregular latitude values.)

dominicroye commented 2 weeks ago

I found another possibility, but as proxy_stars, it doesn't work. For me, the grid is regular, but the only difference is in longitude 360º.

hu <- read_stars("hur_day_EC-Earth3_historical_r1i1p1f1_gr_19500101-19501231.nc", proxy = T)

plev = st_get_dimension_values(hu, "plev")
time = st_get_dimension_values(hu, "time")

plev_idx = which(plev == units::set_units(850*100, Pa))
time_idx = which(time <= as.Date("1950-06-01"))

hu_sub <- hu[,,, plev_idx, time_idx, drop = T]
st_crs(hu_sub) <- "+proj=longlat +datum=WGS84 +pm=360dw"

# working
st_crs(hu_sub) <- "+proj=longlat +datum=WGS84 +pm=360dw"
st_as_stars(hu_sub) %>% st_transform(hu_sub, 4326)

# not working
st_transform(hu_sub, 4326)
# Error: Not compatible with requested type: [type=list; target=double].

I thought that I could use st_crop()then as usally, but ...

st_as_stars(hu_sub) %>% st_transform(4326) %>%
   st_crop(st_bbox(c(xmin  = -20, ymin = 25, xmax = 20, ymax = 50), crs = 4326))
# stars object with 3 dimensions and 1 attribute
# attribute(s), summary of first 1e+05 cells:
#               Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# hur [%] 0.03111257 55.60625 74.12261 67.01967 83.88524 111.4188
# dimension(s):
#      from  to     offset  delta refsys                     values x/y
# lon     1 512         NA     NA WGS 84   [512x256] -180,...,179.3 [x]
# lat     1 256         NA     NA WGS 84 [512x256] -89.56,...,89.56 [y]
# time    1 151 1950-01-01 1 days   Date                       NULL    
# curvilinear grid
# Warning messages:
# 1: In st_crop.stars(., st_bbox(c(xmin = -20, ymin = 25, xmax = 20,  :
 #  st_crop: bounding boxes of x and y do not overlap
# 2: In st_crop.stars(., st_bbox(c(xmin = -20, ymin = 25, xmax = 20,  :
#  crop only crops regular grids: maybe use st_warp() first?