r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
564 stars 94 forks source link

coordinates with bounds not correctly read for ncdf #175

Closed edzer closed 1 year ago

edzer commented 5 years ago

See here: https://stat.ethz.ch/pipermail/r-sig-geo/2019-May/027288.html

When read with r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1))) this gives an object with invalid dimensions (not cut to count).

mdsumner commented 5 years ago

I don't see the problem?

f <- "tas_rcp85_land-rcm_eur_12km_01_mon_198012-208011.nc" ## 850 Mb, requires login, and manual download
 read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1)))
stars object with 4 dimensions and 1 attribute
attribute(s):
    tas [°C]
 Min.   :-3.648
 1st Qu.: 9.904
 Median :12.657
 Mean   :12.235
 3rd Qu.:15.287
 Max.   :20.234
dimension(s):
                from  to     offset   delta refsys point
grid_longitude     1 100         NA      NA     NA FALSE
grid_latitude      1 100         NA      NA     NA FALSE
time               1   3 1980-12-16 30 days  PCICt    NA
ensemble_member    1   1         NA      NA     NA    NA
                                            values
grid_longitude  [331.9,332.01),...,[377.77,377.88) [x]
grid_latitude     [-23.1,-22.99),...,[21.45,21.56) [y]
time                                          NULL
ensemble_member                                  1
> str(read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1))))
List of 1
 $ tas:Object of class units:
 num [1:100, 1:100, 1:3, 1] 14.8 14.9 15 15.1 15.1 ...
  ..- attr(*, "units")=List of 2
  .. ..$ numerator  : chr "°C"
  .. ..$ denominator: chr(0)
  .. ..- attr(*, "class")= chr "symbolic_units"

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stars_0.3-1 sf_0.7-4    abind_1.4-5
mdsumner commented 5 years ago

Another thing is that these grid_longitude/grid_latitude vectors are only rectilinear via numeric noise, noise that raster ignores fwiw. I haven't tried dealing with the rotated grid part at all ...

edzer commented 5 years ago

If you safe the object to r, then

> dim(st_dimensions(r))
 grid_longitude   grid_latitude            time ensemble_member 
            418             406               3               1 
> dim(r)
 grid_longitude   grid_latitude            time ensemble_member 
            100             100               3               1 

and trying to plot it breaks (for this reason, I guess). This is a mess I created, because I extended code to read in dimension bounds, but ignored dimension selection after doing so.

edzer commented 5 years ago

btw raster will also ignore it if it is not noise, because it doesn't support rectilinear. But indeed:

> r= read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1)), eps=1e-3)
> r
stars object with 4 dimensions and 1 attribute
attribute(s):
    tas [°C]     
 Min.   :-3.648  
 1st Qu.: 9.904  
 Median :12.657  
 Mean   :12.235  
 3rd Qu.:15.287  
 Max.   :20.234  
dimension(s):
                from  to     offset   delta refsys point values    
grid_longitude     1 100      331.9    0.11     NA    NA   NULL [x]
grid_latitude      1 100      -23.1    0.11     NA    NA   NULL [y]
time               1   3 1980-12-16 30 days  PCICt    NA   NULL    
ensemble_member    1   1         NA      NA     NA    NA      1    
mdsumner commented 5 years ago

Oh I see, I have similar todo with read_ncdf and proxy

With raster I guess it has a coarser eps, definitely agree with how stars behaves there. I think generally it's when netcdf coordinates are stored in float but need double (degenerate rectilinear so to say)

mdsumner commented 5 years ago

Oh I see, I have similar todo with read_ncdf and proxy

With raster I guess it has a coarser eps, definitely agree with how stars behaves there. I think generally it's when netcdf coordinates are stored in float but need double (degenerate rectilinear so to say)

dblodgett-usgs commented 5 years ago

For the record, I've found this to be very common in NetCDF lat/lon coordinates. Over in intersectr I actually just added a "regularize" flag to make it easy for users to say, "no really, this is a regular coordinate variable." rather than making people guess about an appropriate e-# epsilon input. Not sure that's an approach worth taking here, but it seems like a friendly way to handle it.

edzer commented 5 years ago

I'd be OK. But also eps should have a better default than 1e-12.

mdsumner commented 5 years ago

It's the broken cases that muck it up, all regular but the first, or last - just broken because it should have been an offset/scale, where intent is clear

edzer commented 5 years ago

This is what I can get:

f = "tas_rcp85_land-rcm_eur_12km_01_mon_198012-208011.nc"
library(stars)
r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(418, 406, 3, 1)), eps=1e-3)
rx = read_stars(f, proxy = TRUE) # only for the crs!
st_crs(r) = st_crs(rx)
r0 = stars:::st_transform_proj.stars(r, 4326)
png("x.png")
plot(r0[,,,1], border = NA, axes = TRUE, reset = FALSE)
library(rnaturalearth)
plot(ne_coastline(returnclass = "sf"), add = TRUE, col = 'orange')

x

(now with units in plot title: https://github.com/r-spatial/stars/commit/fdc5613591d28e69411ad0b4d7a624ad8a9df130)

mdsumner commented 5 years ago

Huh, that's not what I was expecting! So

That's why ob_trans? They rotate the pole so it's close enough to regular at a local equator? (this is a new one for me)

edzer commented 5 years ago

A similar case was found in #52

adrfantini commented 5 years ago

That's why ob_trans? They rotate the pole so it's close enough to regular at a local equator? (this is a new one for me)

Yep! Not the best approach imho, but it's simpler to implement than proper projected coordinates, which not all models have.

mdsumner commented 5 years ago

Funny definition of "simpler". It's definitely cultural, arbitrary ;)

adrfantini commented 5 years ago

Funny definition of "simpler". It's definitely cultural, arbitrary ;)

Remember that most climate scientists still use GrADS for plotting...

maurizio85 commented 5 years ago

Dear all, many thanks for your quick replies. Anyway I still have some differences with your output, as well as some errors in the R console. Actually, and practically speaking, in my .png plot the origin of the axes is still close to Germany and the left and right borders of the image are still rectilinear.

Here my R output using your code with the sessionInfo() at the end. I hope it helps R version 3.6.0 (2019-04-26) -- "Planting of a Tree" Copyright (C) 2019 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) [...removed text...]

f = 'tas_rcp85_land-rcm_eur_12km_01_mon_198012-208011.nc' library(stars) Loading required package: abind Loading required package: sf Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3 r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(418, 406, 3, 1)), eps=1e-3) rx = read_stars(f, proxy = TRUE) # only for the crs! Warning messages: 1: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : GDAL Message 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored. 2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : GDAL Message 1: dimension #1 (time) is not a Time dimension. 3: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : GDAL Message 1: dimension #0 (ensemble_member) is not a Time dimension. st_crs(r) = st_crs(rx) r0 = stars:::st_transform_proj.stars(r, 4326) plot(r0[,,,1], border = NA, axes = TRUE, reset = FALSE) Warning messages: 1: In classInt::classIntervals(na.omit(values), min(nbreaks - 1, n.unq), : N is large, and some styles will run very slowly; sampling imposed 2: In st_is_longlat(x) : bounding box has potentially an invalid value range for longlat data 3: In st_is_longlat(x) : bounding box has potentially an invalid value range for longlat data 4: In st_is_longlat(x) : bounding box has potentially an invalid value range for longlat data 5: In st_is_longlat(x) : bounding box has potentially an invalid value range for longlat data 6: In st_is_longlat(x) : bounding box has potentially an invalid value range for longlat data library(rnaturalearth) plot(ne_coastline(returnclass = "sf"), add = TRUE, col = 'orange') Warning message: In plot.sf(ne_coastline(returnclass = "sf"), add = TRUE, col = "orange") : ignoring all but the first attribute sessionInfo() R version 3.6.0 (2019-04-26) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] rnaturalearth_0.1.0 stars_0.3-1 sf_0.7-4 abind_1.4-5

loaded via a namespace (and not attached): [1] Rcpp_1.0.1 magrittr_1.5 units_0.6-3 tidyselect_0.2.5 lattice_0.20-38 R6_2.4.0 rlang_0.3.4
[8] rnaturalearthdata_0.2.0 dplyr_0.8.0.1 tools_3.6.0 parallel_3.6.0 grid_3.6.0 KernSmooth_2.23-15 ncmeta_0.0.4
[15] e1071_1.7-1 DBI_1.0.0 class_7.3-15 assertthat_0.2.1 tibble_2.1.1 RNetCDF_1.9-1 crayon_1.3.4
[22] tidyr_0.8.3 purrr_0.3.2 PCICt_0.5-4.1 glue_1.3.1 sp_1.3-1 compiler_3.6.0 pillar_1.3.1
[29] classInt_0.3-3 pkgconfig_2.0.2

dblodgett-usgs commented 5 years ago

Any chance we can get a test file for this issue? Also, it seems that the issue title is no longer accurate? I'd like to test this against my latest work on read_ncdf and harden up the bounds handling but I'm not quite sure what the issue really is and would rather not sign up to download the sample data.

edzer commented 5 years ago

I sent you a link to the file over email.

dblodgett-usgs commented 5 years ago

OK -- I see now. This should be a good opportunity to take the next step with what I've been thinking about to get the NetCDF dimensions, coordinate variables, and canonical stars axes aligned.

I've added a couple things to the dims tibble that will help here and have been planning on building it up so that we can use it for NetCDF proxy objects that support spatio-temporal subsetting. The trick is that you have to subset the coordinate variables, transfer that to NetCDF dimension indexes, then transfer the index subset to data variable access. Then you've got to make sure the shape of the returned data is appropriate for the stars dimension object. -- it's a lot of complexity but I think I've got it more or less handled. I've got a few other things in motion here but think I can work up a fix for this next week.

For the record, the dims tibble looks like: dims

# A tibble: 4 x 7
     id name            length unlim coord_dim coord_var      axis 
  <dbl> <chr>            <dbl> <lgl> <lgl>     <chr>          <chr>
1     3 grid_longitude     418 FALSE TRUE      grid_longitude X    
2     2 grid_latitude      406 FALSE TRUE      grid_latitude  Y    
3     1 time              1200 FALSE TRUE      ""             Z    
4     0 ensemble_member      1 FALSE TRUE      ""             T