r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0

Non-`udunits`-conform units break reading of netcdf files #670

Closed · Martin-Jung closed 7 months ago

Martin-Jung commented 7 months ago

I need to read in and process a series of multi-dimensional arrays commonly used in climate-change research. Unfortunately, reading the netcdf file consistently fails because of a non-conformant unit string.

I thought the parameter make_units = FALSE in read_ncdf() would turn off any unit checks, but it does not seem to have any effect.

To reproduce, download the test file from here and try to load it via stars.

Error message:

> read_ncdf("lpjml_gfdl-esm4_w5e5_ssp126_2015soc_default_biom-swh-firr_global_annual-gs_2015_2100.nc", proxy = FALSE)
no 'var' specified, using biom-swh-firr
other available variables:
 lon, lat, time
Error: ‘growing seasons since 1601-01-01 00:00:00’ is not a unit recognized by udunits or a user-defined unit
In addition: Warning message:
Could not parse expression: ‘growing seasons since 1601-01-01 00:00:00’. Returning as a single symbolic unit()

Tested with stars_0.6-4 and units_0.8-5.

edzer commented 7 months ago

Indeed. I had more luck with

> read_mdim("lpjml_gfdl-esm4_w5e5_ssp126_2015soc_default_biom-swh-firr_global_annual-gs_2015_2100.nc", proxy = FALSE)
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.  NA's
biom-swh-firr [t/ha]    0 4.701078 7.037653 8.008385 11.60084 29.88955 54024
dimension(s):
     from  to              offset  delta         refsys x/y
lon     1 720                -180    0.5 WGS 84 (CRS84) [x]
lat     1 360                  90   -0.5 WGS 84 (CRS84) [y]
time    1  86 1970-01-01 01:06:54 1 secs      PCICt_360    
Warning message:
ignoring unrecognized unit: growing seasons since 1601-01-01 00:00:00 

although the wrong unit messes up the time dimension. Your NetCDF file is wrong!

$ ncdump -h ...
variables:
...
    double time(time) ;
...
        time:units = "growing seasons since 1601-01-01 00:00:00" ;
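
One way forward is to repair the offending attribute in the file itself. A minimal sketch using the RNetCDF package (an assumption; NCO or a similar tool would work too), with the CF-conform string "years since 1601-01-01 00:00:00" as a stand-in for "growing seasons", since the data are at annual resolution:

library(RNetCDF)
nc <- open.nc("lpjml_gfdl-esm4_w5e5_ssp126_2015soc_default_biom-swh-firr_global_annual-gs_2015_2100.nc",
              write = TRUE)
# overwrite the non-conform "growing seasons since ..." time unit in place
att.put.nc(nc, "time", "units", "NC_CHAR", "years since 1601-01-01 00:00:00")
close.nc(nc)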
Martin-Jung commented 7 months ago

Thanks for checking, that is at least something (it reads in, and I can overwrite the time dimension since I know it to be at annual resolution; see the sketch below). It is a shame, and sadly this affects other netcdf files there as well, since everyone uses this unit...
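
A minimal sketch of that overwrite via st_set_dimensions(), assuming the 86 time steps are the years 2015 through 2100 as the file name suggests:

library(stars)
x <- read_mdim("lpjml_gfdl-esm4_w5e5_ssp126_2015soc_default_biom-swh-firr_global_annual-gs_2015_2100.nc",
               proxy = FALSE)
# replace the mangled PCICt time values with plain annual steps
x <- st_set_dimensions(x, "time", values = 2015:2100)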

Just to confirm, is my understanding of the make_units parameter correct in that it should ignore any 'unit' formatting when reading the dataset? Does it work only on attributes and/or dimensions? Otherwise I guess this issue can be closed, and I will have to see whether the authors plan to update these netcdf files.

edzer commented 7 months ago

I don't know; @mdsumner and @dblodgett-usgs contributed read_ncdf(). The documentation for make_units says: "if \code{TRUE} (the default), an attempt is made to set the units property of each variable". As such, it doesn't say that when FALSE it will happily ignore invalid units, but I can understand that as a user you'd hope so.

dblodgett-usgs commented 7 months ago

make_units is only applicable to data variables. Try make_time = FALSE

> read_ncdf("lpjml_gfdl-esm4_w5e5_ssp126_2015soc_default_biom-swh-firr_global_annual-gs_2015_2100.nc", 
                    make_time = FALSE, make_units = FALSE)
no 'var' specified, using biom-swh-firr
other available variables:
 lon, lat, time
Will return stars object with 22291200 cells.
No projection information found in nc file. 
 Coordinate variable units found to be degrees, 
 assuming WGS84 Lat/Lon.
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
               Min.  1st Qu.   Median     Mean  3rd Qu.     Max.  NA's
biom-swh-firr     0 4.701078 7.037653 8.008385 11.60084 29.88955 54024
dimension(s):
     from  to offset delta         refsys x/y
lon     1 720   -180   0.5 WGS 84 (CRS84) [x]
lat     1 360     90  -0.5 WGS 84 (CRS84) [y]
time    1  86    414     1             NA    
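
Note that with make_time = FALSE the raw values survive: the offset 414 is the count of growing seasons since 1601, i.e. 1601 + 414 = 2015, and the 86 steps run through 2100. A minimal sketch mapping them to calendar years (the object name x is hypothetical and assumes the read_ncdf() result above was assigned to it):

# raw time values are 414:499, i.e. growing seasons since 1601
yrs <- 1601 + st_get_dimension_values(x, "time")  # 2015, 2016, ..., 2100
x <- st_set_dimensions(x, "time", values = yrs)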
Martin-Jung commented 7 months ago

Thanks, that helped further with parsing these data. Closing this for now, as I think any further issues are related to the data themselves.