Closed pvanlaake closed 8 months ago
Thanks, these are all good observations. The three different functions use three different code paths to get to the same array:

- `read_stars` uses the "legacy" GDAL RasterDataSet: this returns a dataset with a large number of layers, `stars` then tries to figure out the (time) dimension properties of the layers and puts it in the time dimension. This GDAL data model doesn't have names for x and y dimensions, hence they always get named `x` and `y` by `read_stars`. GDAL traditionally organizes imagery with row index increasing with decreasing y coordinate ("image orientation"), hence the negative `y` offset.
- `read_ncdf` uses R packages `ncmeta` and `RNetCDF`, the latter directly interfaces to the netcdf C library.
- `read_mdim` uses the newer GDAL multidimensional array C++ interface, which for instance also reads and writes Zarr files (I'm not sure if `read_ncdf` can do this).

The summary stats are "summary of first 1e+05 cells", which depends on the order in the array and hence on the sign of `y$delta`: for `read_stars` this starts north, the others south. You can pass on `n=Inf` to `print.stars` to get the summary of the entire array, this should be identical for all.
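To illustrate why a summary of only the first cells depends on the y orientation, here is a minimal base R sketch. It does not use `stars` at all: the toy matrix, the row-by-row flattening, and the helper `first_n()` are assumptions made up for illustration, standing in for a north-first versus south-first array.

```r
# Toy 4 x 3 grid standing in for one time slice; each row is one y position.
m <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)

# The same data with the y-axis flipped (row order reversed).
m_flipped <- m[nrow(m):1, ]

# Hypothetical helper: flatten row by row and summarise only the first n cells.
first_n <- function(a, n) summary(head(as.vector(t(a)), n))

first_n(m, 6)          # sees the first two rows of m
first_n(m_flipped, 6)  # sees the last two rows of m, in reverse row order

# Over the whole array the cell order no longer matters.
all.equal(summary(as.vector(m)), summary(as.vector(m_flipped)))
```

The two partial summaries differ because they cover different cells, while the full summaries agree, which is the same effect as printing with `n=Inf`.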
Another observation is that only `read_ncdf` returns irregular `x` and `y` dimension values (no `offset` and `delta`): maybe they are nearly regularly spaced, and the tolerance for deciding they are to be considered regularly spaced is too tight.
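A rough base R sketch of how such a tolerance decides between "regular" and "irregular" follows. This is not the check used by `stars`, `ncmeta` or GDAL: the helper `is_regular()`, its relative-tolerance definition, and the tolerance values are all assumptions chosen for illustration.

```r
# Hypothetical helper: TRUE if all steps between consecutive coordinate
# values equal the mean step, within a relative tolerance.
is_regular <- function(x, tol = 1e-6) {
  d <- diff(x)
  if (length(d) < 2) return(TRUE)
  max(abs(d - mean(d))) <= tol * abs(mean(d))
}

# Exactly regular 1-degree longitudes.
lon <- seq(0.5, 359.5, by = 1)

# The same coordinates with small deterministic noise of about 1e-7,
# mimicking values that were stored with limited precision.
lon_noisy <- lon + 1e-7 * sin(seq_along(lon))

is_regular(lon)                     # exact spacing
is_regular(lon_noisy, tol = 1e-10)  # a tight tolerance rejects the noisy axis
is_regular(lon_noisy, tol = 1e-5)   # a looser tolerance accepts it
```

With a check of this kind, the same nearly regular axis ends up with or without `offset` and `delta` depending only on the tolerance chosen.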
The fact that each of them, plotted, gives the same image confirms that these are semantically nearly identical representations of the data.
Interesting then that `read_stars()` and `read_mdim()` return different data organisation while both are based on GDAL.

Is there an interest to more closely align the information collected through each function? As mentioned, I am tinkering with `ncdf.R` and could do a proper fork and then send you a PR.
> Interesting then that `read_stars()` and `read_mdim()` return different data organisation while both are based on GDAL.

Yes: one code base, two different data models.

> Is there an interest to more closely align the information collected through each function? As mentioned, I am tinkering with `ncdf.R` and could do a proper fork and then send you a PR.
`read_stars()` will always end up with regular `x` and `y` dimensions, regardless of whether they are regular in the netcdf file (another constraint of the RasterDataSet). It would be good if `read_mdim()` and `read_ncdf` agreed on this issue though, i.e. used the same tolerance.

I'm not sure what exactly you are tinkering with; @mdsumner and @dblodgett-usgs wrote `read_ncdf()`, so some approval from them would also be good.
I am the developer of the `CFtime` package, which supports the full range of calendars defined in the CF Metadata Conventions. I have worked with Michael over the last 6 months or so to make that functionality available in `ncmeta`. The dev version on GitHub now includes "extended" attributes, of which "time" is currently the only one. Michael is working on a new release of `ncmeta` that will include this. Michael is thus well aware of my efforts and also of my intention to look at `stars`. I am not sure how closely Dave has been following recent developments, but as ctb to `ncmeta` he may be automatically notified of changes.
I'll make a proposal in a new issue to present my ideas in more detail.
Closing here now.
I have a file of monthly CMIP6 data:
I can read this file with three different functions: `read_stars()`, `read_mdim()` and `read_ncdf()`. All three, however, print different results:

(Never mind the `CFtime` refsys in `read_ncdf()`, I am tinkering with the code to replace `PCICt` with `CFtime`. That's for another issue.)

Note that this is not a feature of the specific file that I am using here, it is consistent over whatever CF-compliant NetCDF file I throw at `stars`.

Apart from some differences that are in the "raise-my-eyebrows" domain (`read_stars()` renames dimensions to `x` and `y`, `standard_name` versus `long_name` for the variable, `read_mdim()` produces a `Date` for the time dimension, different representations of the `x-y` coordinate system), there is one more vexing issue. `read_stars()` produces different summary statistics from the other two functions. Upon closer examination of the data in the underlying file I noted that `read_stars()` flips the y-axis, which is also observable from the inversion of the `offset` and `delta` values in the printed information. If I `plot()` a time slice then the result is ok for all functions so it does not appear to be an error. But then what to make of this?