Open TomNicholas opened 6 days ago
@jsignell not sure if this is a bug with the loading of cftime variables, the implementation of xr.combine_by_coords, or somewhere else.
Okay, so this happens with xr.concat too:
In [11]: ds1 = open_virtual_dataset('air1.nc', loadable_variables=['time', 'lat', 'lon'], cftime_variables=['time'])
In [12]: ds2 = open_virtual_dataset('air2.nc', loadable_variables=['time', 'lat', 'lon'], cftime_variables=['time'])
In [13]: xr.concat([ds1, ds2], coords='minimal', compat='override', dim='time')
Out[13]:
<xarray.Dataset> Size: 8MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
* time (time) float32 12kB 1.867e+06 1.867e+06 ... 1.885e+06 1.885e+06
Data variables:
air (time, lat, lon) int16 8MB ManifestArray<shape=(2920, 25, 53), d...
Attributes:
Conventions: COARDS
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
title: 4x daily NMC reanalysis (1948)
but only because I didn't pass indexes={}. Concat works fine if you don't create indexes:
In [8]: ds1 = open_virtual_dataset('air1.nc', loadable_variables=['time', 'lat', 'lon'], cftime_variables=['time'], indexes={})
In [9]: ds2 = open_virtual_dataset('air2.nc', loadable_variables=['time', 'lat', 'lon'], cftime_variables=['time'], indexes={})
In [10]: xr.concat([ds1, ds2], coords='minimal', compat='override', dim='time')
Out[10]:
<xarray.Dataset> Size: 8MB
Dimensions: (time: 2920, lat: 25, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
time (time) datetime64[ns] 23kB 2013-01-01T00:02:06.757437440 ... 201...
Data variables:
air (time, lat, lon) int16 8MB ManifestArray<shape=(2920, 25, 53), d...
Attributes:
Conventions: COARDS
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
title: 4x daily NMC reanalysis (1948)
This almost certainly does just link back to #18 then.
I realized that xr.combine_by_coords should actually already work fine - if you are willing to load the relevant coordinates into memory (and therefore also have those values saved into the resultant references on-disk).
The only issue with this result is that somehow the dtype of time has been changed from datetime64[ns] to float32.
This approach doesn't solve the original issue, but it might also be fine in a lot of cases. xr.combine_by_coords can only auto-order along dimensions that have a 1D coordinate, and 1D variables are small, so if they are split across many files it's likely that you wanted to include them in loadable_variables anyway.
Originally posted by @TomNicholas in https://github.com/zarr-developers/VirtualiZarr/issues/18#issuecomment-2200498037
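For reference, the auto-ordering behaviour of xr.combine_by_coords described in the quoted comment can be sketched with plain in-memory xarray datasets. This is only an illustration under stated assumptions: it uses small synthetic data instead of air1.nc / air2.nc and does not involve open_virtual_dataset or ManifestArray at all, and make_piece is a hypothetical helper, not part of any library.

```python
import numpy as np
import xarray as xr


def make_piece(first_day, n_days):
    """Hypothetical helper: tiny dataset whose 1D 'time' coordinate is in memory."""
    start = np.datetime64(first_day, "D")
    # Daily timestamps, cast to the nanosecond resolution shown in the issue.
    time = np.arange(start, start + n_days, dtype="datetime64[D]").astype("datetime64[ns]")
    air = np.zeros((n_days, 2), dtype="int16")
    return xr.Dataset(
        {"air": (("time", "lat"), air)},
        coords={"time": time, "lat": np.array([75.0, 72.5], dtype="float32")},
    )


# Pass the pieces in the "wrong" order on purpose.
later = make_piece("2013-01-05", 4)
earlier = make_piece("2013-01-01", 4)
combined = xr.combine_by_coords([later, earlier])

# The pieces are ordered by their 'time' values, not by argument order.
steps = np.diff(combined["time"].values)
time_is_sorted = bool((steps > np.timedelta64(0, "ns")).all())
time_dtype = combined["time"].dtype
```

With ordinary in-memory variables like these, the time coordinate should come back sorted and keep its datetime64[ns] dtype, which is the behaviour the virtual datasets above fail to reproduce when indexes are created.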