pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

How to organize the time dim? #9686

Closed QLmount-snow closed 3 weeks ago

QLmount-snow commented 3 weeks ago

I read many nc files with xarray and write them into a zarr store. If I read and write a file with a later time first, the time sequence in the store follows the order in which I read the files, not the order of the data times. How can I set the time order in zarr? For example, I read time=f378 before time=f120, but I want time=f120 to come before time=f378 in the zarr store.

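import os
import xarray as xr
...
v = 'gfs.2021032400_gfs.t00z.pgrb2.0p25.f378.nc'
...
d = 'gfs.2021032400.zarr'
ds = xr.open_dataset(v, engine='netcdf4')  # keep the opened dataset; v is only the file name
if os.path.exists(d):
    # the store already exists: append along the time dimension
    r = ds.to_zarr(d, mode='a', append_dim='time')
else:
    # first file: create the store
    os.makedirs(d)
    r = ds.to_zarr(d, mode='w')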
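
Since appending with append_dim='time' stores each file in the order it is written, one possible approach is to sort the input files by forecast hour before the loop runs. The sketch below is not from the thread: it assumes every file name ends in ".fNNN.nc" as in the example above, the helper names forecast_hour and in_time_order are hypothetical, and the f120 file name is made up by following the same pattern.

```python
import re

# Assumption: the forecast hour is the number in a trailing ".fNNN.nc".
_FORECAST_HOUR = re.compile(r"\.f(\d+)\.nc$")


def forecast_hour(path):
    """Return the forecast hour encoded in a file name (hypothetical helper)."""
    match = _FORECAST_HOUR.search(path)
    if match is None:
        raise ValueError(f"no forecast hour found in {path!r}")
    return int(match.group(1))


def in_time_order(paths):
    """Return the paths sorted by forecast hour, earliest first."""
    return sorted(paths, key=forecast_hour)


# Illustrative file names; only f378 and f384 appear in the post.
files = [
    "gfs.2021032400_gfs.t00z.pgrb2.0p25.f378.nc",
    "gfs.2021032400_gfs.t00z.pgrb2.0p25.f120.nc",
    "gfs.2021032400_gfs.t00z.pgrb2.0p25.f384.nc",
]

ordered = in_time_order(files)
for path in ordered:
    print(forecast_hour(path), path)
```

Looping over ordered instead of the original list would append f120 before f378. For a store that has already been written, calling sortby('time') on the reopened dataset may be another option, provided the store actually has a time coordinate.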

The nc file information looks like this:

netcdf gfs.2021032400_gfs.t00z.pgrb2.0p25.f384 {
dimensions:
        latitude = 257 ;
        longitude = 257 ;
        time = UNLIMITED ; // (1 currently)
variables:
        double latitude(latitude) ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
        double longitude(longitude) ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
        double time(time) ;
                time:units = "seconds since 1970-01-01 00:00:00.0 0:00" ;
                time:long_name = "verification time generated by wgrib2 function verftime()" ;
                time:reference_time = 1616544000. ;
                time:reference_time_type = 3 ;
                time:reference_date = "2021.03.24 00:00:00 UTC" ;
                time:reference_time_description = "forecast or accumulated, reference date is fixed" ;
                time:time_step_setting = "auto" ;
                time:time_step = 0. ;
        float GUST_surface(time, latitude, longitude) ;
                GUST_surface:_FillValue = 9.999e+20f ;
                GUST_surface:short_name = "GUST_surface" ;
                GUST_surface:long_name = "Wind Speed (Gust)" ;
                GUST_surface:level = "surface" ;
                GUST_surface:units = "m/s" ;
....
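
As a worked example of the time metadata in this header: time is stored as seconds since 1970-01-01, and reference_time = 1616544000 corresponds to the listed reference_date. The sketch below assumes that the fNNN suffix in the file name is the forecast lead in hours from that reference time, which the post does not state explicitly; valid_time is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# time:reference_time from the header, in seconds since 1970-01-01 UTC.
REFERENCE_TIME = 1616544000


def valid_time(forecast_hour, reference_time=REFERENCE_TIME):
    """Return reference time plus the forecast lead (assumed to be in hours)."""
    reference = datetime.fromtimestamp(reference_time, tz=timezone.utc)
    return reference + timedelta(hours=forecast_hour)


print(valid_time(0).isoformat())    # 2021-03-24T00:00:00+00:00
print(valid_time(384).isoformat())  # 2021-04-09T00:00:00+00:00
```

Under that assumption, sorting by forecast hour and sorting by the decoded time value give the same order for files that share one reference time.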

Here is what I read from zarr:

<xarray.DataArray 'GUST_surface' (time: 209, latitude: 257, longitude: 257)> Size: 55MB
dask.array<open_dataset-GUST_surface, shape=(209, 257, 257), dtype=float32, chunksize=(1, 129, 257), chunktype=numpy.ndarray>
Dimensions without coordinates: time, latitude, longitude
Attributes:
    level:       surface
    long_name:   Wind Speed (Gust)
    short_name:  GUST_surface
    units:       m/s
<xarray.DataArray 'time' (time: 209)> Size: 2kB
array([  0,   1,   2, ..., 206, 207, 208])
Dimensions without coordinates: time

max-sixty commented 3 weeks ago

Closing, as this requires an MCVE