Open anonymousForPeer opened 3 years ago
Hi @areichmuth - I would love to help but I would need some more information from you first.
What is a "tilted grid index"? Do you mean that the files have not been combined in the order you expected them to be?
It's very hard to debug problems unless I can reproduce them locally. Do you have some example data files you could upload that this problem occurs with? Or even better some small code snippet that generates an example which shows the same issue?
Thank you @TomNicholas - strangely I can't reproduce it anymore on my local machine - it all happened on our Slurm cluster. The result is correct according to the input file index. In my case I calculated annual and seasonal climate variables on the same input files, but the matrix indices (i, j) were different: one had its upper left corner at (0,0) and the other at (0,1167), as shown in ncview. Nevertheless, here is what I did, so you can test it:
import numpy as np
import xarray as xr

## creating the chunks - our Slurm cluster can't handle dask_jobqueue, and dask chunking wasn't possible either
x = [x.tolist() for x in np.array_split(range(lonrange), chunks)]
xextend = [[sublist[0], sublist[-1]] for sublist in x]
y = [y.tolist() for y in np.array_split(range(latrange), chunks)]
yextend = [[sublist[0], sublist[-1]] for sublist in y]

# concatenating the chunks
allChunks = [[x, y] for x in xextend for y in yextend]

for k in range(0, chunks * chunks):
    inter = str(k)
    tas = xr.open_dataset('~/pathToFile/').isel(longitude=slice(min(allChunks[k][0]), max(allChunks[k][0])), latitude=slice(min(allChunks[k][1]), max(allChunks[k][1])))
    ## instead of my climate calculations
    # combining the single data arrays per chunk
    ## combine using nested
    with xr.open_mfdataset('~/pathToFile/climateCalculation*'+inter+'.nc', chunks=-1, parallel=True, engine='h5netcdf', combine='nested') as ds:
        ...  # processing omitted here
    # combine using default coords
    with xr.open_mfdataset('~/pathToFile/climateCalculation*'+inter+'.nc', chunks=-1, parallel=True, engine='h5netcdf') as ds:
        ...  # processing omitted here

## combining all chunks to one final file
## nested input
with xr.open_mfdataset('~/pathToFile/climateCalculations/nestedClimateAnnualCalculations_*', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ...  # processing omitted here
## default (by_coords) input
with xr.open_mfdataset('~/pathToFile/climateCalculations/climateAnnualCalculations_*', chunks=-1, parallel=True, engine='h5netcdf') as ds:
    ...  # processing omitted here
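One thing that may be worth checking in the chunking step above: xextend and yextend store the inclusive first and last index of each chunk, while slice(first, last) (and therefore isel) excludes the end index. The sketch below uses plain Python lists instead of xarray, and a hypothetical helper split_extents that mimics how np.array_split divides a range. It only illustrates the slicing behaviour and is not a confirmed diagnosis of the tilted index.

```python
def split_extents(n, parts):
    """Hypothetical helper: inclusive [first, last] pairs, sized like np.array_split(range(n), parts)."""
    base, extra = divmod(n, parts)
    extents = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` parts get one more element
        extents.append([start, start + size - 1])
        start += size
    return extents


axis = list(range(10))            # stand-in for a longitude or latitude index
extents = split_extents(len(axis), 3)

# Slicing with the stored "last" index as the end drops that index from every chunk.
covered_exclusive = [i for first, last in extents for i in axis[slice(first, last)]]
# Adding 1 to the end makes each chunk include its last index.
covered_inclusive = [i for first, last in extents for i in axis[slice(first, last + 1)]]

missing = sorted(set(axis) - set(covered_exclusive))
print("extents:", extents)
print("indices never read with slice(first, last):", missing)
```

If the same pattern applies to the real grid, each chunk would be one row and one column short, so comparing the size of the combined dataset with the size of the input file could confirm or rule this out.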
My calculations return a strange tilted index. Why does this happen?
What happened: I combined several user-defined chunked netCDF files (900 chunks) into one dataset. For this I used the default combine='by_coords' in xr.open_mfdataset(). My result was a tilted grid index - upper left corner i=0, j=1167.
Beforehand I calculated some indices on these chunks and combined them with the default combine='by_coords' in xr.open_mfdataset(), but I also tested combine='nested' separately.
The ones where I used default combine='by_coords' for all functions returned the tilted index. The ones where I used combine='nested' beforehand and then default combine='by_coords' returned the correct index.
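A possible factor in the difference between the two modes is file order. When given a glob string, open_mfdataset sorts the matching paths as strings, and combine='nested' relies on that order, whereas combine='by_coords' orders the pieces by their coordinate values. The sketch below uses hypothetical file names with unpadded chunk numbers (as produced by str(k)) to show how string sorting differs from numeric order; it does not touch any files and is an illustration under that naming assumption, not a confirmed cause.

```python
# Hypothetical file names: chunk number appended without zero-padding, as with str(k).
names = [f"climateAnnualCalculations_{k}.nc" for k in range(12)]


def chunk_number(name):
    """Extract the integer chunk number from a name like 'prefix_7.nc'."""
    return int(name.rsplit("_", 1)[1].split(".")[0])


# Plain string sorting, which is how a sorted glob result is ordered.
lexicographic_order = [chunk_number(n) for n in sorted(names)]

# Sorting by the numeric chunk number restores the order in which the chunks were written.
numeric_order = [chunk_number(n) for n in sorted(names, key=chunk_number)]

# Zero-padded names sort the same way as strings and as numbers.
padded = sorted(f"climateAnnualCalculations_{k:03d}.nc" for k in range(12))
padded_order = [chunk_number(n) for n in padded]

print("string sort: ", lexicographic_order)
print("numeric sort:", numeric_order)
print("padded sort: ", padded_order)
```

Writing the chunk files with a zero-padded counter, or passing an explicitly ordered list of paths instead of a glob string, would remove this source of ambiguity.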
What you expected to happen: No tilted index.
Minimal Complete Verifiable Example:
Anything else we need to know?:
Environment: Python 3.7.4
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Jun 3 2020, 14:52:58) [GCC 8.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.15.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: 1.2.6
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.06.2
distributed: 2021.06.2
matplotlib: 3.4.2
cartopy: None
seaborn: 0.11.1
numbagg: 0.2.1
pint: 0.17
setuptools: 57.0.0
pip: 21.1.3
conda: None
pytest: None
IPython: None
sphinx: None