It would be good to know if this occurs with parallel=False.
No, it works fine.
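For reference, a minimal sketch of the kind of call being compared; the file pattern below is hypothetical, only the parallel flag matters:
import xarray as xr

# Hypothetical file pattern, modelled on the paths mentioned later in the thread.
files = '/tmp/nam/bufr.*/bufr.*.nc'
ds = xr.open_mfdataset(files, parallel=False)   # reported to work fine
# ds = xr.open_mfdataset(files, parallel=True)  # reported to crash with SIGSEGV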
Another puzzle; I don't know whether it is related to the crashes.
Trying to localize the issue, I added a line after the else on line 453 in netCDF4_.py:
print('=======', name, encoding.get('chunksizes'))
ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')
ds0.to_netcdf('/tmp/d0.nc')
This prints:
======= hlcy (1, 85)
======= cdbp (1, 85)
======= hovi (1, 85)
======= itim (1024,)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-aeb92962e874> in <module>()
1 ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')
----> 2 ds0.to_netcdf('/tmp/d0.nc')
/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute)
1220 engine=engine, encoding=encoding,
1221 unlimited_dims=unlimited_dims,
-> 1222 compute=compute)
1223
1224 def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,
/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile)
718 # to be parallelized with dask
719 dump_to_store(dataset, store, writer, encoding=encoding,
--> 720 unlimited_dims=unlimited_dims)
721 if autoclose:
722 store.close()
/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
761
762 store.store(variables, attrs, check_encoding, writer,
--> 763 unlimited_dims=unlimited_dims)
764
765
/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
265 self.set_variables(variables, check_encoding_set, writer,
--> 266 unlimited_dims=unlimited_dims)
267
268 def set_attributes(self, attributes):
/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
302 check = vn in check_encoding_set
303 target, source = self.prepare_variable(
--> 304 name, v, check, unlimited_dims=unlimited_dims)
305
306 writer.add(source, target)
/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
466 least_significant_digit=encoding.get(
467 'least_significant_digit'),
--> 468 fill_value=fill_value)
469 _disable_auto_decode_variable(nc4_var)
470
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.createVariable()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__init__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
RuntimeError: NetCDF: Bad chunk sizes.
The dataset is:
<xarray.Dataset>
Dimensions: (dim_1: 1, dim_prof: 60, dim_slyr: 4, ftim: 85, itim: 1)
Coordinates:
* ftim (ftim) timedelta64[ns] 00:00:00 01:00:00 ... 3 days 12:00:00
* itim (itim) datetime64[ns] 2010-12-31T12:00:00
Dimensions without coordinates: dim_1, dim_prof, dim_slyr
Data variables:
stnm (dim_1) float64 ...
rpid (dim_1) object ...
clat (dim_1) float32 ...
clon (dim_1) float32 ...
gelv (dim_1) float32 ...
clss (itim, ftim) float32 ...
pres (itim, ftim, dim_prof) float32 ...
tmdb (itim, ftim, dim_prof) float32 ...
uwnd (itim, ftim, dim_prof) float32 ...
vwnd (itim, ftim, dim_prof) float32 ...
spfh (itim, ftim, dim_prof) float32 ...
omeg (itim, ftim, dim_prof) float32 ...
cwtr (itim, ftim, dim_prof) float32 ...
dtcp (itim, ftim, dim_prof) float32 ...
dtgp (itim, ftim, dim_prof) float32 ...
dtsw (itim, ftim, dim_prof) float32 ...
dtlw (itim, ftim, dim_prof) float32 ...
cfrl (itim, ftim, dim_prof) float32 ...
tkel (itim, ftim, dim_prof) float32 ...
imxr (itim, ftim, dim_prof) float32 ...
pmsl (itim, ftim) float32 ...
prss (itim, ftim) float32 ...
tmsk (itim, ftim) float32 ...
tmin (itim, ftim) float32 ...
tmax (itim, ftim) float32 ...
wtns (itim, ftim) float32 ...
tp01 (itim, ftim) float32 ...
c01m (itim, ftim) float32 ...
srlm (itim, ftim) float32 ...
u10m (itim, ftim) float32 ...
v10m (itim, ftim) float32 ...
th10 (itim, ftim) float32 ...
q10m (itim, ftim) float32 ...
t2ms (itim, ftim) float32 ...
q2ms (itim, ftim) float32 ...
sfex (itim, ftim) float32 ...
vegf (itim, ftim) float32 ...
cnpw (itim, ftim) float32 ...
fxlh (itim, ftim) float32 ...
fxlp (itim, ftim) float32 ...
fxsh (itim, ftim) float32 ...
fxss (itim, ftim) float32 ...
fxsn (itim, ftim) float32 ...
swrd (itim, ftim) float32 ...
swru (itim, ftim) float32 ...
lwrd (itim, ftim) float32 ...
lwru (itim, ftim) float32 ...
lwrt (itim, ftim) float32 ...
swrt (itim, ftim) float32 ...
snfl (itim, ftim) float32 ...
smoi (itim, ftim) float32 ...
swem (itim, ftim) float32 ...
n01m (itim, ftim) float32 ...
r01m (itim, ftim) float32 ...
bfgr (itim, ftim) float32 ...
sltb (itim, ftim) float32 ...
smc1 (itim, ftim, dim_slyr) float32 ...
stc1 (itim, ftim, dim_slyr) float32 ...
lsql (itim, ftim) float32 ...
lcld (itim, ftim) float32 ...
mcld (itim, ftim) float32 ...
hcld (itim, ftim) float32 ...
snra (itim, ftim) float32 ...
wxts (itim, ftim) float32 ...
wxtp (itim, ftim) float32 ...
wxtz (itim, ftim) float32 ...
wxtr (itim, ftim) float32 ...
ustm (itim, ftim) float32 ...
vstm (itim, ftim) float32 ...
hlcy (itim, ftim) float32 ...
cdbp (itim, ftim) float32 ...
hovi (itim, ftim) float32 ...
Attributes:
model: Unknown
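The same chunk-size information can also be read from each variable's encoding without patching netCDF4_.py; a minimal sketch, reusing the file path from above:
import xarray as xr

# Sketch: print the chunk sizes stored in each variable's encoding.
ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')
for name, var in ds0.variables.items():
    print(name, var.encoding.get('chunksizes'))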
@yt87 How much data is necessary to reproduce this? Is it feasible to share copies of the problematic files?
About 600k for 2 files. I could spend some time trying to trim that down, but if there is a way to upload the whole set it would be easier for me.
600 KB? You should be able to attach that to a comment on GitHub -- you'll just need to combine them into a .zip or .gz file first.
I did some further tests; the crash occurs somewhat randomly.
I meant at random points during execution. The script crashed every time.
The error
RuntimeError: NetCDF: Bad chunk sizes.
is unrelated to the original problem with the segfault crashes. It is caused by a bug in the netCDF C library, which is fixed in the latest version, 4.6.1. As of yesterday, the newest netcdf4-python manylinux wheel still contains an older version, so the solution is to build netcdf4-python from source.
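If upgrading the C library is not an option, one possible sidestep (a sketch, not taken from the thread) is to drop the chunk sizes inherited from the source file before writing, so the backend picks its own chunking:
import xarray as xr

ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')
for var in ds0.variables.values():
    # Remove chunk sizes carried over from the source file; the netCDF4
    # backend then chooses default chunking when writing.
    var.encoding.pop('chunksizes', None)
ds0.to_netcdf('/tmp/d0.nc')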
The segv crashes occur with other datasets as well. Example test set I used:
import numpy as np
import pandas as pd
import xarray as xr

year = 2010  # example value; the original script looped over several values

file = '/tmp/dx{:d}.nc'.format(year)
# times = pd.date_range('{:d}-01-01'.format(year), '{:d}-12-31'.format(year), name='time')
times = pd.RangeIndex(year, year + 300, name='time')
v = np.array([np.random.random((32, 32)) for i in range(times.size)])
dx = xr.Dataset({'v': (('time', 'y', 'x'), v)}, {'time': times})
dx.to_netcdf(file, format='NETCDF4', encoding={'time': {'chunksizes': (1024,)}},
             unlimited_dims='time')
A simple fix is to change the scheduler as I did in my original post.
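The thread does not show which scheduler the original post switched to; as a hedged example, the dask scheduler can be forced for a block of code like this:
import dask
import xarray as xr

# Example only: run the dask graph built by open_mfdataset on the
# synchronous (single-threaded) scheduler instead of the default one.
# The file pattern is hypothetical.
with dask.config.set(scheduler='synchronous'):
    ds = xr.open_mfdataset('/tmp/nam/bufr.*/bufr.*.nc', parallel=True)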
After upgrading to Anaconda Python 3.7 the code works without crashes. I think this issue can be closed.
Copied from the report on the xarray mailing list:
This crashes with SIGSEGV:
Traceback:
This happens with the most recent dask and xarray:
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.14-200.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
xarray: 0.11.0
pandas: 0.23.0
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.3.0.dev0
cyordereddict: None
dask: 0.20.1
distributed: 1.22.1
matplotlib: 3.0.0
cartopy: None
seaborn: 0.9.0
setuptools: 39.0.1
pip: 18.1
conda: None
pytest: 3.6.3
IPython: 6.3.1
sphinx: 1.8.1
When I change the code in open_mfdataset to use a parallel scheduler, the code runs as expected.
The file sizes are about 300 kB; my example reads only 2 files.