enrico-mi opened 1 week ago
Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!
What happened?
I am writing a snippet of code that reads netCDF files and converts them to a dask dataframe with a method `read_nc_to_df()`. When opening multiple netCDF files with `open_dataset()` and `dask.delayed`, the code fails with a segmentation fault. Sometimes the segmentation fault is preceded by more information, such as `*** Error in 'python3': double free or corruption (fasttop): 0x0000000003448df0 ***` (this happened only once) or `[Errno -101] NetCDF: HDF error: 'path/to/file/output_00.nc' There are 28 HDF5 objects open! Report: open objects on 72057594037927944` (this happened multiple times). Note that I simply re-execute the same code and different error outputs may appear, the only constant being the segmentation fault line.
What did you expect to happen?
I expected the code to execute normally, open the netCDF files, and eventually convert the multiple datasets into a dask dataframe.
Minimal Complete Verifiable Example
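A minimal sketch of the pattern described above; the file paths and the helper `_load_one()` are hypothetical, and the default `engine='netcdf4'` is assumed:

```python
import dask
import dask.dataframe as dd
import xarray as xr

# Hypothetical list of netCDF files to read.
paths = [f"path/to/file/output_{i:02d}.nc" for i in range(10)]

@dask.delayed
def _load_one(path):
    # engine='netcdf4' is xarray's default when the netCDF4 library is
    # installed; this is the configuration that crashes.
    ds = xr.open_dataset(path, engine="netcdf4")
    return ds.to_dataframe()

def read_nc_to_df(paths):
    """Read several netCDF files concurrently and combine them into one dask dataframe."""
    return dd.from_delayed([_load_one(p) for p in paths])

df = read_nc_to_df(paths)
df.compute()  # the segmentation fault occurs during this parallel read
```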
MVCE confirmation
Relevant log output
Anything else we need to know?
The same `read_nc_to_df()` function works regularly when:

- `engine='h5netcdf'` is used, or
- the reads are wrapped in a `with threading.Lock()` statement, or
- `dask` is not used.

The example above illustrates these situations, too (a sketch of the lock variant follows below). The behaviour seems to contradict the documentation, which states that "By default, appropriate locks are chosen to safely read and write files with the currently active dask scheduler."
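For concreteness, a sketch of the lock workaround applied to the hypothetical helper above, assuming a single shared lock that serialises the reads:

```python
import threading

import dask
import xarray as xr

# One lock shared by all tasks, so only one thread touches the
# netCDF4/HDF5 layer at a time.
nc_lock = threading.Lock()

@dask.delayed
def _load_one(path):
    with nc_lock:
        ds = xr.open_dataset(path, engine="netcdf4")
        return ds.to_dataframe()
    # Alternatively, xr.open_dataset(path, engine="h5netcdf")
    # without any lock also avoids the crash.
```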
The same `read_nc_to_df()` function does not work regularly when a lock is explicitly passed to `open_dataset()` through the `backend_kwargs` argument. This is not in the above example to keep it more concise.
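A sketch of that failing variant, assuming the netcdf4 backend accepts a `lock` keyword through `backend_kwargs`:

```python
import threading

import xarray as xr

lock = threading.Lock()

# Explicit lock handed to the backend instead of xarray's default
# choice; according to the report above, this still segfaults.
ds = xr.open_dataset(
    "path/to/file/output_00.nc",
    engine="netcdf4",
    backend_kwargs={"lock": lock},
)
```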
Environment