Open rabernat opened 7 years ago
This issue is almost seven years old! It has been "fixed" many times since my original post, but people keep finding new ways to make it reappear. 😆
It seems like having better diagnostics / logging of what is happening under the hood with open_mfdataset is what people really need. Maybe even some sort of utility to pre-scan the files and figure out if they are easily openable or not.
Both of those seem like great ideas. Maybe there could be a verbose or logging mode to help users identify what is wrong with the files (e.g., where the time is being spent and whether that seems suspicious). It is probably true that people (like me) will keep finding new ways to generate problematic netcdf files. (I'm sure we can think of something even worse than 20 Hz data referenced to a time origin 75 years ago).
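To make the pre-scan idea concrete, here is a minimal sketch of what such a utility might look like. It is not an existing xarray feature: the function name `prescan`, the `opener` argument, and the `slow_factor` threshold are all hypothetical. It opens each file on its own, records how long that took, and flags files that fail or are much slower than the median.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def prescan(
    paths: Iterable[str],
    opener: Callable[[str], Any],
    slow_factor: float = 5.0,  # hypothetical threshold, tune to taste
) -> List[Dict[str, Any]]:
    """Open every file individually, time it, and flag outliers."""
    records: List[Dict[str, Any]] = []
    for path in paths:
        start = time.perf_counter()
        error = None
        try:
            obj = opener(path)
            elapsed = time.perf_counter() - start
            # Release the file handle if the opener returned something closable.
            close = getattr(obj, "close", None)
            if callable(close):
                close()
        except Exception as exc:  # report the failure instead of aborting the scan
            elapsed = time.perf_counter() - start
            error = repr(exc)
        records.append({"path": path, "seconds": elapsed, "error": error})

    # Use the (upper) median of the successful opens as the baseline.
    ok_times = sorted(r["seconds"] for r in records if r["error"] is None)
    median = ok_times[len(ok_times) // 2] if ok_times else 0.0
    for r in records:
        too_slow = median > 0 and r["seconds"] > slow_factor * median
        r["suspicious"] = r["error"] is not None or too_slow
    return records
```

With xarray installed, one would presumably call it as `prescan(files, xr.open_dataset)` and inspect the entries where `suspicious` is true before handing the full list to open_mfdataset.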
We have a dataset stored across multiple netCDF files. We are getting very slow performance with `open_mfdataset`, and I would like to improve this.
Each individual netCDF file looks like this:
As shown above, a single data file opens in ~60 ms.
When I call `open_mfdataset` on 49 files (each with a different `time` dimension but the same `npart`), here is what happens:
It takes over 2 minutes to open the dataset. Specifying `concat_dim='time'` does not improve performance.
Here is `%prun` of the `open_mfdataset` command.
It looks like most of the time is being spent on `reindex_variables`. I understand why this happens: xarray needs to make sure the dimensions are the same in order to concatenate them together.
Is there any obvious way I could improve the load time? For example, can I give a hint to xarray that this `reindex_variables` step is not necessary, since I know that all the `npart` dimensions are the same in each file?
Possibly related to #1301 and #1340.
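For readers arriving later: xarray releases newer than the one this issue was filed against accept `open_mfdataset` options that, to my understanding, act as the kind of hint asked for above. The sketch below assumes such a release, assumes the files are passed in time order, and assumes every file really does share the same `npart` coordinate. The helper names `combine_kwargs` and `open_trusting_first_file` are made up for illustration.

```python
from typing import Any, Dict, Sequence

# Options that ask xarray to skip cross-file comparison and alignment.
# Assumption: a recent xarray where combine=, compat="override" and
# join="override" are available.
FAST_COMBINE_KWARGS: Dict[str, Any] = {
    "combine": "nested",     # concatenate in the order the paths are given
    "concat_dim": "time",    # the dimension that differs between files
    "data_vars": "minimal",  # only concatenate variables that already have `time`
    "coords": "minimal",     # same rule for coordinates
    "compat": "override",    # take non-concatenated variables from the first file
    "join": "override",      # reuse the first file's indexes instead of reindexing
}


def combine_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Return the fast-path options, letting the caller override any of them."""
    return {**FAST_COMBINE_KWARGS, **overrides}


def open_trusting_first_file(paths: Sequence[str], **overrides: Any):
    """Open many files as one dataset, trusting that shared dims match.

    `paths` must already be in time order; nothing here verifies that.
    """
    import xarray as xr  # imported lazily so the helpers above work without it

    return xr.open_mfdataset(list(paths), **combine_kwargs(**overrides))
```

The trade-off is that xarray no longer checks that the files agree: if one file has a different `npart` coordinate, the mismatch is silently ignored, so this is only appropriate when the files are known to be consistent.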