Open crusaderky opened 6 years ago
I agree it would be great to have this feature.
There has already been lots of discussion of this on #1385 and #1823. I tried and failed to implement something similar in #1413. I recommend reviewing those threads before jumping into this.
This is a follow-up from #1521.
When invoking open_mfdataset, the user very often knows in advance that all of their coords that aren't on the concat_dim are already aligned, and may be willing to blindly trust that assumption in exchange for a huge performance boost.
My production data: 200x NetCDF files on a not very performant NFS file system, concatenated on the "scenario" dimension:
If I skip loading and comparing the non-index coords from all 200 files:
If I also skip loading and comparing the index coords from all 200 files:
Proposed design
Add a new optional parameter to open_mfdataset, assume_aligned=None. It can be set to a list of variable names or "all", and requires concat_dim to be explicitly set. It causes open_mfdataset to use the first occurrence of every variable and blindly skip loading the subsequent ones.
Algorithm
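As a rough illustration of the proposed behaviour, here is a pure-Python sketch that only plans which variables would be read from each file; it does not touch xarray or any real file. The function name plan_loads and the variable_dims mapping are hypothetical and exist only for this example. Reading "all" as "every variable that does not have concat_dim among its dimensions" is an assumption based on the motivation above, not something the proposal states.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union


def plan_loads(
    paths: Sequence[str],
    variable_dims: Dict[str, Tuple[str, ...]],
    concat_dim: Optional[str] = None,
    assume_aligned: Union[None, str, Sequence[str]] = None,
) -> Dict[str, List[str]]:
    """Return, for each path, the names of the variables that would be read.

    Hypothetical helper: variable_dims maps each variable name to its
    dimensions and is assumed to be identical for every file.
    """
    if assume_aligned is None:
        trusted = set()
    else:
        # The proposal requires concat_dim to be given explicitly.
        if concat_dim is None:
            raise ValueError("assume_aligned requires an explicit concat_dim")
        if isinstance(assume_aligned, str):
            if assume_aligned != "all":
                raise ValueError('assume_aligned must be a list of names or "all"')
            # Assumption: "all" covers every variable not on concat_dim.
            trusted = {
                name for name, dims in variable_dims.items()
                if concat_dim not in dims
            }
        else:
            trusted = set(assume_aligned)

    plan = {}
    for i, path in enumerate(paths):
        if i == 0:
            # The first occurrence of every variable is always read.
            plan[path] = list(variable_dims)
        else:
            # Trusted variables are skipped in all subsequent files.
            plan[path] = [n for n in variable_dims if n not in trusted]
    return plan


variable_dims = {
    "scenario": ("scenario",),
    "time": ("time",),
    "cost": ("scenario", "time"),
}
plan = plan_loads(
    ["a.nc", "b.nc", "c.nc"],
    variable_dims,
    concat_dim="scenario",
    assume_aligned="all",
)
print(plan)
```

In this sketch only the first file contributes "time"; the remaining files are read for "scenario" and "cost" alone, which is where the saving would come from.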