Closed ZedThree closed 1 month ago
Very exciting to see progress on this! Thanks for pushing this upstream.
Just to confirm, what is the minimum version of netCDF-C required for this functionality?
Just to confirm, what is the minimum version of netCDF-C required for this functionality?
For complex
written via netCDF4
: no minimum version, uses standard netCDF functionality (does depend on which netCDF-3/4 file format you use though)
For complex
written via h5netcdf/h5py
: 4.9.3 (unreleased, current main
branch on github, but release expected imminently)
What about support for tools that read netCDF via the netCDF-C library without using Python, e.g., ncdump
?
I ask because I want to clarify that support for complex numbers is "official" for netCDF, which as far as I understand basically means that it's in the netCDF-C library. Right now, the documentation for netCDF4-Python suggests that complex number support requires the nc-complexhttps://github.com/PlasmaFAIR/nc-complex).
Ah sorry, to be clear, support for complex numbers is still not "official" for netCDF and not available directly in the netCDF-C library. The official Python interface has some helper functionality through the auto_complex
keyword. The representation on disk uses existing netCDF-C functionality, with nc-complex
handling the conversions between complex
types and netCDF datatypes. This means that using nc-complex
to write complex numbers doesn't affect other tools ability to read those files.
netCDF4.Dataset
has auto_complex=False
by default to emphasise that this is a helper utility.
ncdump
will show the underlying representation:
import netCDF4
complex_array = np.array([0 + 0j, 1 + 0j, 0 + 1j, 1 + 1j, 0.25 + 0.75j])
with netCDF4.Dataset("complex.nc", "w", auto_complex=True) as nc:
nc.createDimension("x", size=len(complex_array))
var = nc.createVariable("data", "c16", ("x", ))
var[:] = complex_array
With the current netCDF-C release, 4.9.2:
$ ncdump complex.nc
netcdf complex {
types:
compound _PFNC_DOUBLE_COMPLEX_TYPE {
double r ;
double i ;
}; // _PFNC_DOUBLE_COMPLEX_TYPE
dimensions:
x = 5 ;
variables:
_PFNC_DOUBLE_COMPLEX_TYPE data(x) ;
data:
data = {0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.25, 0.75} ;
}
There's nothing special about this file, and it should be readable by any tool that can read standard netCDF-4 files, but they'll get a compound datatype:
with netCDF4.Dataset("complex.nc", "r") as nc:
print(nc["data"])
print(nc["data"][:])
# <class 'netCDF4._netCDF4.Variable'>
# compound data(x)
# compound data type: {'names': ['r', 'i'], 'formats': ['<f8', '<f8'], 'offsets': [0, 8], 'itemsize': 16, 'aligned': True}
# unlimited dimensions:
# current shape = (5,)
# [(0. , 0. ) (1. , 0. ) (0. , 1. ) (1. , 1. ) (0.25, 0.75)]
with netCDF4.Dataset("complex.nc", "r", auto_complex=True) as nc:
print(nc["data"])
print(nc["data"][:])
# <class 'netCDF4._netCDF4.Variable'>
# compound data(x)
# compound data type: complex128
# unlimited dimensions:
# current shape = (5,)
# [0. +0.j 1. +0.j 0. +1.j 1. +1.j 0.25+0.75j]
Creating the same file with h5netcdf
and invalid_netcdf=True
, and using the development version of netCDF-C's ncdump
:
netcdf complex {
types:
compound _AnonymousCompound1 {
double r ;
double i ;
}; // _AnonymousCompound1
dimensions:
x = 5 ;
variables:
_AnonymousCompound1 data(x) ;
data:
data = {0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.25, 0.75} ;
}
Literally the only difference here is netCDF4
/nc-complex
creates a named type, while h5py
uses an unnamed type that the netCDF-C library gives a temporary name to. Any tool that uses this version of netCDF-C (or uses HDF5) will be able to read this file.
(In principle, h5py could commit this datatype to the file, and then current versions of netCDF would be able to read it too)
Lastly, HDF5 is now adding native support for complex numbers that will also be able to read the {r, i}
compound datatype used by h5py
and nc-complex
. I believe this will be in 1.15. After that, we might be able to get native support into netCDF-C, but I suspect that will take some time.
Sorry for the info dump, there's multiple libraries and versions thereof, so I just wanted to be really clear about what all the different capabilities are!
(In principle, h5py could commit this datatype to the file, and then current versions of netCDF would be able to read it too)
This would be very nice!
Overall, it seems that things have changed for the better since when I first added support for writing invalid netCDF into h5netcdf, because netCDF-C can now read the data associated with these HDF5 enum types.
At this point, I would say complex number support could be considered a "convention" for writing netCDF files. That means we could add direct support for reading/writing complex numbers in Xarray, without requiring invalid_netcdf=True
.
Ok, I'll have a stab at a PR for this then!
Is your feature request related to a problem?
Currently, the only supported method for saving complex numbers to netCDF is to use the
h5netcdf
engine withinvalid_netcdf=True
.The latest release of
netCDF4
, 1.7.1, now has support for complex numbers. Also, because it's built on the up-coming netCDF 4.9.3, it includes support for reading files written usingh5netcdf
withinvalid_netcdf=True
(and I think for pretty much all other types that fall under this, including enums and bools).This means that there are now more options for users wanting to read/write complex numbers, although this depends on the version of
netCDF4
in use and the flags used to open files.Just to note that the complex number support in
netCDF4
is done via converting to/from a compound datatype (or acomplex
dimension for netCDF-3 files), and is completely standard netCDF -- it's basically just a wrapper in the Python API.Describe the solution you'd like
Either expose the
auto_complex
keyword fornetCDF4.Dataset
to the backend or set this automatically. This will either require usingnetCDF4>=1.7.1
or detecting the version.Additionally,
invalid_netcdf
will no longer be strictly necessary for bools and complex numbers forh5netcdf
, although again this will be dependent on the version of netCDF used -- this is sort of outside of xarray though.I'm happy to implement this as it's pretty trivial, but is it better
Describe alternatives you've considered
No response
Additional context
There is currently an issue mixing
netCDF4==1.7.1
andh5py==3.11.0
in the same python process, see: https://github.com/Unidata/netcdf4-python/issues/1343 and https://github.com/h5py/h5py/issues/2453. This might need to be resolved before xarray can requirenetCDF4==1.7.1
Any insights on how to resolve this would be much appreciated!