openPMD / openPMD-standard

:notebook: Open Standard for Particle-Mesh Data
http://www.openPMD.org
Creative Commons Attribution 4.0 International

Complex Numbers: define tuple of two as integral? #29

ax3l opened this issue 9 years ago (status: Open)

ax3l commented 9 years ago

It might be useful to allow tuples of two numbers to be defined as an integral type for complex components of records.

Alternatively, one would need a consistent naming convention for the components.

numpy complex types: how do h5py (FAQ: HDF5 struct) and the official HDF5 standard handle these?

ax3l commented 9 years ago

Careful: HDF5 Fortran support for compound data types is pretty bad.

Link moved: archive

DavidSagan commented 5 years ago

@ax3l

Careful: HDF5 Fortran support for compound data types is pretty bad.

Additionally, other formats may not support complex numbers directly at all. To maximize portability, I would vote for the standard mandating complex storage as real and imaginary datasets, or as amplitude and phase datasets. If using amplitude and phase, should the conversion factor to SI convert to radians or to radians/2pi?
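For concreteness, a minimal h5py sketch of the two layouts proposed here; group and dataset names are illustrative, not mandated:

import h5py as h5
import numpy as np

z = np.array([1.0 + 2.0j, 3.0 - 4.0j])
with h5.File("record.h5", "w") as f:
    grp = f.create_group("E/x")       # a record component (illustrative path)
    grp["real"] = z.real              # real part as its own dataset
    grp["imaginary"] = z.imag         # imaginary part as its own dataset
    # amplitude/phase alternative:
    # grp["amplitude"] = np.abs(z)
    # grp["phase"] = np.angle(z)      # np.angle returns radians in (-pi, pi]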

ax3l commented 5 years ago

Side note: ADIOS2 has first-class complex support.

To maximize portability, I would vote for the standard mandating complex storage as real and imaginary datasets or amplitude and phase datasets.

I am not entirely decided on that yet, maybe some formats just don't support complex and I am really intrigued to push those to improve instead. We don't have "required" complex records or attributes yet besides the geometry thetaMode, where we split into real and imaginary so far. Amplitude and phase is a good point as well... actually, phase normalized to -0.5:0.5 would be best from an IEEE floating-point point of view ;-)

DavidSagan commented 5 years ago

I am not entirely decided on that yet, maybe some formats just don't support complex and I am really intrigued to push those to improve instead.

I'm not sure what you mean by "intrigued". One way to handle this would be to say that if the format supports complex numbers then that should be used, and if not then use real/imaginary or amp/phase sub-datasets.

We don't have "required" complex records or attributes yet besides the geometry thetaMode

Actually for the beam physics extension there is the photon polarization which is complex.

DavidSagan commented 5 years ago

@ax3l And HDF5 does not directly support complex numbers, so this is a portability problem for the files I will be creating. So how about this for the standard:

If the format supports complex numbers as a native type then use the native type; otherwise store the complex numbers in sub-groups as real and imaginary with labels real and imaginary, or amplitude and phase with labels amplitude and phase. If using phase, phase numbers when multiplied by unitSI should give radians.

I would not mandate that the phase be in any specific interval like [-pi, pi] since the integer part of the phase may be important.
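To make the unitSI convention concrete, a small illustrative sketch (the values and the stored unit are assumptions, not part of the proposal):

import numpy as np

# Suppose phase is stored in turns (full cycles); then unitSI = 2*pi so that
# stored_value * unitSI yields radians. If phase is stored directly in
# radians, unitSI = 1. Note the integer part of the phase survives.
phase_turns = np.array([0.25, 1.5])
unitSI = 2.0 * np.pi
phase_rad = phase_turns * unitSI   # -> [0.5*pi, 3*pi]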

Question: If the format supports complex numbers should a writer have the option to not use the native complex and instead store with re/im or amp/phase sub-groups?

DavidSagan commented 5 years ago

Note: Since the beam physics extension is using complex numbers in more than one place, if a convention for complex numbers does not get into the main standard, then a convention will have to be put into the extension.

ChristopherMayes commented 5 years ago

It looks like pytables and h5py both use 'r' and 'i' as names for the compound dataset. Actually, FORTRAN has had a complex datatype since at least FORTRAN 77. I don't like also specifying phase and amplitude - the user should convert.
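As a quick check of that naming, a sketch using h5py's low-level interface (the file name is illustrative):

import h5py as h5
import numpy as np

with h5.File("c.h5", "w") as f:
    f["z"] = np.array([1 + 2j], dtype=np.complex128)
    tid = f["z"].id.get_type()        # low-level compound type object
    names = [tid.get_member_name(i) for i in range(tid.get_nmembers())]
print(names)   # expected: [b'r', b'i']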

DavidSagan commented 5 years ago

It looks like pytables and h5py both use 'r' and 'i' as names for the compound dataset.

Yes using 'r' and 'i' to be compatible with pytables and h5py would be a good idea.

I don't like also specifying phase and amplitude - the user should convert.

Fine with me.

ax3l commented 5 years ago

Yes, we should also stay by default with what C11 and C++ do: a simple POD datatype of two float, double, or long double values in the order real + imaginary. We can easily add this to the C++ and Python frontend APIs. Fortran seems to use the same layout.

The problem is less how to lay out the complex and more how to describe it on the low level in file formats without introducing too many conventions we have to carry. ADIOS' .bp format supports complex, which covers us for the data "round-trip" (writing it will allow us to identify it again). For HDF5 we have to decide on one of the various annotations because they did not standardize it... I guess we will just do whatever h5py does, since it is the most popular implementation and, as Chris said, pytables seems to follow the same convention.

The Fortran issue with querying complex numbers mentioned above remains; have you by any chance tried reading/writing an h5py complex from/to Fortran?

DavidSagan commented 5 years ago

Chris has been using h5py. I have not.

ax3l commented 5 years ago

@ChristopherMayes did writing a data set and/or attribute with complex numbers in h5py and then identifying it as complex and reading it via Fortran - and vice versa - work for you? Just looking for someone with experience in this so we can find a programmatically compatible solution.

As far as C++ and Python is concerned, we will just add it here: https://github.com/openPMD/openPMD-api/issues/203 and relax potential wording in the standard.

PrometheusPi commented 5 years ago

How would this new standard definition affect ℂ³ datasets as used in PIConGPU?

DavidSagan commented 4 years ago

@PrometheusPi

How would this new standard definition affect ℂ³ datasets as used in PIConGPU?

Here there are three complex numbers (x, y, z). PIConGPU uses a single level to store all six (real and imaginary) arrays. The convention for openPMD would be to have x, y, and z groups and then store "r" and "i" arrays as datasets of these groups. That is, there would be two levels.
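A sketch of this two-level layout in h5py (the record name E is illustrative):

import h5py as h5
import numpy as np

E = {c: np.arange(4) * (1.0 + 1.0j) for c in ("x", "y", "z")}
with h5.File("c3.h5", "w") as f:
    for c, vals in E.items():
        g = f.create_group("E/" + c)  # level 1: component group
        g["r"] = vals.real            # level 2: real dataset
        g["i"] = vals.imag            # level 2: imaginary dataset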

DavidSagan commented 4 years ago

@ax3l @ChristopherMayes Since the beam physics extension is actively being used, and since the question of complex storage has come up, I added a note to the extension on how complex numbers should be stored in line with this discussion. The idea is that when the 2.0 version is finalized, this text can be transferred to the base standard.

ChristopherMayes commented 4 years ago

For a complex array Z, having a group Z/ with arrays re and im or r and i seems consistent with the rest of the standard.

DavidSagan commented 4 years ago

@ChristopherMayes So right now the note I wrote mandates "r" and "i" to be consistent with h5py.

ax3l commented 4 years ago

How would this new standard definition affect ℂ³ datasets as used in PIConGPU?

You could then store them as a regular record with record components of type ℂ. Note to the reader: so far only an in situ plugin of PIConGPU uses this, not the main I/O (plugin) routines.

having a group Z/ with arrays re and im or r and i seems consistent with the rest of the standard.

I think extensions can generally do this, but it makes parsing by a pure base-standard reader more complicated. My idea is actually, as in the title of the issue, to just allow integral complex types (just like float), since I think HDF5 and ADIOS support this well for pairs of single and double precision. Compatible with the C11 (and h5py), C++, and Fortran standard complex types, one would only mandate the order and that both parts are stored as a plain-old-data (POD) struct in r-i order (no padding, aliasing, etc.).

That's at least something one can implement in base-standard readers and writers, and it's exactly what h5py is doing as well. Although extensions can play with sub-groups as they like, adding another group for such a fundamental type adds complexity to the base standard that might not be needed with an integral type, and it would complicate further extensions, such as MR storage groups later on.

This is how it is handled in HDF5 by h5py's convention. Example code:

import h5py as h5                                                                                
import numpy as np                                                                               

x = np.array([1.+2j, 3.+4j, 5.+6j], np.cdouble)                                                  
x                                                                                                
# array([1.+2.j, 3.+4.j, 5.+6.j])
x.dtype                                                                                          
# dtype('complex128')
f = h5.File('complex.h5', 'w')                                                                   
f['/mydata'] = x                                                                                 
f.close()

This creates an H5T_COMPOUND with r and i labels:

$ h5ls complex.h5 
mydata                   Dataset {3}

$ h5ls -d -r complex.h5 
/                        Group
/mydata                  Dataset {3}
    Data:
        (0) {1, 2}, {3, 4}, {5, 6}

$ h5dump complex.h5
HDF5 "complex.h5" {
GROUP "/" {
   DATASET "mydata" {
      DATATYPE  H5T_COMPOUND {
         H5T_IEEE_F64LE "r";
         H5T_IEEE_F64LE "i";
      }
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): {
            1,
            2
         },
      (1): {
            3,
            4
         },
      (2): {
            5,
            6
         }
      }
   }
}
}

ADIOS1 and ADIOS2 have native type support for single and double complex.

DavidSagan commented 4 years ago

@ax3l So how would you word this in the standard? What happens if someone is using a format that supports neither complex nor compound types?

Also, with your example above, do you have non-h5py code that can read this in? I would be curious to see this (or how to write it using non-h5py code).

@ChristopherMayes Have you tried to read in grid field files with complex fields generated by Bmad using h5py?

ax3l commented 4 years ago

So how would you word this in the standard? What happens if someone is using a format that supports neither complex nor compound types?

I would propose to just allow complex floating-point types as integral types. File formats that do not support complex integral types simply cannot store complex types, I would say (aka some file formats are not suited for all scientific data). We intentionally do not try to re-invent file formats, and a complex is literally just a value[2] or equivalently struct{ T r; T i; }, so one has quite some leverage to be creative in implementations.

HDF5, ADIOS1, and ADIOS2 support at least complex<float> and complex<double>, which is very good coverage. And for JSON we just write them out as an integral pair type.
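For illustration, one way such a pair layout could look in JSON; this is a hedged sketch, not the normative openPMD-api serialization:

import json
import numpy as np

z = np.array([1 + 2j, 3 - 4j])
pairs = [[float(v.real), float(v.imag)] for v in z]  # (re, im) pairs
print(json.dumps(pairs))
# [[1.0, 2.0], [3.0, -4.0]]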

Also, with your example above, do you have non-h5py code that can read this in? I would be curious to see this (or how to write it using non-h5py code).

I am trying to draft this as an example implementation in openPMD-api right now. It's similar to how we (and h5py) already handle bools in HDF5: one checks that it is a compound type and that the label names for the components are as defined above. (Note: h5py bools are, by a similar convention, labeled enums.)

DavidSagan commented 4 years ago

@ax3l

We intentionally do not try to re-invent file formats, and a complex is literally just a value[2] or equivalently struct{ T r; T i; }, so one has quite some leverage to be creative in implementations.

Actually I would say that the purpose of any standard is to improve portability at the expense of creativity.

So how about this for the standard: 1) If a file format supports complex, that is to be used. 2) If a file format does not support complex but supports structs, then store complex as a struct with component labels "r" and "i". 3) If a file format supports neither complex nor structs, then store complex numbers in two sub-datasets labeled "r" and "i".

HDF5, ADIOS1, and ADIOS2 support at least complex<float> and complex<double>, which is very good coverage.

Hmmm. As far as I know HDF5 does not support complex numbers. Do you have a reference for this?

ax3l commented 4 years ago

Yes, there are ways to express complex numbers in HDF5 which are broadly enough supported to be used (aka h5py). I'll show you an example implementation with the HDF5 C API in https://github.com/openPMD/openPMD-api/pull/639, but I will need a bit more time to work on this and just started yesterday. It's literally just the creation of an HDF5 compound type of two identical floats, with the two labels r and i as shown here. It will look a bit like this. Update: HDF5 types added (look for the isolated commit "HDF5: Complex Support").

Why not specify fallbacks: it complicates the parsing a bit, which makes auto-upgrades harder in the future and slows us down. We do not really have a format right now that does not support complex numbers, nor a workflow that is blocked by using a non-scientific data format that would need such a fallback. Therefore, better to leave fallbacks unstandardized, because we should not standardize things we do not already have concrete, working examples for. KISS principle.

DavidSagan commented 4 years ago

@ax3l

Actually it does not look like the h5py way is broadly supported. For example see: https://mathematica.stackexchange.com/questions/63983/hdf5-and-complex-numbers

HDF5 does not directly support complex numbers. Programs that do seem to be able to export complex numbers (like armadillo) to HDF5 will in reality split them into real and imaginary part and use their own non-standard convention for storing these. This means that while they can sometimes read back their own data, there is no compatibility between different software regarding storing complex values in HDF5.

So, to be portable, the standard needs to lay out some guidelines on how complex numbers are to be stored. How would you word this in the standard? How about something like: 1) If a file format supports complex, that is to be used. 2) For HDF5, be compatible with h5py and store complex as a struct with component labels "r" and "i". 3) For other formats, contact the openPMD maintainers. :-)

Another thing is I don't see how the fallbacks complicate parsing. Can you provide an example?

DavidSagan commented 4 years ago

@ax3l @ChristopherMayes

Another issue I thought of: in the case where, say, the imaginary component is zero, if you follow the h5py way then you double the storage size needed compared to using two sub-datasets. Is this penalty really worth it?

ax3l commented 4 years ago

How about something like: 1) If a file format supports complex, that is to be used. 2) For HDF5, be compatible with h5py and store complex as a struct with component labels "r" and "i". 3) For other formats, contact the openPMD maintainers. :-)

I think generally we can add a note for file formats. Such as "HDF5 bool and complex type representations must be h5py compatible."

For other formats, contact the openPMD maintainers. :-)

I think that's generally the easiest way: for other formats, contact the reference implementation in openPMD-api and/or open a public issue.

Another thing is I don't see how the fallbacks complicate parsing. Can you provide an example?

If we implement sub-groups then we have to implement this sub-group parsing in all readers and upgrade scripts, which I try to avoid unless we have a strong use case.

Another issue I thought of: in the case where, say, the imaginary component is zero, if you follow the h5py way then you double the storage size needed compared to using two sub-datasets. Is this penalty really worth it?

I think there are several solutions to that which avoid a premature optimization that complicates the standard. First, if a user knows an openPMD record component is purely real (or imaginary), then they can take a real data type and name the component accordingly. Second, any light-weight compression that can be opaquely added on the file-format layer (HDF5, ADIOS1/2) will strongly compress zeroed-out components. This falls under premature optimization, I think.

ax3l commented 4 years ago

(Sorry, clicked wrong button.)

DavidSagan commented 4 years ago

@ax3l @ChristopherMayes

OK but let me write something first since I have a new proposal...

DavidSagan commented 4 years ago

@ax3l @ChristopherMayes

Upon further reflection I propose this for the standard:

Complex data types are to be stored in a group with datasets (or constant record components if the values are constant) labeled "re" for the real part and "im" for the imaginary part. Using "re" and "im" storage is mandated even if the storage format supports native complex numbers.

Motivation. The advantages are:

  1. There is no degradation in portability.
  2. Simplicity.
  3. Using "re" and "im" simplifies programming since reading or writing complex data is equivalent to reading or writing two real data arrays. Since reading or writing real data must be implemented in any case, the extra work in implementing reading/writing complex is just a few lines (essentially two lines for reading and two lines for writing; see the sketch below).
  4. Data storage is minimized if all the real and/or imaginary parts are the same.
  5. We don't have to worry about new formats and how to handle them.
  6. No fallbacks to complicate things.
  7. No worrying about whether there is opaque storage-size optimization of complex data types.
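A minimal h5py sketch of the read and write referred to in point 3 (paths are illustrative):

import h5py as h5
import numpy as np

z = np.array([1 + 2j, 3 - 4j])
with h5.File("record.h5", "w") as f:
    f["E/x/re"] = z.real              # write: two real datasets
    f["E/x/im"] = z.imag

with h5.File("record.h5", "r") as f:
    z2 = f["E/x/re"][()] + 1j * f["E/x/im"][()]   # read: recombine in one line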

ax3l commented 4 years ago

As reasoned above, we do not have a file format that does not support complex in our production workflows. For the reasons above (maintenance, reference implementations, upgrades, etc.) I would not standardize this until someone has a concrete example workflow where this is needed.

You are of course welcome to provide PRs to the reference implementations (viewer, API, validator scripts, upgrader, VisIt, ...) with concrete challenges that cannot be addressed otherwise, and we can move the priority up. But I do not see the need right now with the tools and file formats we are using. I think this is a file-format task, and we intentionally rely on file formats for such things because it's the scalable and maintainable thing to do.

We can have a VC in the new year if you would like to talk more about this.

DavidSagan commented 4 years ago

As reasoned above, we do not have a file format that does not support complex in our production workflows.

This is not true. Let me quote again from https://mathematica.stackexchange.com/questions/63983/hdf5-and-complex-numbers

HDF5 does not directly support complex numbers. Programs that do seem to be able to export complex numbers (like armadillo) to HDF5 will in reality split them into real and imaginary part and use their own non-standard convention for storing these. This means that while they can sometimes read back their own data, there is no compatibility between different software regarding storing complex values in HDF5.

In terms of maintenance, reference implementations, upgrades, etc., my proposal is actually superior to using a struct since there is less code to write.

ChristopherMayes commented 4 years ago

I agree with @DavidSagan's post above for labeling 're' and 'im' datasets.

ax3l commented 4 years ago

Let's have a VC on this in the new year.

Adding our own convention neither adds compatibility with armadillo nor with other tools. Adopting a widely used convention such as h5py's adds native support to many tools and is not more complicated for all other adapters; we just document the used convention for HDF5 in our standard.

DavidSagan commented 4 years ago

@ax3l

Adopting a widely used convention such as h5py's

??? The quote I give above indicates that this is not true. What evidence do you have to the contrary?

... adds native support to many tools

What non-h5py based native tools are you thinking of?

and is not more complicated for all other adapters

Actually it is more complicated if you are using the standard C/C++/Fortran HDF5 interface. Not much more, but with the convention I suggested it basically takes two lines to read in complex data. With the h5py convention I have to set up the struct, which is definitely more work.

Also, h5py does not do any compression at all for complex numbers, even if all the numbers are zero.

ax3l commented 4 years ago

I am suggesting to continue this discussion in a VC after the holidays, with concrete examples and a simple compatibility matrix.

I heard your points: you can already implement this in 1.* for all scalar records without any change, and I mentioned a minimally invasive, zero-overhead alternative above. Creating an HDF5 compound type is a four-liner, and I need to see concrete workflows for how you read data to understand more of your concrete needs after the break.

For example: are you actually using armadillo, or is it just one example out of the tons of linear algebra libraries we could pick here? Is armadillo more used than h5py? Our own convention, instead of adopting an existing one, will add native support to none of those. Also, do you even need complex numbers for non-scalar records? All questions we have to consider.

The compression argument is technically wrong: HDF5 can compress on the byte level for any type, and so can h5py; just activate the flag on variable creation. I am happy to help with that when I am not in an airport and not on holiday.
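A minimal sketch of that flag in h5py (file and dataset names are illustrative):

import h5py as h5
import numpy as np

z = np.ones(4096, dtype=np.complex128)   # imaginary part is all zeros
with h5.File("zc.h5", "w") as f:
    # enable a lossless filter at dataset creation; the zeroed component
    # then compresses to almost nothing
    f.create_dataset("z", data=z, chunks=True, compression="gzip")
# h5ls -v zc.h5 then reports far fewer allocated bytes than logical bytes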

DavidSagan commented 4 years ago

I am suggesting to continue this discussion in a VC after the holidays, with concrete examples and a simple compatibility matrix.

I agree and concrete examples are a good thing to clarify the discussion.

No, I don't use armadillo. It was just part of the quote discussing compatibility.

Right now complex fields are stored in, for example, Bx, By, Bz scalar records. Perhaps this should be changed to B_field/x,y,z?

And if you could give me an example of using compression that would help.

ax3l commented 4 years ago

Meeting minutes from Sep 2, 10:30 AM PT

Action items:

Related question: show an HDF5 Fortran compression example for @DavidSagan. This affects write API logic only and will be automatically decompressed on reads, opaque to the user reading the data as usual. This question came up as an example for complex types where one already knows that the imaginary part is zero, e.g. for specific time steps of a simulation.

DavidSagan commented 4 years ago

@ax3l @ChristopherMayes OK I removed the paragraph in the Beam Physics extension on how to store complex numbers. The extension is now all set to become official.

ax3l commented 4 years ago

HDF5 code example to define a new, h5py-compatible complex float type (equivalent for double and long double), in C++:

#include <hdf5.h>      // HDF5 C API
#include <complex>     // std::complex
#include <stdexcept>   // std::runtime_error

hid_t m_H5T_CFLOAT = H5Tcreate(H5T_COMPOUND, sizeof(float) * 2);
if( m_H5T_CFLOAT < 0 )
    throw std::runtime_error("[HDF5] Internal error: Failed to create complex float type");

H5Tinsert(m_H5T_CFLOAT, "r", 0, H5T_NATIVE_FLOAT);
H5Tinsert(m_H5T_CFLOAT, "i", sizeof(float), H5T_NATIVE_FLOAT);

// at this point, m_H5T_CFLOAT can be used like any other pre-defined HDF5 type for reading and writing
// use std::complex< float/double/long double> as C++ types

// testing a type before a read can check the members explicitly:
// - H5Tget_class(read_dataset_type) == H5T_COMPOUND &&
//   H5Tget_nmembers(read_dataset_type) == 2 &&
//   two H5Tget_member_name calls returning "r" and "i"
// ... or simpler, via H5Tequal:
// - read_dataset_type is a hid_t, too, and needs to be read first via H5Dget_type
// if( H5Tequal(read_dataset_type, m_H5T_CFLOAT) ) ...

// before closing the HDF5 file, the type has to be freed again to avoid a resource leak
// using the type referenced by m_H5T_CFLOAT after H5Tclose is not allowed
herr_t status = H5Tclose(m_H5T_CFLOAT);
if( status < 0 )
    throw std::runtime_error("[HDF5] Internal error: Failed to close complex float type");

h5py example:

import h5py as h5
import numpy as np

data = np.array([4.5 + 1.1j, 6.7 - 2.2j], dtype=np.complex64)
# complex64 is 2*float, complex128 is 2*double, clongdouble is 2*long double

# ... store to h5py as usual
# ... read will be returning a dtype=np.complex<N> numpy array automatically

Fortran:

     USE hdf5

     INTEGER     ::   hdferr ! HDF5 error flag
     INTEGER(HID_T) :: my_h5_complex_real      ! Compound datatype identifier for complex REAL
     INTEGER(SIZE_T)     ::   type_size  ! Size of the datatype
     INTEGER(SIZE_T)     ::   offset     ! Member's offset

     LOGICAL :: same_types           ! TRUE/FALSE flag to indicate if two datatypes are equal

! ... open the file as usual

     CALL h5tget_size_f(H5T_NATIVE_REAL, type_size, hdferr) ! size of one REAL in bytes
     CALL h5tcreate_f(H5T_COMPOUND_F, type_size*2, my_h5_complex_real, hdferr)
     offset = 0
     CALL h5tinsert_f(my_h5_complex_real, "r", offset, H5T_NATIVE_REAL, hdferr)
     offset = type_size
     CALL h5tinsert_f(my_h5_complex_real, "i", offset, H5T_NATIVE_REAL, hdferr)

! ... use my_h5_complex_real just like pre-defined HDF5 types for write and read, check error codes as usual
! use COMPLEX as data types

! probing prior to a read: testing if a type is of complex real
! - HDF5 type check if this is a H5T_COMPOUND_F type
! - HDF5 compound check for labels "r" and "i"
! This can be simplified by calling h5tequal_f:
! - read_dataset_type_id is an INTEGER(HID_T), too and needs to be read first via h5dget_type_f
!     CALL h5tequal_f(read_dataset_type_id, my_h5_complex_real, same_types, hdferr)

! before closing the HDF5 file, the type has to be freed again to avoid a resource leak
! using the type referenced by my_h5_complex_real after h5tclose_f is not allowed
     CALL h5tclose_f(my_h5_complex_real, hdferr)

! ... close file

DavidSagan commented 4 years ago

@ax3l Could you expand the above example to include reading and writing? Thanks.

ax3l commented 4 years ago

Offline discussed: yes. @DavidSagan Can you please provide a minimal read/write example that just writes/reads a simple scalar field, which I can extend on? (Sorry for my weak Fortran.)

DavidSagan commented 4 years ago

@ax3l Examples posted at Slack...

ax3l commented 4 years ago

We made it :tada: This is a self-contained example based on https://github.com/mokus0/hdf5/blob/master/fortran/examples/compound_fortran2003.f90:

! This is the F2003 version of the h5_compound.c example source code.
! * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
! Copyright by the Board of Trustees of the University of Illinois.         *
! All rights reserved.                                                      *
!                                                                           *
! This file is part of HDF5.  The full HDF5 copyright notice, including     *
! terms governing use, modification, and redistribution, is contained in    *
! the files COPYING and Copyright.html.  COPYING can be found at the root   *
! of the source code distribution tree; Copyright.html can be found at the  *
! root level of an installed copy of the electronic HDF5 document set and   *
! is linked from the top-level documents page.  It can also be found at     *
! http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html.  If you do not have     *
! access to either file, you may request a copy from hdfhelp@ncsa.uiuc.edu. *
! * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
!
! This example shows how to create a compound datatype that holds complex
! values, how to write an array of it to HDF5,
! and how to read it back into a compound datatype from HDF5.
!

PROGRAM compound_complex_fortran2003

  USE hdf5
  USE ISO_C_BINDING
  IMPLICIT NONE

  INTEGER, PARAMETER :: r_k8 = KIND(0.0d0)
  INTEGER, PARAMETER :: NMAX = 3

  INTEGER(HID_T)   :: sample_type_id, dset_id, dspace_id, file_id
  INTEGER(HSIZE_T) :: dims(1) = (/NMAX/)
  INTEGER :: error

  COMPLEX(KIND=r_k8), DIMENSION(1:NMAX), TARGET :: samples, read_samples
  INTEGER :: i

  TYPE(C_PTR) :: f_ptr
  INTEGER(8) :: real_size, real_complex_size

  real_size = storage_size(0.0_r_k8, r_k8) / 8  ! bytes per REAL(r_k8)
  real_complex_size = real_size * 2_8           ! a complex is (real, real)

  ! Initialize data
  DO i = 1, NMAX
     samples(i) = (3.14159_r_k8, 2.71828_r_k8)
  END DO

  ! Initialize FORTRAN interface.
  CALL h5open_f(error)

  ! Create a new file using default properties.
  CALL h5fcreate_f("test.h5", H5F_ACC_TRUNC_F, file_id, error)
  !
  ! Create the memory data type.
  !
  CALL H5Tcreate_f(H5T_COMPOUND_F, real_complex_size, sample_type_id, error)

  ! Insert the real and imaginary members at their byte offsets
  CALL H5Tinsert_f( sample_type_id, "r", &
       0_8, h5kind_to_type(r_k8,H5_REAL_KIND), error)
  CALL H5Tinsert_f( sample_type_id, "i", &
       real_size, h5kind_to_type(r_k8,H5_REAL_KIND), error)
  !
  ! Create dataspace
  !
  CALL h5screate_simple_f(1, dims, dspace_id, error)
  !
  ! Create the dataset.
  !
  CALL H5Dcreate_f(file_id, "samples",  sample_type_id, dspace_id, dset_id, error)
  !
  ! Write data to the dataset
  !
  f_ptr = C_LOC(samples(1))
  CALL H5Dwrite_f(dset_id, sample_type_id, f_ptr, error)
  ! Close up
  CALL h5dclose_f(dset_id, error)
  CALL h5sclose_f(dspace_id, error)
  CALL H5Tclose_f(sample_type_id, error)
  CALL h5fclose_f(file_id, error)
  !
  ! Open the file and the dataset.
  !
  CALL H5Fopen_f("test.h5", H5F_ACC_RDONLY_F, file_id, error)

  CALL H5Dopen_f(file_id, "samples", dset_id, error)
  !
  ! Create the memory data type.
  !
  CALL H5Tcreate_f(H5T_COMPOUND_F, real_complex_size, sample_type_id, error)

  CALL H5Tinsert_f( sample_type_id, "r", &
       0_8, h5kind_to_type(r_k8,H5_REAL_KIND), error)
  CALL H5Tinsert_f( sample_type_id, "i", &
       real_size, h5kind_to_type(r_k8,H5_REAL_KIND), error)

  f_ptr = C_LOC(read_samples(1))
  CALL H5Dread_f(dset_id, sample_type_id, f_ptr, error)

  !
  ! Display the fields
  !
  WRITE(*,'(A,3(" (",F8.5,",",F8.5,")"))') "SAMPLES =", read_samples(1:NMAX)

  CALL H5Tclose_f(sample_type_id, error)
  CALL H5Dclose_f(dset_id, error)
  CALL H5Fclose_f(file_id, error)

END PROGRAM compound_complex_fortran2003

Output analysis:

$ h5ls -r -v test.h5
Opened "test.h5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/samples                 Dataset {3/3}
    Location:  1:800
    Links:     1
    Storage:   48 logical bytes, 48 allocated bytes, 100.00% utilization
    Type:      struct {
                   "r"                +0    native double
                   "i"                +8    native double
               } 16 bytes

$ h5dump test.h5 
HDF5 "test.h5" {
GROUP "/" {
   DATASET "samples" {
      DATATYPE  H5T_COMPOUND {
         H5T_IEEE_F64LE "r";
         H5T_IEEE_F64LE "i";
      }
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): {
            3.14159,
            2.71828
         },
      (1): {
            3.14159,
            2.71828
         },
      (2): {
            3.14159,
            2.71828
         }
      }
   }
}
}

And with h5py:

import h5py as h5
f = h5.File("test.h5", "r")
f['samples'].dtype
# dtype('complex128')
f['samples'][()]
# array([3.14159+2.71828j, 3.14159+2.71828j, 3.14159+2.71828j])

zerothi commented 3 years ago

I have now read all your suggestions.

I have battled this myself and really find a complex datatype useful and required.

Some comments on @DavidSagan's comment:

@ax3l @ChristopherMayes

Upon further reflection I propose this for the standard:

Complex data types are to be stored in a group with datasets (or constant record components if the values are constant) labeled "re" for the real part and "im" for the imaginary part. Using "re" and "im" storage is mandated even if the storage format supports native complex numbers.

Motivation. The advantages are:

  1. There is no degradation in portability.
  2. Simplicity.
  3. Using "re" and "im" simplifies programming since reading or writing complex data is equivalent to reading or writing two real data arrays. Since reading or writing real data must be implemented in any case, the extra work in implementing reading/writing complex is just a few lines (essentially two lines for reading and two lines for writing).
  4. Data storage is minimized if all the real and/or imaginary parts are the same.
  5. We don't have to worry about new formats and how to handle them.
  6. No fallbacks to complicate things.
  7. No worrying about whether there is opaque storage-size optimization of complex data types.

The above comments do not mention whether the values are stored as [(re, im), (re, im)] in a compound type, or as separate variables. This is important for performance-critical reads/writes. The following comments suggest a compound type, which I find the best option since the data is consecutive.

For C/C++/Fortran one may optionally cast to the corresponding real array (of twice the length) to accomplish this; see the sketch below. The attribute/compound information should then trivially handle the complex specification.
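In numpy terms, that cast is just a view (a small sketch):

import numpy as np

z = np.array([1 + 2j, 3 + 4j], dtype=np.complex128)
print(z.view(np.float64))   # [1. 2. 3. 4.] -- consecutive (re, im) pairs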

It would be really positive if this found its way into the standard.