Open annamalai-ps opened 2 years ago
What is happening here
You have written a dataset with a vector containing one empty string
> bpls planewave_08.it00000000.bp -al
…
string /data/0/patchSuffixes attr = {""}
…
In reading, the openPMD-api does not distinguish between single-element vectors and scalar values in ADIOS2 attributes, so it sees this as a single empty string. Empty strings however are disallowed because they are not supported by our HDF5 backend and we want to keep things compatible with all backends.
Possible workaround
We could disable this kind of check while reading and restrict the check to writing. However, I'm not sure if HDF5 actually supports empty strings as elements of vectors and we should additionally safeguard that case. Do you know more here @ax3l?
For now, #1223 implements this workaround and should make things work for you again. With this fix, I can read your dataset without issue.
Better fix
In the output above, bpls
is somehow able to distinguish single-element vectors from scalar values. I don't see a way to distinguish both in ADIOS2, but apparently it's possible somehow. If we had that, we could use it to avoid the situation altogether.
@franzpoeschel See is_value
to distinguish between scalars ("values") and arrays.
I am also surprised that HDF5 wouldn't support empty strings. I was able to create an HDF5 file with an empty string attribute just fine:
$ h5dump /tmp/emptystring.h5
HDF5 "/tmp/emptystring.h5" {
GROUP "/" {
ATTRIBUTE "string" {
DATATYPE H5T_STRING {
STRSIZE 1;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): ""
}
}
}
}
@franzpoeschel See
is_value
to distinguish between scalars ("values") and arrays.
So the functionality definitely exists, but it seems like it is currently not exposed in the C++ bindings (your link is from the C bindings). I've already contacted Norbert about this, maybe he sees something that I don't.
I am also surprised that HDF5 wouldn't support empty strings. I was able to create an HDF5 file with an empty string attribute just fine:
$ h5dump /tmp/emptystring.h5 HDF5 "/tmp/emptystring.h5" { GROUP "/" { ATTRIBUTE "string" { DATATYPE H5T_STRING { STRSIZE 1; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; } DATASPACE SCALAR DATA { (0): "" } } } }
To be fair, the HDF5 backend was written before I started working on openPMD, so Axel will know more about the details here probably
Thanks for investigating this and the first work-around in #1223 :+1:
I think the reason we did not support this is because a couple of the involved C calls in both read and write need extra careful handling if used with empty strings, e.g., for vlen strings: #1084 That said, we officially in openPMD only support constant size strings anyway.
For instance, there are calls like H5Tset_size
that behave oddly with zero strings #979 in some situations, and I am kinda don't like to track corner-cases of HDF5 releases over time to support features like storing an empty string (why at all, one might ask).
Just to close the loop for writing as well: Do you have a good use case why we should in write not error out for empty strings? This feels a bit like if we add support for it, then we might loose a lot of time debugging it for old releases of various backends & platforms, etc. triage it with upstream, add extra code paths for work-arounds, and I don't immediately see a use case yet. I am ok-ish to add it in read.
I checked, and neither Python nor Julia have any issues reading or writing empty string attributes in HDF5. It should be possible to continue to support writing empty strings, even for HDF5 files. The error reported above was for an ADIOS2 file and was thus unrelated to HDF5.
My case for empty string is: It is always good to support empty containers, and this is often a natural way of expressing certain conditions. Programming languages support empty strings. The file mentioned above contains an empty string because that was a natural choice in our writer. It is, of course, always possible to avoid empty strings, but this can be nontrivial.
The file above contains an array of strings with one array element per patch. Avoiding empty strings in an array either requires remapping array indices (that would add much complexity), or adding a sentinel value to each string, or to empty strings. The reader then needs to look for and handle these sentinel values. Paying respect to Fortran, one could replace empty strings with a string that contains only a single space
, but it would be much easier if that was handled by a lower level and not the application.
I see, thanks for the detailed example. We usually solves this in openPMD keywords by explicitly having none
keywords defined, but I agree this is generally useful.
If we add sufficent testing for it in a PR, then I think it is worth adding support for it. We should check in the tests that ADIOS2 and HDF5 backends handle this well. If one of them has issues I would ping upstream for fixed as well, so the two have feature partity.
Let's continue this in a new issue for tracking?
Hi! I have been using openpmd-api for a while and I did not have any issues reading .bp files with openPMD-api in Python. Recently I am facing an issue with reading bp files. When I execute:
I get the following error:
RuntimeError: [setAttribute] Value for string attribute 'patchSuffixes' must not be empty!
I am using openpmd-api 0.14.4 and I have attached the bp file.
planewave_08.it00000000.zip