ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

Extended precision floating point support #1972

Open chuckatkins opened 4 years ago

chuckatkins commented 4 years ago

This is to track the issues surrounding support for extended precision float, i.e. long double. Some things to consider:

  1. The multitude of precision and byte representations to support.
  2. Cross-language support (Fortran REAL(16) may or may not be supported at all and if it is, may not be the same as the corresponding C / C++ long double)
  3. Padding issues on non-dense representations (i.e. 80-bit long double taking up 128 bits of space)
  4. Read support for formats not supported by the current architecture.
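
As a quick illustration of points 1 and 3, here is a minimal standalone probe (not ADIOS2 code) showing how much the answers vary across common platforms:

```cpp
// Minimal platform probe -- not ADIOS2 code, just an illustration of how
// "long double" differs across targets. Typical values:
//   x86_64 Linux:      sizeof = 16, digits = 64  (80-bit x87 padded to 128 bits)
//   x86 32-bit Linux:  sizeof = 12, digits = 64  (80-bit x87 padded to 96 bits)
//   MSVC / 32-bit ARM: sizeof = 8,  digits = 53  (same representation as double)
//   ppc64le (default): sizeof = 16, digits = 106 (IBM double-double)
//   aarch64 Linux:     sizeof = 16, digits = 113 (IEEE binary128)
#include <cstdio>
#include <limits>

int main()
{
    std::printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
    std::printf("mantissa digits     = %d\n",
                std::numeric_limits<long double>::digits);
    std::printf("same width as double? %s\n",
                std::numeric_limits<long double>::digits ==
                        std::numeric_limits<double>::digits
                    ? "yes" : "no");
    return 0;
}
```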
chuckatkins commented 4 years ago

See the discussions in #1907 and #1921.

KyleFromKitware commented 4 years ago

I am currently playing around with ways to detect the bit representation of long double using try_compile(). I think we may actually be able to get that into CMake upstream (and backport it here).
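
For reference, the kind of translation unit such a try_compile() probe could build might look like the sketch below; this is only an illustration of the idea (one probe file per candidate representation), not an actual CMake or ADIOS2 module:

```cpp
// Probe for one specific layout: "80-bit x87 extended, stored in 16 bytes".
// try_compile() of this file succeeds only on platforms with that layout;
// a sibling probe would exist for each other candidate representation.
#include <limits>

static_assert(sizeof(long double) == 16,
              "long double is not stored in 16 bytes");
static_assert(std::numeric_limits<long double>::digits == 64,
              "long double does not have a 64-bit significand");

int main() { return 0; }
```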

chuckatkins commented 4 years ago

@ax3l @germasch

germasch commented 4 years ago

I can think of a number of questions that should be answered:

So I think the types that adios2 should support on any given system are:

I guess it's a choice, but supporting a floating point type which isn't natively supported by the compiler / platform would be quite a pain and I don't see the usefulness -- who would want to read data which they then can't process?

So on the external API side, you'd want to add long double and maybe __float128. Internally, you want to support those same types, too, though the internal types need to be unique, otherwise you'll get duplicate template instantiations. In particular, one needs to be careful on platforms where long double has the same size and representation as double, which I believe is the case on MSVC. Obviously that's easy to test in C++, but I think it actually needs to be known at the preprocessor level because of the ubiquitous use of the FOREACH macros.
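
A hedged sketch of how that preprocessor-level knowledge could be used (the macro and flag names here are illustrative, not ADIOS2's actual FOREACH macros):

```cpp
// Illustrative only: include long double in the per-type expansion list only
// when the build system has detected that it is a genuinely distinct
// representation; otherwise it can simply be routed through the double path.
#if defined(ADIOS2_HAVE_DISTINCT_LONG_DOUBLE) // assumed CMake-detected flag
#define FOREACH_FLOAT_TYPE(MACRO) \
    MACRO(float)                  \
    MACRO(double)                 \
    MACRO(long double)
#else
#define FOREACH_FLOAT_TYPE(MACRO) \
    MACRO(float)                  \
    MACRO(double)
#endif
```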

I don't see that the binary representation that long double corresponds to necessarily needs to be known at compile time (other than != float64). It'd be easy to determine at runtime, and it's primarily only needed to decide which datatype to encode in the BP3/4/whatever file format. One would need to know about the existence of __float128 at compile time if that's to be supported.
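
A sketch of what that runtime determination could look like (enum and function names are illustrative, not ADIOS2 API):

```cpp
// Classify the native long double by its significand width; this only needs
// to run once, when deciding which datatype id to encode into the BP output.
#include <limits>

enum class LongDoubleKind
{
    SameAsFloat64, // e.g. MSVC: long double shares double's representation
    X87Extended80, // 80-bit x87, padded to 96 or 128 bits of storage
    DoubleDouble,  // IBM double-double (default long double on ppc64le)
    Binary128      // IEEE quad precision
};

inline LongDoubleKind ClassifyLongDouble()
{
    switch (std::numeric_limits<long double>::digits)
    {
    case 53:  return LongDoubleKind::SameAsFloat64;
    case 64:  return LongDoubleKind::X87Extended80;
    case 106: return LongDoubleKind::DoubleDouble;
    case 113: return LongDoubleKind::Binary128;
    default:  return LongDoubleKind::SameAsFloat64; // unknown: be conservative
    }
}
```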

The other question is what to do with the padding. Currently, adios2 essentially does memcpys of the buffer into BP buffers, which would copy the padding as well. As has already turned up here, that can lead to spurious warnings about uninitialized memory, and potentially even leaking of sensitive information. The other disadvantage would be that it wouldn't be possible to read long double written on 32-bit Linux back on 64-bit Linux or vice versa, even though both are 80-bit float, but with different padding. (Though I doubt there's a lot of use of adios2 on 32-bit Linux.)

The alternative would be to rearrange the data to drop the padding before putting things into the BP buffer, and adding it back when reading. The disadvantage to that is that it requires, essentially, a memory copy, and eliminates the possibility of doing anything zero-copy. While touching the entire buffer to eliminate padding is not exactly cheap, I suppose it should be quite a bit cheaper than sending all the useless padding data to disk or over the network, as that bandwidth is much less than memory bandwidth.
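
For concreteness, the padding-stripping copy could look roughly like this (a sketch, assuming the x87 case where the 80-bit payload sits in the low 10 bytes of each 12- or 16-byte slot on little-endian targets):

```cpp
// Copy only the 10 significant bytes of each long double into the output
// buffer; the 2 or 6 padding bytes per element never reach the BP buffer.
#include <cstddef>
#include <cstring>
#include <vector>

std::vector<unsigned char> PackX87Extended(const long double *src, std::size_t n)
{
    constexpr std::size_t payloadBytes = 10; // 80 bits
    std::vector<unsigned char> packed(n * payloadBytes);
    for (std::size_t i = 0; i < n; ++i)
    {
        std::memcpy(packed.data() + i * payloadBytes, &src[i], payloadBytes);
    }
    return packed;
}
```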

Finally, there's the question of what to do when reading existing BP3/4 files, which don't encode the details of the actual extended floating point type that was written. I suppose if one reads a file on a given architecture, one would want to assume that the long double representation is the same as the one currently reading it. One might at least want to be careful to avoid buffer overflows, i.e., a sanity check of whether `sizeof(long double) == sizeof(type in file)`. I'm not sure that it's possible to recover the size of the type that was written from BP3/4, but it may be.

The other aspect to this is that going forward one should encode, say, "R80_96" or "R80_128" or whatever as distinct types into the BP3 file, which means that new type ids will need to be introduced, and old readers won't know how to handle them.
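
The new ids could be as simple as additional entries in the format's type enumeration; the values below are placeholders, not the real BP3/BP4 type codes:

```cpp
// Placeholder values only -- not the actual BP3/BP4 type codes. The point is
// that each extended representation gets a distinct id, so a reader can at
// least recognize (and cleanly reject or convert) data it cannot represent.
#include <cstdint>

enum class ExtendedFloatTypeId : std::uint8_t
{
    R80_96  = 0x40, // 80-bit x87 stored in 96 bits
    R80_128 = 0x41, // 80-bit x87 stored in 128 bits
    R128_DD = 0x42, // IBM double-double
    R128    = 0x43  // IEEE binary128
};
```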

eisenhauer commented 4 years ago

I guess it's a choice, but supporting a floating point type which isn't natively supported by the compiler / platform would be quite a pain and I don't see the usefulness -- who would want to read data which they then can't process?

The answer there is that you may well want to read data in a type that isn't natively supported if ADIOS had the ability to convert it to a type that was supported. ADIOS historically has had a pretty rigid type system (not unjustifiably for a middleware focused solely on performance), but a more flexible system might allow you to read a 32-bit float into a 64-bit, or vice versa, if that suited the application's need. It's non-trivial to implement, and if you're really converting from an unsupported type it's going to be costly, but there is usefulness...

germasch commented 4 years ago

The answer there is that you may well want to read data in a type that isn't natively supported if ADIOS had the ability to convert it to a type that was supported. ADIOS historically has had a pretty rigid type system (not unjustifiably for a middleware focused solely on performance), but a more flexible system might allow you to read a 32-bit float into a 64-bit, or vice versa, if that suited the application's need. It's non-trivial to implement, and if you're really converting from an unsupported type it's going to be costly, but there is usefulness...

Right, it would of course be useful. I came at this looking at adios2's existing design, where it looks like a choice was made to not support conversions (other than endianness). I'm sure there are situations where it's helpful being able to have the data converted while reading/writing data (like HDF5 does, I believe, though I don't think HDF5 handles extended floating point conversions). There may be even more of a point if support for something like float16 is added, which I don't think has widespread support on CPUs, but can be useful with GPUs.

However, if you wanted to add conversions, I think that'll require some thinking on what the API should look like. Right now, the data type is templated in, e.g., adios2::Variable<float>, so InquireVariable<float>(...) will fail if that dataset is actually double. That behavior could be updated, but then the next question would be whether you want to support writing, say, double data in memory into a float dataset on disk, and what that API would look like. An alternative could be to say the template type is the type on disk, but Get/Put would get overloads that allow passing data of a different type, which would be converted. So there are certainly options, but doing this as a general feature addition is not a trivial project.
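
As a sketch of that second alternative (no such overload exists in ADIOS2 today; this only illustrates the element-wise conversion a converting Get/Put would have to perform):

```cpp
// The variable's template parameter stays the on-disk type (DiskT); the user
// passes a buffer of a different in-memory type (MemT) and each element is
// converted on the way through.
#include <cstddef>

template <class DiskT, class MemT>
void ConvertElements(const DiskT *fileData, MemT *userData, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        userData[i] = static_cast<MemT>(fileData[i]); // e.g. float -> double
    }
}
```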

pnorbert commented 4 years ago

While touching the entire buffer to eliminate padding is not exactly cheap, I suppose it should be quite a bit cheaper than sending all the useless padding data to disk or over the network, as that bandwidth is much less than memory bandwidth.

I bet it's quite the contrary. In my experience, processing the data will be slower than writing the original buffer to disk. That's why compression is not good for speeding up I/O in general.

So while we may want a flexible way of doing the buffering, we also need the option to just do it fast. After all, the main consumer requesting this feature cares very much about performance.

pnorbert commented 4 years ago

Note: in the recent HDF5 tutorial at ECP, the best-practices slides' first bullet is "Do large chunks of IO" and the second is "Avoid datatype conversion".

germasch commented 4 years ago

I bet it's quite the contrary. In my experience, processing the data will be slower than writing the original buffer to disk. That's why compression is not good for speeding up I/O in general.

Well, I said "I suppose it should be" because I haven't measured it, so I don't know. Note, though, that generally BP3 copies the user-provided buffer into the BP3 buffer anyway, so there is already a copy touching both buffers. The access pattern would be slightly more irregular when skipping the padding, but it's still essentially a streaming read/write, so it should perform comparably at that stage, and it leads to smaller buffers to deal with later on (only 80 bits out of 128 would be kept in the common case).

Nevertheless, even just dropping the padding is a deviation from the overall current design, which mostly considers buffers to be just bits without caring what they represent (with the exception of endianness), and it adds a complication. I think there are pros and cons to going either way, and I personally don't have a strong preference.

Note that none of this should ever affect the common case, i.e., no-conversion, no-padding access would not see any performance impact.

So while we may want to do a flexible way doing buffering, we also need the option to just do it fast. After all, the main consumer requesting this feature cares much about the performance.

eisenhauer commented 4 years ago

Datatype conversion won't be on anyone's fast path, and if you look at how FFS works, you'll know that I believe it shouldn't be done on the writer side. But it's not like nobody ever moves data from machine to machine or tries to read archival data from unavailable systems. The fact that byte-endian conversion is possible on read is a nod to that necessity, but it's only a small step.

chuckatkins commented 4 years ago

w.r.t. conversion, we certainly shouldn't make it implicit. Even if it's explicit, the ability to convert between natively supported types is an orthogonal feature that would be independent of the extended precision support (although certainly useful for it).

I've also been thinking about "unsupported" types and formats. My initial reaction was to simply support only whatever format is native and throw an exception otherwise, i.e., if a binary128 float variable is written natively on ppc64le, then an exception would get thrown when trying to read it on x86_64. Thinking more about it though, I'm not so sure that's necessary. We're not actually manipulating the data or even really doing anything with it, so why couldn't you read it back? Certainly trying to write <long double> and read <long double> should throw an exception if those types are different, but if the mapped adios types are used, then why shouldn't it work? Operators certainly wouldn't be able to work on them, but other than that, I'm not seeing on the surface why it wouldn't just "work".

Consider the case for <size_t>. Writing and reading it on two systems that have different representations for it already throws an exception since adios determines the underlying fixed width integer type and uses that. But if, on the read side, you explicitly use the fixed width integer type it maps to then it should work fine.
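
That size_t mapping is essentially a compile-time alias; a minimal sketch of the idea (the exact trait used inside ADIOS2 may differ in detail):

```cpp
// size_t is never stored as "size_t": it is mapped to the fixed-width integer
// of the same size, and a reader on a different platform must explicitly ask
// for that fixed-width type.
#include <cstddef>
#include <cstdint>
#include <type_traits>

using SizeTIOType =
    std::conditional_t<sizeof(std::size_t) == 8, std::uint64_t, std::uint32_t>;

static_assert(sizeof(SizeTIOType) == sizeof(std::size_t),
              "size_t must map onto a fixed-width integer of the same size");
```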

germasch commented 4 years ago

w.r.t. conversion, we certainly shouldn't make it implicit. Even if it's explicit, the ability to convert between natively supported types is an orthogonal feature that would be independent of the extended precision support (although certainly useful for it).

I think we agree that it's an orthogonal issue in general. (However, when it comes to, e.g., reading an R80_96 float into an R80_128 float, where the latter is long double, I'm not sure how you'd want to explicitly express that what you're reading into your long double should be converted.)

I've also been thinking about "unsupported" types and format. My initial reaction was to simply only support whatever format is native and throw an exception otherwise. i.e. if a binary128 float variable is written natively on ppc64le, then an exception would get thrown when trying to read it on x86_64. Thinking more about it though, I'm not so sure that's necessary. We're not actually manipulating the data or even really doing anything with it so why couldn't you read it back?

I'm not sure I understand what you mean. An exception would have been my suggested answer. Say on ppc64le a long double == __ibm128 was written. Now you're reading it on x86_64 into a long double, which is 80 bits padded to 128 bits. The options I see are:

(1) Throw an exception (probably at InquireVariable<long double> already).
(2) Just read it as a binary blob.
(3) Convert __ibm128 to 80-bit extended precision.

Are you saying to do (2)? That's essentially what happens today. In this example case it's safe, but on other systems where long double is 64-bit or 96-bit, it may lead to buffer overflows. Of course one could support just the case where sizeof(long double) equals the size of what's in the file, and otherwise throw an exception.

Even in the cases where (2) is safe, I don't like it, as it's surprising behavior -- adios2 will read the data without any indication that something's amiss, but once you use it, it won't make sense. E.g., bpls will show nonsense numbers. In addition, if you do (2) now, it'll be hard to do (3) later without having to think about not breaking compatibility.

My preference is (1), which is simplest to do, and keeps the option of adding conversion (3) later, or even doing (2) later if somehow that's what people need.

Certainly trying to write <long double> and read <long double> should throw an exception if those types are different, but if the mapped adios types are used then why shouldn't it work?

I don't think I get what you're saying here?

chuckatkins commented 4 years ago

Essentially, while the user may specify <long double>, internally ADIOS wouldn't actually use long double; it would map it to a fixed, unambiguous type: adios2::real64, adios2::real128, adios2::real128_dd, adios2::real80_96, or adios2::real80_128.

So let's take the example of writing a <long double> variable on Windows with MSVC and reading it back on Linux x86_64. ADIOS would actually write an <adios2::real64> variable since that's what the type actually maps to. Then on the read side, trying to read it as <long double> would throw an exception because that's not the actual type. However, trying to read it as <adios2::real64> should succeed.

Similarly, writing <long double> on ppc64le would actually write <adios2::real128> (unless you use the old double-double format, in which case it would be <adios2::real128_dd>). On the Linux x86_64 read side, trying to read the variable as <long double> would throw an exception because it maps to <adios2::real80_128>. But you could read it as <adios2::real128>, simply as a binary blob.
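
A usage sketch of that read path (adios2::real128 and friends are the names proposed in this thread, not existing API; the rest follows the current C++ bindings):

```cpp
#include <adios2.h>
#include <vector>

int main()
{
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("reader");
    adios2::Engine reader = io.Open("data.bp", adios2::Mode::Read);

    // io.InquireVariable<long double>("x") would throw on x86_64, because the
    // native long double maps to real80_128 while the file holds real128.
    // Asking for the file's mapped type instead reads it back as an opaque value:
    auto var = io.InquireVariable<adios2::real128>("x"); // proposed type
    std::vector<adios2::real128> data(var.SelectionSize());
    reader.Get(var, data.data(), adios2::Mode::Sync);
    reader.Close();
    return 0;
}
```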

eisenhauer commented 4 years ago

Generally, I support a path to conversion. I don't worry much about whether or not it's implicit. If there's concern that people will be reading non-native data a lot and not know why it's slow, print a warning message. I'd be much more worried about people trying to read non-native data somewhere and the only answer ADIOS has is that "it can't be done"... (Or maybe: "Find a machine that natively supports both the source and destination datatypes and write a program that does the conversion, then bring the data here to read".)

chuckatkins commented 4 years ago

For implementing it, the <long double> type should be relegated to only exist in the bindings and everything in core would directly use the adios2::real* types. The adios2::TypeInfo<T> struct would be extended with the floating point types such that adios2::TypeInfo<long double>::IOType would reflect the underlying type used for IO. This is similar to how size_t is handled in that adios2::TypeInfo<size_t>::IOType reflects the associated fixed width integer type.
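
A minimal sketch of that extension, assuming the type names from this thread (the real adios2::TypeInfo lives in the helper/binding headers and may differ in detail):

```cpp
// The binding-level long double is never used in core: TypeInfo maps it onto
// a fixed, unambiguous IOType, mirroring how TypeInfo<size_t>::IOType maps
// size_t onto a fixed-width integer.
namespace adios2
{
// proposed unambiguous core carrier types (names from this thread)
struct real80_128 { unsigned char bytes[16]; };
struct real128    { unsigned char bytes[16]; };

template <class T>
struct TypeInfo; // primary template intentionally left undefined

template <>
struct TypeInfo<long double>
{
#if defined(__x86_64__)
    using IOType = real80_128; // 80-bit x87 stored in 16 bytes
#else
    using IOType = real128;    // placeholder choice for other platforms
#endif
};
} // namespace adios2
```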

germasch commented 4 years ago

Essentially, while the user may specify <long double>, internally ADIOS wouldn't actually use long double, it would map it to a fixed unambiguous type: adios2::real64, adios2::real128, adios2::real128_dd,adios2::real80_96 or adios2::real80_128.

Okay, that makes perfect sense to me now (it's essentially what I've been thinking of doing as well). Obviously, if the internal types match, there's no question that one should be able to read and write them without limitations.

What I hadn't thought about doing is providing non-native types on a given system. But I suppose you can do using real128_dd = std::array<char, 16> or something of that sort -- I would have only added using real128[_dd] = long double on ppc, and not provided it at all on other systems. The disadvantage of providing them all is some code bloat in terms of library size, since the FOR_EACH macros will instantiate almost all of adios2 for every (internal) type. But that's probably not a major issue. (Also, it could be fixed by using a backported std::variant to avoid a lot of the duplication in the first place.)
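
Something along these lines, as a sketch (assuming the default IBM long double ABI on ppc64le; the alias names follow this thread's proposal):

```cpp
// On platforms with a native carrier the alias is the real type; elsewhere it
// degrades to an opaque 16-byte blob that can still be declared, listed and
// copied -- just not computed with.
#include <array>

#if defined(__PPC64__)
using real128_dd = long double;          // native IBM double-double
#else
using real128_dd = std::array<char, 16>; // opaque 16-byte carrier elsewhere
#endif

static_assert(sizeof(real128_dd) == 16, "real128_dd must occupy 16 bytes");
```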

germasch commented 4 years ago

And on the by-the-way note, in terms of naming I'd prefer adios2::float64 over adios2::real64 etc, which matches what numpy does and is generally more in the spirit of C/C++ "floating point" types, rather than Fortran's real. But others may differ ;)

ax3l commented 4 years ago

Just to make my intent clear: extended precision support (long double) is nice, but we should aim for support for half-precision types at the same time, as those types are already used in practice in GPU codes and might need to go into data analysis and checkpoint-restart workflows.

Re.:

Just to update this issue: I guess the general goal should be multi-precision float support, especially for the half-precision types (bfloat16 & float16), which have already landed in the development branch of e.g. OpenMPI and are actively used in math libraries and apps. Support for long double is just a very similar problem.