msgpack / msgpack-c

MessagePack implementation for C and C++ / msgpack.org[C/C++]
Other
3.03k stars 883 forks source link

pack double and float in a more compact way #1017

Closed GeorgFritze closed 2 years ago

GeorgFritze commented 2 years ago

As the spec states:

If an object can be represented in multiple possible output formats, serializers SHOULD use the format which represents the data in the smallest number of bytes.

should double and float be packed in the smallest way possible. Currently double and floats are packed in full size. Even if they store the value 0. The following change would "fix" this issue for double.

template <typename Stream>
inline packer<Stream>& packer<Stream>::pack_double(double d)
{
    if (d == int64_t(d))
    {
        pack_imp_int64(int64_t(d));
        return *this;
    }

    union { double f; uint64_t i; } mem;
    mem.f = d;
    char buf[9];
    buf[0] = static_cast<char>(0xcbu);

#if defined(TARGET_OS_IPHONE)
    // ok
#elif defined(__arm__) && !(__ARM_EABI__) // arm-oabi
    // https://github.com/msgpack/msgpack-perl/pull/1
    mem.i = (mem.i & 0xFFFFFFFFUL) << 32UL | (mem.i >> 32UL);
#endif
    _msgpack_store64(&buf[1], mem.i);
    append_buffer(buf, 9);
    return *this;
}

1018

redboltz commented 2 years ago

I'm not sure "the smallest number of bytes" means "the smallest number of bytes across all format families". I personally think that 1) choose output format correspondint to the source type, and then choose the smallest number of bytes one in the chosen output format.

There is a related issue https://github.com/msgpack/msgpack/issues/186

However, it is just my opinion.

In your expansion (acrossing output format) is potentially dangerous. Here is an example of unexpected behavior.

https://godbolt.org/z/1dYq1G1Yr

#include <cassert>
#include <cstdint>

int main() {
    double d1 = 0.9;
    assert(d1 != int64_t(d1));

    double d2 = 1;
    assert(d2 == int64_t(d2));

    double d3 = 0.99999999999999999;
    assert(d3 != int64_t(d3)); // failed
}
GeorgFritze commented 2 years ago

Regarding your first statement, I agree that the specification is not clear at this point. In practice, in my experience, often only simple numbers are stored in double. Especially when working with colleagues who are used to Matlab or similar.

Regarding your example, I don't see any danger of unexpected behaviour. The number 0.999999999999999 cannot be represented in double and, as can be seen in the assembler, is also stored exactly as the number 1.

https://godbolt.org/z/1vc3dW3sY

#include <cassert>
#include <cstdint>

int main() {
    double d1 = 0.9;
    assert(d1 != int64_t(d1));

    double d2 = 1;
    assert(d2 == int64_t(d2));

    double d3 = 0.99999999999999999;
    assert(d3 == int64_t(d3));
    assert(d3 == d2);

    double d4 = 0.9999999999999999;
    assert(d4 != int64_t(d4));
}
redboltz commented 2 years ago

You are right. My example is bad.

More questions.

  1. A double value that has all zero after the decimal point can be bigger than std::numeric_limits<std::uint64_t>::max(). In this case packed as float fomat family not int format family. Is this right? example code : https://godbolt.org/z/1qrndb5v7 I feel a bit inconsistent flavor. But it satisfies "the format which represents the data in the smallest number of bytes."
  2. Could you tell me how C++ language spec defines out of range value casting? I need to confirm the cast works expectedly by by spec, not by chance.
  3. C++ has not only double but also float. Maybe float packing requires similar approach. What do you think?
  4. msgpack format has uint 64 and int 64. How to treat negative numbers ?
GeorgFritze commented 2 years ago
  1. You are correct that you can represent larger and smaller values with double than you can with int.
  2. The C++ spec says that conversion from float to int is not defined if the value can not be represented. In all compilers i have found the conversion from to large or to small numbers of double result in the smallest value of the target type. (uint64_t => 0, int64_t => -9223372036854775808)

https://github.com/cplusplus/draft/releases/tag/n4910

7.3.11 Floating-integral conversions [conv.fpint] 1 A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [Note 1 : If the destination type is bool, see 7.3.15. — end note]

  1. Yes of course the same "fix" should work for float.
  2. You are right that positive and negative numbers should be treated.

The somewhat better or more correct implementation for double would be:

#include <cmath> // for std::isnan
#include <limits> // avoid undefined behaviour

template <typename Stream>
inline packer<Stream>& packer<Stream>::pack_double(double d)
{
    if (~std::isnan(d))
    {
        if (d > 0 && d <= std::numeric_limits<std::uint64_t>::max() && d == std::uint64_t(d))
        {
            pack_imp_uint64(std::uint64_t(d));
            return *this;

        }else if (d >= std::numeric_limits<std::int64_t>::min() && d == std::int64_t(d))
        {
            pack_imp_int64(std::int64_t(d));
            return *this;
        }
    }

    union { double f; uint64_t i; } mem;
    mem.f = d;
    char buf[9];
    buf[0] = static_cast<char>(0xcbu);

#if defined(TARGET_OS_IPHONE)
    // ok
#elif defined(__arm__) && !(__ARM_EABI__) // arm-oabi
    // https://github.com/msgpack/msgpack-perl/pull/1
    mem.i = (mem.i & 0xFFFFFFFFUL) << 32UL | (mem.i >> 32UL);
#endif
    _msgpack_store64(&buf[1], mem.i);
    append_buffer(buf, 9);
    return *this;
}

P.s.: i probably won't have an internet connection for the next two weeks so i might not be able to answer your questions.

redboltz commented 2 years ago

Thanks you for checking. Satisifying C++ standard is important. Even if all current compilers works as we expected, we shouldn't depend on undefined behavior by spec. So your updated code example looks nice. Checking Nan and limits are good.

It seems that std::isnan() has been introduced since C++11. msgpack-c needs to support C++03. https://en.cppreference.com/w/cpp/numeric/math/isnan Note: C version is since C99. msgpack-c needs to support C++03 and ANSI-C. https://en.cppreference.com/w/c/numeric/math/isnan

I'm not sure what is the best solution.

It is just an idea: Somehow checking the environment and dispatching implementation seems good.

If isnan() is not defined, use the following implementation.

bool is_nan(double x) { return x != x; }

Or environment cheking is difficult, simple use the implementation above.


It is different topic. isnan returns bool. So

    if (~std::isnan(d))

should be

    if (!std::isnan(d))

I prefer

if (d >= 0) else ...

to

if (d > 0) else ...

If you don't have special reason, please use >= (dispatching 0 to pack_imp_uint64())`.


When you update the PR, use the same coding rule in the file.

For example:

if(condition1) {
    dosome;
} else if(condition2) {
}
GeorgFritze commented 2 years ago

You are correct with the std::isnan implementation. I prefer

  if (d == d) // check for nan 

Sorry for the ~= wrong not operator, was reviewing to much Matlab code lately.

I updated the PR to include your fixes and use the coding rules. Thanks for the help to fulfil more standard compliant code.

1018

Here an example of the used code in the PR:

template <typename Stream>
inline packer<Stream>& packer<Stream>::pack_double(double d)
{
    if (d == d) { // check for nan 
        // compare d to limits::max() to avoid undefined behaviour
        if (d >= 0 && d <= std::numeric_limits<uint64_t>::max() && d == uint64_t(d)) {
            pack_imp_uint64(uint64_t(d));
            return *this;

        } else if (d >= std::numeric_limits<int64_t>::min() && d == int64_t(d)) {
            pack_imp_int64(int64_t(d));
            return *this;
        }
    }

    union { double f; uint64_t i; } mem;
    mem.f = d;
    char buf[9];
    buf[0] = static_cast<char>(0xcbu);

#if defined(TARGET_OS_IPHONE)
    // ok
#elif defined(__arm__) && !(__ARM_EABI__) // arm-oabi
    // https://github.com/msgpack/msgpack-perl/pull/1
    mem.i = (mem.i & 0xFFFFFFFFUL) << 32UL | (mem.i >> 32UL);
#endif
    _msgpack_store64(&buf[1], mem.i);
    append_buffer(buf, 9);
    return *this;
}