Proposal: support little-endian (strongly recommended)

msgpack / msgpack

MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.

http://msgpack.org/

7.02k stars 521 forks source link

Proposal: support little-endian (strongly recommended) #313

Closed dukelec closed 3 years ago

dukelec commented 3 years ago

First of all, I've already read #172, but I still think little-endian support is very important for msgpack.

The main reason why the network uses big-endian is that CPUs used to be big-endian a long time ago, and big-endian may be a tiny more intuitive for humans to read. But for machines and programmers, little-endian is more convenient (e.g. pointer casting), so all mainstream CPUs are now little-endian, whether they are PCs, mobile phones or IoT devices.

For TCP/IP, big-endian requires minimal conversion overhead, as a lot of the data transferred over the network is audio, video and files, and the IP header account for a negligible proportion of the entire packet. Moreover, network devices such as switches are dedicated hardware for parsing packets and do not require endian conversion.

However, for msgpack, the data that needs to be converted between different endian often makes up the majority of the entire packet, so you cannot simply use TCP/IP as a reference.

It would be great if msgpack could be widely used for IoT, and it would be much easier and more efficient if little-endian was supported. Many of the networks used by MCUs are little-endian, such as CANopen, CDNET, etc.

Here is my suggested way to add little-endian support to msgpack: add a little-endian configuration option to mainstream msgpack libraries without modifying the formats table of the specification, so that users can configure and select the specified endian they want. (The description of endianness in the specification needs to be changed.)

These libraries can keep big-endian as default, so msgpack compatibility will not be affected.

BTW, if application/msgpack is the default MIME for msgpack, then consider using application/msgpackel as the little-endian version. (el is the initials of "little-endian" read in little-endian order, reference: https://unix.stackexchange.com/questions/254887/what-does-armel-stand-for)

dukelec commented 3 years ago

I wrote a little-endian msgpackel library in C (no malloc required), and I also modified a Python version and a Javascript version of the msgpack library to support little-endian: https://github.com/dukelec/msgpackel

mlsomers commented 3 years ago

My C# lib LsMsgPack as well as the fiddler plugin makes it configurable. In MsgPackSettings there's a enum EndianAction, the choices are:

SwapIfCurrentSystemIsLittleEndian // Default value: will only reorder when the current system is little-endian.
AlwaysSwap // Force reordering bytes (regardless of current system).
NeverSwap // Do not reorder bytes (regardless of current system).

Midar commented 3 years ago

I don't think this makes any sense at all, as you cannot just access the raw bytes anyway. Read up on alignment ;). So what happens is that you read the bytes individually anyway, at which point it no longer makes a difference in which order you access them. Can you point to any implementation that would actually benefit from this, together with some performance data? Otherwise this seems to only add needless complexity. And we already see the issues with a format that exists in both BE and LE with UTF-16/UTF-32.

dukelec commented 3 years ago

@Midar

Many architectures support unaligned memory access, such as Cortex-M3, M4 ...

Even for architectures that do not support unaligned memory access, such as the Cortex-M0(+), it is easier and more efficient to use little-endian. For example, for float64, we only need to use memcpy to copy 8 bytes. With big-endian, we need to shift and merge the data many times. (The msgpackel C library I mentioned above uses memcpy to copy 64-bit data. Compilers usually optimize the memcpy to achieve very high efficiency.)

Little-endian is also used for better consistency. For example, the data that will be written to ext and bin is continuous little-endian data, such as a waveform data: if all converted to big-endian will be very inefficient; but if we use memcpy to copy data directly, both little-endian and big-endian will exist, which will be very confusing.

Midar commented 3 years ago

Unaligned access is slow, on all architectures. It's also undefined behavior in C. Do you have any benchmarking data that points to any performance difference at all? I do not buy the argument about bit shifts for a single second, given that all modern compilers detect bit shift logic to swap endianess and replace it with bswap / rev, which is essentially free. If you have enough data for it to matter, you will be memory constrained anyway.

ludocode commented 3 years ago

For example, for float64, we only need to use memcpy to copy 8 bytes. With big-endian, we need to shift and merge the data many times.

No you don't. Good MessagePack implementations will use compiler intrinsics where possible. For example here's mine: [1] [2]. Even without the compiler intrinsics, compiler optimizations recognize the bit shifting code as an endianness swap and will output instructions to do it efficiently. Lots of processors support single instructions to swap endianness. x86/x86_64 has a bswap instruction, ARM has a rev instruction and MIPS has a wsbh instruction.

It's maybe unfortunate that big-endian was chosen for MessagePack but the performance differences are so minor that it's not worth changing. Whatever you actually do with the data will dwarf the extra time needed to parse it. Making endianness optional adds a ridiculous amount of complexity for negligible benefit.

If you still think the endianness is a problem, you should at the very least provide performance measurements to prove it.

dukelec commented 3 years ago

I agree that the overhead of endianness conversion is very small and negligible.

My concern is still about consistency, for example, ext type -1 is big-endian time format, so if I want to customize the data with ext type >= 0, I'd better use big-endian format too, for example, I have prepared data in the memory which is [float32, uint16, uint8, int8] * N groups. To convert these data to big-endian, I need to write a special function and prepare an extra block of memory to store the intermediate data, e.g:

uint8_t origin_data[8 * N];
// prepare origin data ...

// convert to big-endian
uint8_t tmp_data[8 * N];
for (int i = 0; i < N; i++) {
    put_unaligned_be32(*(uint32_t *)(origin_data + 8 * i + 0), tmp_data + 8 * i + 0);
    put_unaligned_be16(*(uint16_t *)(origin_data + 8 * i + 4), tmp_data + 8 * i + 4);
    memcpy(tmp_data + 8 * i + 6, origin_data + 8 * i + 6, 2);
}

// call msgpack library to write type 6 ext data
uint8_t *pos = mpk_w_ext(msg_buf, 6, tmp_data, 8 * N); // args: buf, type, data, len

If the little-endian is used uniformly, we only need:

uint8_t origin_data[8 * N];
// prepare origin data ...

uint8_t *pos = mpk_w_ext(msg_buf, 6, origin_data, 8 * N);

Using the little-endian, for ext, we only need a generic mpk_w_ext function to handle all the data including the time with type -1, instead of writing different functions for different types.
This is why my msgpackel-c implementation currently totals less than 500 lines of code, while @ludocode references nearly 2000 lines of code in just one header file.
Many MCUs have very limited resources (and malloc is not available), so it is only appropriate to use relatively compact libraries.

ludocode commented 3 years ago

Wow, there are a lot of misconceptions to unpack here.

First of all, you are confusing the length of code with its compiled size. The 2000 lines you are talking about make MPack extremely portable so it will run efficiently on anything. This does not translate to a larger compiled code size. In fact it can make the optimized code quite a lot smaller. The MPack code is full featured, portable, fast, and well documented. Of course your little ad hoc library is shorter; it's virtually featureless, undocumented, and I guarantee performs poorly compared to MPack.

MPack is designed for embedded. When compiled without malloc() it's extremely compact as well. I've done extensive benchmarking here of MPack and I include the compiled code size in the benchmarks because I understand exactly how important it is for embedded. It's the compiled size that is relevant, not the length of the code. These benchmarks are a bit out of date but they still hold true today. MPack is only slightly larger than CMP, and only because it was compiled with malloc() support in these benchmarks.

Second of all, you should not need to build your entire array into an auxiliary buffer to swap endianness. This is a flaw in your library, not in MessagePack. Most MessagePack implementations support writing str, bin and ext incrementally so you can swap and write on the fly. You can call, say, mpack_start_ext(), then call mpack_write_bytes() as many times as needed to fill out the ext type. CMP lets you do this via cmp_write_ext_marker() and the reference implementation lets you do this with msgpack_pack_ext_body(). Just because you failed to include this feature in your library doesn't mean other libraries are missing it as well. This makes your complaints about code length all the more ridiculous.

Third of all, and most importantly, your custom ext type does not need to match the endianness of the rest of MessagePack. You can write anything you want! You can write any bits in any format. You can write a literal struct or array straight into an ext type if you like. You don't have to change endianness, remove structure padding, or do anything else. Why do you feel it's important for your custom data to match the rest of MessagePack?

dukelec commented 3 years ago

I know that more code does not mean a bigger compiled program, but the more code and more features, the more time it takes to learn how to use it. For critical embedded devices, it is also important to know enough details about the third-party code used to ensure that the device can run correctly for a long time. My implementation does not require complex documentation because it is simple enough. (Driven by the KISS principle. Although it is simple, it has enough features for MCUs.)

I also know that application-specific types can be defined arbitrarily and do not have to use the same endianness as the rest of the MessagePack. But I need to keep an eye on the endianness of each data, which is very confusing and prone to errors. Moreover, the generated code will definitely be larger. For example, if I want to write a 32-bit timestamp, there are only two options: one is to choose a larger library with built-in processing of the timestamp format; the other is to define an intermediate variable and convert the endianness once externally. If I use the little-endian, then I only need:

uint32_t time_sec = 1608336000;
uint8_t *pos = mpk_w_ext(msg_buf, -1, (uint8_t *)&time_sec, 4); // type = -1

Midar commented 3 years ago

If it's "confusing" that MsgPack uses a different endianess than your data, you're breaking abstraction. You should not need to care or even know what endianess MsgPack uses. If you use an MsgPack library, that is supposed to hide the wire format entirely from you. If it does not, that library has a much bigger problem than endianess.

I see a lot of hypothesis and absolutely no data. Can you please point to some data that shows that there is any advantage at all to this? Because so far I only see the downside of increased complexity. Which is ironic, given you want to have decreased complexity while proposing something that vastly increases complexity.

dukelec commented 3 years ago

@Midar The time_sec example above has fully illustrated that using little-endian can make the library small and simple, and the text in front of the example also illustrates what it means to make the library small and simple.

@ludocode The msgpackel-c implementation also support writing map, array, str, bin and ext incrementally: user can call library functions or write the corresponding header manually, backup the address of the length field in the header, then write variable length content, and update the length in the header after writing. This way can reduce the memory copy. (When writing the header, choose the version that provides enough space, for example, bin16 is chosen, but the user can actually write less than 256 bytes of data.)

ludocode commented 3 years ago

For example, if I want to write a 32-bit timestamp, there are only two options: one is to choose a larger library with built-in processing of the timestamp format; the other is to define an intermediate variable and convert the endianness once externally. If I use the little-endian, then I only need:

You only need to do that stuff when using your library, because your implementation of MessagePack is incomplete. The fact that it's only 500 lines reveals just how little it actually does and how much it leaves to the caller. You're so proud of how short your implementation is yet you won't even add a helper function to write a timestamp properly. No wonder you're complaining that it's error prone.

Other libraries do all of this for you. For example MPack has mpack_write_timestamp() which will automatically choose the shortest timestamp representation, fix endianness, flush the buffer when full, and do whatever else is necessary so you don't have to think about it. That's the whole point of a library. All of your problems would be solved by using a proper library that does all of this for you.

It's time to close this issue. It's clear that you have no interest in using a proper implementation nor improving your own. You only want to change MessagePack to overcome the limitations of your code, and your change would impact thousands of happy users of MessagePack who have none of these problems. This should be closed.

dukelec commented 3 years ago

Well, we can't convince each other, I suggest to put it on hold for a while and see what more people's attitude is.

Again, adding msgpackel does not affect any msgpack users, and for msgpack users, the existence of msgpackel can be completely ignored, because msgpack will always be the default option.

MessagePack itself stands for extremely efficient, very fast and small, I just follow what it follows. If the discussion is about a json library, I don't care how it is implemented.

By the way: for the definition of timestamp 64 bits and 96 bits, nanoseconds at the beginning and seconds at the end, looks more like a little-endian consideration. For the 64-bit timestamp, the first 32 bits of data, nanoseconds is in the high bits, and the lower two bits are the high bits of seconds, which looks very strange. After using little-endian, the lower 30 bits of the first 32 bits are nanoseconds, and the upper two bits are the lower bits of seconds, which is more in line with the norm.

Midar commented 3 years ago

The time_sec example above has fully illustrated that using little-endian can make the library small and simple, and the text in front of the example also illustrates what it means to make the library small and simple.

You're completely ignoring any complexity this introduces on the protocol layer. You're asking to change the entire protocol to make one implementation that breaks basic abstraction and encapsulation rules shorter, at the cost of all other implementations needing to add extra complexity. Please look at the mess that is UTF-16, where today everybody agrees it was a mistake to have a version in big endian and in little endian. Why are you so keen on repeating that mistake?

dukelec commented 3 years ago

It is not appropriate to use UTF-16 as an analogy.

The biggest drawback of UTF-16 is that it is not ASCII compatible, and it is also not compatible with the C language '\0' for character termination.

The endianness problem of UTF-16 is indeed very troublesome, because it is text, presented where the text should appear, and there is no extra packaging, so the only way to distinguish endianness is to add BOM marker to the content, which adds a lot of trouble.

However, for binary msgpack, the protocol itself does not have a header identifier, so there must be an additional description, otherwise there is no way to know that the data is in msgpack format.

Unless msgpack is used within a software or project, if you want to use it as a common format, you have to tell people that it is in msgpack format by some other means, such as a specific file suffix, a MIME field in the HTTP header, etc.

If a msgpack supports both big-endian and little-endian, it can be distinguished by the file suffix or MIME type, without adding a BOM like UTF-16.

The reason why UTF-16 supports both big-endian and little-endian even at such a high cost is that it can facilitate different users and improve efficiency.

For msgpack, it costs almost nothing to have both big-endian and little-endian, so why not?

For msgpack libraries, adding the little-endian option is very simple. If you don't want to add this option to your own library, you can leave it out.

willeccles commented 3 years ago

IMO adding a little-endian version of msgpack is pointless and does nothing but add complexity. Even in embedded, swapping endianness is almost never a concern, despite having limited processing power and tight time constraints. There is next to no efficiency gained by effectively doubling the complexity of the spec. Among other things, big-endian is the agreed-upon network byte order and a binary format intended for network transmission should respect this rule. If you have somehow found a use-case where the byte swap is actually a bottleneck, I would love to hear about it, and I would highly recommend not using such a complex format as msgpack if you are so resource constrained.

For msgpack, it costs almost nothing to have both big-endian and little-endian, so why not?

I believe "almost nothing" is quite the understatement. A complete re-evaluation of the spec, an update for every library, additional code that otherwise has no reason to exist, and additional testing are far from "almost nothing."

dukelec commented 3 years ago

The abbreviation CAN at the beginning of CANopen that I mentioned above is: Controller Area Network. It is used very widely, for example, in the car you drive. In your words: little-endian is the agreed-upon network byte order for controllers.

Using little-endian can make the MCU code more compact and smaller. MCU programming often encounters a lack of flash space, especially when there is a separate bootloader partition, in this case even if we can reduce a few hundred bytes is very important.

For the spec, adding little-endian reduces the length of the spec. Just state at the beginning that msgpack defaults to big-endian, and the specific endianness descriptions for each type can be removed. Specifically in terms of bits: big-endian uses msb; little-endian uses lsb. (Currently only the 64-bit timestamp is relevant, as the single byte is split.)

If the library's code style is fine, adding little-endian options requires only a dozen lines of code. As for testing, again, only a small amount of code needs to be added. Moreover, it is entirely up to the author of the library to decide whether or not to add the little-endian option, and it is perfectly fine not to add it.

willeccles commented 3 years ago

I am quite familiar with CAN. If you use CAN a lot, you should be aware that transmitting something like msgpack data over CAN is quite an abuse of CAN, and you should not be doing that in the first place. Not to mention, CANopen has standardized on little-endian, but that is by no means the CAN standard. CAN does not require a specific endianness, with the exception of bit endianness at the bus level. Swapping endianness should not add more than a few bytes to your MCU code in the first place, as it is a single instruction (or only very few, if not) and does not have to be inlined if size is of the essence. Your compiler is smart enough to know this when told to optimize for size.

For the spec, adding little-endian reduces the length of the spec

This just isn't possible, I'm not sure how this could happen. Adding to the spec inherently makes the spec longer.

Specifically in terms of bits: big-endian uses msb; little-endian uses lsb.

I'm not sure what you mean here. Bit endianness was never the question, and you would be hard pressed to get anyone to swap the bit endianness of the spec.

Moreover, it is entirely up to the author of the library to decide whether or not to add the little-endian option

This does not remove the added complexity of adding this option; it merely offloads the complexity to potential users of the library rather than the library's author.

Adding multiple endianness options increases complexity of the spec and implementations and opens them up to future special cases which would not be necessary without this change. In any case, as mentioned previously, msgpack lets you use your own data types at your leisure, so if you really need little-endian data, simply pack it in to a custom ext type.

It is extremely difficult to conceive of an application where swapping endianness is a major bottleneck/constraint on any MCU, and it's far more likely that you're looking in the wrong place to optimize your code or you need to reevaluate the complexity of your solution or the MCU you chose. The reason ext types exist is that uncommon/unusual/specific scenarios are accounted for. Any situation where you consider endianness swapping to be a bottleneck most definitely falls into that category.

ludocode commented 3 years ago

If the library's code style is fine, adding little-endian options requires only a dozen lines of code. As for testing, again, only a small amount of code needs to be added.

I don't know how you could possibly come to the conclusion that testing this would be easy. Your library doesn't have any tests so you have no idea what it costs to test this.

A good quality library like MPack has roughly as much test code as implementation code. Supporting this would nearly double the amount of unit test code because the majority of it is comparing hexdumps. I would guess a minimum of 3,000 lines of new code would be required to verify that everything works properly with the endianness change. We would need to double our MessagePack test files, we need buildsystem changes to support it, lots of new fuzzing time required, etc.

These are serious projects, used by real companies in real products. This stuff needs to be battle-tested. I would never ship a drastic change like this without huge amounts of testing.

For msgpack, it costs almost nothing to have both big-endian and little-endian, so why not?

The costs would be extreme.

On top of the testing costs, another major cost is how to support both big- and little-endian at runtime. A compile-time switch could be done in some compiled languages but then the app could only parse one endianness. MessagePack is supposed to be universal so I would want the option to support both at runtime, and dynamic languages would have no choice because they have no such thing as compile-time. A runtime endianness check on every read/write would hurt performance far more than an endianness swap ever could.

(Making implementations templated on endianness could preserve performance in some cases but would drastically increase complexity. Most languages do not have good templating features and even those that do are extremely difficult to use properly.)

This is without considering the far larger intangible costs, like programmers having to think about endianness every time they want to use MessagePack. The whole point of MessagePack is its simplicity and its abstraction over these lower level concepts. Your change would ruin this.

Using little-endian can make the MCU code more compact and smaller.

An endianness swap is a single processor instruction on all relevant architectures. I would be surprised if switching to little endian saved even 50 bytes of machine code. If you really care about a few dozen bytes of ROM then you should obviously not use a schemaless format. A minimal MessagePack implementation is going to be several kilobytes at least. You would save far more space if you just memcpy your structs straight onto the wire.

More importantly, you could easily just measure this! Why don't you just tell us the exact number of bytes you save in your microcontroller by switching to little-endian?

You keep changing your arguments for why you need little-endian. First you complained about the overhead of conversion. Then when you agreed conversion overhead was negligible, you complained about consistency with modern processor architectures. Then when you were told that consistency with network protocols is more important, you complained about code length. Then when you agreed code length is irrelevant, you complained about needing to manually convert your ext types. Then when you were shown that these were limitations of your own library, you complained about compiled code size. What argument will you use next? Will you ever show us data?

dukelec commented 3 years ago

I'm familiar with CAN, although I've never used it, and I prefer CDBUS, which is based on RS485 and supports multi-master peer-to-peer transmission, faster and compatible with traditional RS485 devices. I really don't plan to use msgpack for these buses that require real time communication, I do use msgpack for communication with remote servers.

My library currently has no test code, on the one hand, because my implementation is very simple, so I don't need to test much to ensure stability; on the other hand, even if the testing on the PC is comprehensive, there is no guarantee that it will work on the MCU without problems, because the PC and MCU are different platforms. As an analogy, the Linux kernel has almost no test code.

The library I have implemented requires very little change if I add a big-endian version, and I will add the big-endian version if I need it in the future.

Using little-endian would eliminate a lot of functions and be more elegant. I will continue to use it, and anyone who thinks the same way should feel free to use the little-endian fork, whether little-endian is merged upstream or not.

Midar commented 3 years ago

For the spec, adding little-endian reduces the length of the spec.

Ah, you just want to remove the endianess from the specification. And then what? How do you know which format you have? I gave UTF-16 as a bad example, but it seems you want to make it even worse and not even have something like a BOM to detect which byte order it is? (Given that you only want to remove from the spec and not add…)

If the library's code style is fine, adding little-endian options requires only a dozen lines of code.

I disagree with that heavily. If the library is written well, it will read byte by byte and shift things together. This means it is independent of the CPU's endianess, and hence means it does not have to use some swapping function everywhere so that you could just add an if to the swapping method to see that you are in little endian mode.

Moreover, it is entirely up to the author of the library to decide whether or not to add the little-endian option, and it is perfectly fine not to add it.

A spec where it's clear from the start that it won't be implement. What a great idea! \

willeccles commented 3 years ago

I'm familiar with CAN, although I've never used it, and I prefer CDBUS, which is based on RS485 and supports multi-master peer-to-peer transmission, faster and compatible with traditional RS485 devices.

Ignoring the subtle self-promotion, this is a pretty neat project. I'll look into it at some point :)

avkonst commented 2 years ago

I have found one very useful feature of big-endian msgpack representation - I can do lexicographical value comparison without unpacking the data. For example, if I pack a timestamp in front of the message, and want to pick a later message ordering by timestamp, it can be easily done withouut unpacking a message.