Closed alt-graph closed 9 months ago
One solution could be as follows. It compiles fine and fails only if the enum or the functions are actually used.
--- a/include/gul14/bit_manip.h
+++ b/include/gul14/bit_manip.h
@@ -78,20 +78,17 @@ using BitFunctionReturnType =
*/
enum class endian
{
-#if defined(__BYTE_ORDER__)
- little = __ORDER_LITTLE_ENDIAN__,
- big = __ORDER_BIG_ENDIAN__,
- native = __BYTE_ORDER__
+#if defined(__BTE_ORDER__)
+ little = __ORDER_LITTLE_ENDIAN__, ///< Little-endian (e.g. Intel)
+ big = __ORDER_BIG_ENDIAN__, ///< Big-endian (e.g. Motorola)
+ native = __BYTE_ORDER__ ///< Native endianness
#elif defined(_MSC_VER) && !defined(__clang__)
little = 0,
big = 1,
native = little
#else
- #error "Don't know how to determine machine endianness on this compiler"
- // Just for Doxygen:
- little, ///< Little-endian (e.g. Intel)
- big, ///< Big-endian (e.g. Motorola)
- native ///< Native endianness
+ #define GUL14_BIT_MANIP_NO_ENDIANNESS
+ // Don't know how to determine machine endianness on this compiler
#endif
};
@@ -225,6 +222,8 @@ bool constexpr inline bit_test(T bits, unsigned bit) noexcept {
return bits & bit_set<T>(bit);
}
+#ifndef GUL14_BIT_MANIP_NO_ENDIANNESS
+
/**
* Determine whether this platform uses big-endian (Motorola) order for storing multi-byte
* quantities in memory.
@@ -255,10 +254,18 @@ constexpr bool is_little_endian()
return endian::native == endian::little;
}
+#else
+// We can not determine the endianness, so fail compiling with undefined function
+constexpr bool is_big_endian();
+constexpr bool is_little_endian();
+#endif
+
/// @}
} // namespace gul14
+#undef GUL14_BIT_MANIP_NO_ENDIANNESS
+
#endif
// vi:ts=4:sw=4:et
--
2.25.1
Edit: Ah, I forgot to remove the code I used to provoke the failure from the diff above... +#if defined(__BTE_ORDER__) will of course always fail ;)
Boost knows four kinds of endianness:
https://www.boost.org/doc/libs/1_67_0/doc/html/predef/reference.html#predef.reference.other_macros
And (older) GCC versions possibly define __BIG_ENDIAN__ or __LITTLE_ENDIAN__.
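To make the macro question concrete, here is a minimal sketch of a detection cascade. All DEMO_* names and demo_is_little_endian() are hypothetical and not part of GUL14. The __BIG_ENDIAN__/__LITTLE_ENDIAN__ branch assumes these macros behave as suggested above, and the MSVC branch assumes a little-endian target, as the diff does.

```cpp
// Hypothetical detection cascade; none of these names exist in GUL14.
#define DEMO_ORDER_LITTLE 1234
#define DEMO_ORDER_BIG    4321

#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) \
    && defined(__ORDER_BIG_ENDIAN__)
    // Current GCC and Clang
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        #define DEMO_BYTE_ORDER DEMO_ORDER_LITTLE
    #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        #define DEMO_BYTE_ORDER DEMO_ORDER_BIG
    #endif
#elif defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
    // Assumption: some (older) compilers define only one of these two
    #define DEMO_BYTE_ORDER DEMO_ORDER_LITTLE
#elif defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
    #define DEMO_BYTE_ORDER DEMO_ORDER_BIG
#elif defined(_MSC_VER)
    // Assumption: MSVC targets are little-endian
    #define DEMO_BYTE_ORDER DEMO_ORDER_LITTLE
#endif

#ifndef DEMO_BYTE_ORDER
    #error "Cannot determine the byte order on this compiler"
#endif

constexpr bool demo_is_little_endian()
{
    return DEMO_BYTE_ORDER == DEMO_ORDER_LITTLE;
}
```

Mixed-endian platforms deliberately end up in the #error branch here; a real implementation would have to decide what to do with them.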
So this seems to boil down to the question "Do we need this?". The answer is plain and simple: yes. There is a reason why standard libraries and boost include tests for endianness.
Two C++ programs that want to share information across a network or via a file need to know how multi-byte scalars are represented as bytes; there is just no way around that. So which options do we have?
1. We can write entirely standard-conformant code using bit shifts and bit masks. Try doing that on an array of a million integers, though, and we keep the CPU utterly busy even if we do not need to change any byte order at all.
2. htonl(). It converts a uint32 to another uint32. Then we need to use a cast to char* to get at the individual bytes again. Plus, we need to include a platform-dependent header. Plus, our 1-million integer array requires 1M calls to htonl(), even if we do not need to change the byte order at all.
3. memcpy() our 1M-integer array if it matches the target endianness, or proceed to 1., 2., or something similar to change it.
Option 3 is by far the most performant one (for a 1M array IMO really the only acceptable option), and you will therefore find it in every serialization library worth using. The main drawback is that platform-dependent includes and macros invade user code. Hence the C++ committee decided to add the std::endian enum in C++20, which finally provides a standard way to detect platform endianness. gul14::endian is just a backport of this, nothing more, nothing less.
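A minimal sketch of the memcpy() fast path with a fallback to bit shifts might look like this. The namespace demo, native_is_big_endian, and to_big_endian_bytes() are hypothetical names for illustration. The macro-based constant is only a stand-in; with this PR the test would presumably read gul14::endian::native == gul14::endian::big.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace demo {

// Stand-in for an endian enum comparison (hypothetical, GCC/Clang macros only).
#if defined(__BYTE_ORDER__)
constexpr bool native_is_big_endian = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
#else
constexpr bool native_is_big_endian = false; // assumption: little-endian target
#endif

// Serialize 32-bit integers into a big-endian byte buffer.
inline std::vector<unsigned char>
to_big_endian_bytes(const std::vector<std::uint32_t>& values)
{
    std::vector<unsigned char> out(values.size() * sizeof(std::uint32_t));

    if (native_is_big_endian) {
        // Fast path: the memory layout already matches, one bulk copy suffices.
        if (!out.empty())
            std::memcpy(out.data(), values.data(), out.size());
    } else {
        // Slow path: emit each value byte by byte, most significant byte first.
        std::size_t i = 0;
        for (std::uint32_t v : values) {
            out[i++] = static_cast<unsigned char>(v >> 24);
            out[i++] = static_cast<unsigned char>(v >> 16);
            out[i++] = static_cast<unsigned char>(v >> 8);
            out[i++] = static_cast<unsigned char>(v);
        }
    }
    return out;
}

} // namespace demo
```

The slow path uses shifts only, so it gives the right result on any platform; the endianness test merely decides whether the cheaper bulk copy is allowed.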
@Finii: You certainly have a right to dislike gul14::endian and by extension std::endian, but it exists for a reason. Serialization code is low-level code, but as much as we dislike that, we need it.
We can write entirely standard-conformant code using bit shifts and bit masks. Try doing that on an array of a million integers, though, and we keep the CPU utterly busy even if we do not need to change any byte order at all.
I do not quite get that argument. If we cannot use that code for performance reasons on platforms where no endianness change is needed, and we then detect with the functions in this PR that we do need an endianness change, would we not need all that code anyway? Or is the purpose here just to say at compile time: cannot handle your platform's endianness for performance reasons?
Ah, I see the last option: you want fast code on 'native' endianness and so-slow-it-almost-does-not-work code on platforms with differing endianness. OK (see approval above).
How is the transfer of floating point numbers? Is the bitwise representation the same on all platforms?
This is a very basic and small thing (as you said). All the problems of endianness are still on the lib user. Maybe it would be more worthy of GUL to provide real solutions and not just a platform trait.
For example modern font files have all data encoded in big endian layout.
A font renderer is not supposed to convert all data structures when opening the font. Instead, the file structure is designed so that the data can be used as is, "in situ", without a separate renderer data structure. That poses the same problem for font renderers: they need to read integer values from a big-endian data structure regardless of platform endianness, on every access.
Take for example [1], which provides the thin wrapper BigEndianValue<T> that does the conversion on the fly on access (if needed), for example in .from_in_place_buf().
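To illustrate the idea in C++, here is a minimal sketch of such an on-access wrapper. It does not reproduce the Rust code from [1]; the class layout, the namespace demo, and the get() member are assumptions made for this example, and only unsigned integers are covered.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace demo {

// Non-owning view onto sizeof(T) bytes that hold a big-endian unsigned integer.
// The bytes stay untouched in the buffer; conversion happens on each access.
template <typename T>
class BigEndianValue
{
    static_assert(std::is_unsigned<T>::value,
                  "this sketch only covers unsigned integer types");

public:
    explicit BigEndianValue(const unsigned char* bytes) : bytes_(bytes) {}

    // Assemble the value from the most significant byte downwards.
    // Independent of the host's byte order because it only uses shifts.
    T get() const
    {
        T value = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            value = static_cast<T>((value << 8) | bytes_[i]);
        return value;
    }

private:
    const unsigned char* bytes_;
};

} // namespace demo
```

For a buffer starting with the bytes 0x12 0x34, demo::BigEndianValue<std::uint16_t>(buf).get() yields 0x1234 on any host.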
The solution provided in this PR is just the bare basic building block, not a real helper for the problems developers face with endianness. Maybe we should strive to provide real ready-to-use solutions for the actual problem?
Rust is a bit ahead I guess, see for example also [2].
[1] https://github.com/codyd51/axle/blob/paging-demo/rust_programs/ttf_renderer/src/parse_utils.rs
[2] https://docs.rs/byteorder/latest/byteorder/index.html
Edit: Add reference 2
How is the transfer of floating point numbers? Is the bitwise representation the same on all platforms?
The interwebs say that even IEEE-754 floating point numbers are stored in endian-dependent byte order. If we have an architecture that does not use IEEE-754, all bets are off...
https://stackoverflow.com/questions/2782725/converting-float-values-from-big-endian-to-little-endian/2782742
https://stackoverflow.com/questions/35763790/endianness-for-floating-point
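Under these assumptions, transferring a float could be sketched as follows: copy the bit pattern into an integer and serialize that integer in a fixed byte order. The function names and the namespace demo are hypothetical. The sketch assumes 32-bit IEEE-754 floats (checked at compile time) and additionally assumes that floats and integers use the same byte order on the platform, which is not checked.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

namespace demo {

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE-754 floats");

// Write the bit pattern of f as 4 bytes, most significant byte first.
inline void store_float_be(float f, unsigned char* out)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // well-defined way to read the bits
    out[0] = static_cast<unsigned char>(bits >> 24);
    out[1] = static_cast<unsigned char>(bits >> 16);
    out[2] = static_cast<unsigned char>(bits >> 8);
    out[3] = static_cast<unsigned char>(bits);
}

// Inverse operation: rebuild the bit pattern and copy it back into a float.
inline float load_float_be(const unsigned char* in)
{
    const std::uint32_t bits = (static_cast<std::uint32_t>(in[0]) << 24)
                             | (static_cast<std::uint32_t>(in[1]) << 16)
                             | (static_cast<std::uint32_t>(in[2]) << 8)
                             |  static_cast<std::uint32_t>(in[3]);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

} // namespace demo
```

On a platform that does not use IEEE-754 the static_assert fires, which matches the "all bets are off" caveat.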
This is a very basic and small thing (as you said). All the problems of endianness are still on the lib user. Maybe it would be more worthy of GUL to provide real solutions and not just a platform trait.
I do not disagree, but it might be difficult to find an API that works well for many use cases. Do we want to swap bytes in-place, copy-and-swap to a buffer, append to a vector, write to a stream? The DOOCS clientlib now has a solution, but only for a subset of these use cases (copy-and-swap to buffer/append).
I do not disagree, but it might be difficult to find an API that works well for many use cases. Do we want to ...
Right. The aforementioned solution does not swap anything in memory, but only when fetching into a single value variable/register. This can even be optimized on bi-endian platforms, where the memory-access CPU instructions can specify the byte order to fetch.
But to be honest I have no clue at all how doocs uses this. It all sounds like 'there is endianness code but it will not work anyhow' :-> (or be too slow or ...)
:+1:
It all sounds like 'there is endianness code but it will not work anyhow' :->
Well, yes. We do not have a big-endian machine on which we can compile anymore, so the code is untested. Therefore, it won't work. As simple as that. ;)
This PR adds an enum class gul14::endian that works like C++20's std::endian, and two convenience functions is_little_endian() and is_big_endian(). Together, they offer a simple and portable way to query the system's endianness at compile time. The PR also separates the documentation for the bit manipulation functions from the numeric functions.
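A short usage sketch follows. Because it should compile without the GUL14 headers, it uses a stand-in enum in a hypothetical namespace demo that mirrors the diff; real code would use gul14::endian, gul14::is_little_endian() and gul14::is_big_endian() instead. The runtime probe is an extra added here for cross-checking and is not part of the PR.

```cpp
#include <cstdint>
#include <cstring>

namespace demo {

// Stand-in mirroring the enum from the diff (assumption: GCC/Clang macros,
// otherwise a little-endian target as in the MSVC branch).
enum class endian
{
#if defined(__BYTE_ORDER__)
    little = __ORDER_LITTLE_ENDIAN__,
    big    = __ORDER_BIG_ENDIAN__,
    native = __BYTE_ORDER__
#else
    little = 0,
    big    = 1,
    native = little
#endif
};

constexpr bool is_big_endian()    { return endian::native == endian::big; }
constexpr bool is_little_endian() { return endian::native == endian::little; }

// The query is a constant expression, so it can be used in static_assert.
static_assert(is_big_endian() || is_little_endian(),
              "this sketch does not cover mixed-endian platforms");

// Runtime cross-check: look at the first byte of the integer 1 in memory.
inline bool probe_little_endian_at_runtime()
{
    const std::uint32_t one = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

} // namespace demo
```

On little-endian and big-endian hosts the compile-time answer and the runtime probe should agree.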