Closed chrisnc closed 5 years ago
Other fixed-types are optional in C99 as well.
I'd rather define these types on platforms where they are missing rather than using less explicit and less predictable types.
Some DSPs don't have a 8 bit type, but it may not be worth making the code more complicated for very unlikely targets, especially since it would probably never be tested on these.
The problem is that on the platforms I'm describing, providing a definition of uint8_t
is just misleading, rather than being explicit. It's not that they don't support stdint.h, they support exactly what the standard dictates. As for the other sizes, uint16_t
doesn't appear to be used in any of the core code, only in platform-specific implementations in random.h
.
The other types (uint32_t
and uint64_t
) are more fundamental to the algorithms libhydrogen implements, so it makes sense to require that they exist.
Re: making the code more complicated, I don't think this is necessarily the case. I'm not proposing that there be special cases for this either, it just requires some adjustments to make the code agnostic to the byte size. The typical approach is to treat unsigned char
as an 8-bit byte where any bits beyond the least-significant 8 are ignored, implicitly or explicitly, as needed. This has the property that on all platforms where it is exactly 8 the behavior is identical.
There will be a little bit more to do than masking. With 16 bit char
, array sizes and pointer arithmetic need to be adjusted as well.
But even if we get to the point where everything works, this will introduce a new constraint for every single change made to the code from now on. Which is likely to be constantly forgotten about afterwards.
Also, masking may be optimized out by compilers where CHAR_BIT==8
. Or it may not, and then we introduced a generally unnecessary penalty.
What target architecture do you have in mind?
You can leave the array lengths the same; just use each unsigned char
as though it were just 8 bits, and all the lengths are the same as before. You will run into problems if you try to manipulate other types through an unsigned char *
, or interpret their sizes as though it were the number of octets, but there are ways to avoid doing this.
The architecture I have in mind has the following properties:
CHAR_BIT = 16
sizeof(short) = 1
sizeof(int) = 1
sizeof(long) = 2
sizeof(long long) = 4
This means that uint16_t, uint32_t, and uint64_t are defined as unsigned int
, unsigned long
, and unsigned long long
, respectively, but there is no uint8_t
. unsigned char
has the same size as unsigned int
.
Let me know if you'd be interested in seeing an example of the kinds of changes one could make to accomplish this. I just wanted to get your thoughts on it before submitting something publicly.
After having seen the implications on SipHash, I'd rather not make such a change, especially since it will be very difficult to test without having such a platform.
I tried doing some research on how SipHash handles this and what the implications are, but I'm not finding much. Could you point me in the right direction for what you're referring to? The reference implementations I found seem to just assume uint8_t
.
Thanks. I can see why that kind of approach would not be acceptable. It's possible to support 16-bit bytes without ifdef or mentioning CHAR_BIT (except possibly to allow optimized versions).
Reviewing those PRs and the SipHash code itself I think I understand better what's going on. As far as I can see, the reference SipHash code should already work for CHAR_BIT > 8
, just by replacing uint8_t
with unsigned char
and passing octet streams in and expecting octet streams out. The PRs there want something different than this, though.
The situation with the gimli
and x25519
code in libhydrogen is different; it uses type punning with pointer casts and memcpy
from octet sequences to native word types that won't work when there is no octet type. There are ways to avoid these constructs to make the code work without fundamentally changing the interfaces (and you can be confident testing even without one of these devices).
I had some discussions with the author of the PRs you referenced and I think the final result is much less invasive and matches what I had in mind when I created this issue. See here: https://github.com/veorq/SipHash/pull/22/files
Most of the relevant discussion is on this PR, now closed: https://github.com/veorq/SipHash/pull/18
The standard doesn't require it to exist, and some microcontrollers that do support C99 and the other fixed-width types do not have
uint8_t
. I wanted to get feedback on eliminating the use ofuint8_t
in favor ofunsigned char
.I believe that with some minor changes libhydrogen can be made to work on platforms that don't have an 8-bit integer type, but this would be a prerequisite to that.