uint8_t is optional in stdint.h

chrisnc commented 5 years ago

The standard doesn't require it to exist, and some microcontrollers that do support C99 and the other fixed-width types do not have uint8_t. I wanted to get feedback on eliminating the use of uint8_t in favor of unsigned char.

I believe that with some minor changes libhydrogen can be made to work on platforms that don't have an 8-bit integer type, but this would be a prerequisite to that.

jedisct1 commented 5 years ago

Other fixed-types are optional in C99 as well.

I'd rather define these types on platforms where they are missing rather than using less explicit and less predictable types.

Some DSPs don't have a 8 bit type, but it may not be worth making the code more complicated for very unlikely targets, especially since it would probably never be tested on these.

chrisnc commented 5 years ago

The problem is that on the platforms I'm describing, providing a definition of uint8_t is just misleading, rather than being explicit. It's not that they don't support stdint.h, they support exactly what the standard dictates. As for the other sizes, uint16_t doesn't appear to be used in any of the core code, only in platform-specific implementations in random.h. The other types (uint32_t and uint64_t) are more fundamental to the algorithms libhydrogen implements, so it makes sense to require that they exist. Re: making the code more complicated, I don't think this is necessarily the case. I'm not proposing that there be special cases for this either, it just requires some adjustments to make the code agnostic to the byte size. The typical approach is to treat unsigned char as an 8-bit byte where any bits beyond the least-significant 8 are ignored, implicitly or explicitly, as needed. This has the property that on all platforms where it is exactly 8 the behavior is identical.

jedisct1 commented 5 years ago

There will be a little bit more to do than masking. With 16 bit char, array sizes and pointer arithmetic need to be adjusted as well.

But even if we get to the point where everything works, this will introduce a new constraint for every single change made to the code from now on. Which is likely to be constantly forgotten about afterwards.

Also, masking may be optimized out by compilers where CHAR_BIT==8. Or it may not, and then we introduced a generally unnecessary penalty.

What target architecture do you have in mind?

chrisnc commented 5 years ago

You can leave the array lengths the same; just use each unsigned char as though it were just 8 bits, and all the lengths are the same as before. You will run into problems if you try to manipulate other types through an unsigned char *, or interpret their sizes as though it were the number of octets, but there are ways to avoid doing this.

The architecture I have in mind has the following properties:

CHAR_BIT = 16
sizeof(short) = 1
sizeof(int) = 1
sizeof(long) = 2
sizeof(long long) = 4

This means that uint16_t, uint32_t, and uint64_t are defined as unsigned int, unsigned long, and unsigned long long, respectively, but there is no uint8_t. unsigned char has the same size as unsigned int.

Let me know if you'd be interested in seeing an example of the kinds of changes one could make to accomplish this. I just wanted to get your thoughts on it before submitting something publicly.

jedisct1 commented 5 years ago

After having seen the implications on SipHash, I'd rather not make such a change, especially since it will be very difficult to test without having such a platform.

chrisnc commented 5 years ago

I tried doing some research on how SipHash handles this and what the implications are, but I'm not finding much. Could you point me in the right direction for what you're referring to? The reference implementations I found seem to just assume uint8_t.

jedisct1 commented 5 years ago

https://github.com/veorq/SipHash/pulls

chrisnc commented 5 years ago

Thanks. I can see why that kind of approach would not be acceptable. It's possible to support 16-bit bytes without ifdef or mentioning CHAR_BIT (except possibly to allow optimized versions).

chrisnc commented 5 years ago

Reviewing those PRs and the SipHash code itself I think I understand better what's going on. As far as I can see, the reference SipHash code should already work for CHAR_BIT > 8, just by replacing uint8_t with unsigned char and passing octet streams in and expecting octet streams out. The PRs there want something different than this, though.

The situation with the gimli and x25519 code in libhydrogen is different; it uses type punning with pointer casts and memcpy from octet sequences to native word types that won't work when there is no octet type. There are ways to avoid these constructs to make the code work without fundamentally changing the interfaces (and you can be confident testing even without one of these devices).

chrisnc commented 4 years ago

I had some discussions with the author of the PRs you referenced and I think the final result is much less invasive and matches what I had in mind when I created this issue. See here: https://github.com/veorq/SipHash/pull/22/files

Most of the relevant discussion is on this PR, now closed: https://github.com/veorq/SipHash/pull/18

jedisct1 / libhydrogen

uint8_t is optional in stdint.h #59