platform dependent int in reference implementation might overflow

phoboslab / qoi

The “Quite OK Image Format” for fast, lossless image compression

MIT License

6.96k stars 330 forks source link

platform dependent int in reference implementation might overflow #197

Closed kpckpc closed 2 years ago

kpckpc commented 2 years ago

The reference implementation might overflow the destination buffer by 4 bytes on ILP64 (int=64bit)

typedef union {
    struct { unsigned char r, g, b, a; } rgba;
    unsigned int v;
} qoi_rgba_t;

The overflow happns here (line 602):

        if (channels == 4) {
            *(qoi_rgba_t*)(pixels + px_pos) = px;
        }

better to include stdint.h and use uint32_t. Otherwise have a load of ifdefs for all situations.

I doubt qoi files will be used on 16 bit microcontrollers, but the 16-bit ints could cause problems there, because only the rg or ba colors are compared (depending on endianness)

phoboslab commented 2 years ago

I'm in a pickle here. I agree that just including stdint.h would be nice, if it wasn't for the stupid dance you'd have to do on Windows/MSVC. Also, the current implementation compiles even with C89, which (afaik) doesn't come with stdint.h either.

So even with stdint.h you have to have that load of ifdefs and verify that you end up with the right int sizes, like stb_image does (which still might not compile with C89?).

If that's the only place you can see where a 64bit int causes a problem, then maybe the more portable way would be to just do

px.rgba.r = pixels[px_pos + 0];
px.rgba.g = pixels[px_pos + 1];
px.rgba.b = pixels[px_pos + 2];
px.rgba.a = pixels[px_pos + 3];

and hope that the compiler is smart enough to not make it slower than the typecast?!

Alcaro commented 2 years ago

I doubt qoi files will be used on 16 bit microcontrollers

Considering #190 exists, I'm slightly more optimistic.

But fair chance they'll write their own coders. int size is far from the only issue on microcontrollers.

I agree that just including stdint.h would be nice, if it wasn't for the stupid dance you'd have to do on Windows/MSVC.

Doesn't latest msvc fix that? But yes, they took far too long.

and hope that the compiler is smart enough to not make it slower than the typecast?!

Clang is smart. GCC isn't.

But you can make it slightly less unsmart with a quick memcpy. https://godbolt.org/z/71ET4Gddq

Another issue with the union is that it'll give weird results on big endian platforms. While they, like not-32-int, are rare in this year, several retrocomputers are big endian, like Wii, Xbox 360 and PowerPC Mac. Replacing the union with four byte writes (or memcpy or something) would fix that too.

BenBE commented 2 years ago

The least that should be done here is a sanity size check on that type to be exactly the size you expect and fail compiling otherwise.

phoboslab commented 2 years ago

But you can make it slightly less unsmart with a quick memcpy. https://godbolt.org/z/71ET4Gddq

Cool, thanks for investigating! I'm hesitant to use memcpy, since some users want to compile without stdlib (https://github.com/phoboslab/qoi/pull/67).

I just ran a benchmark myself (compiled with gcc 8.3) and curiously couldn't measure any difference between the typecast version and the proposed four byte writes/reads. Even when only the alpha channel is in the conditional (and probably hard to optimize):

px.rgba.r = pixels[px_pos + 0];
px.rgba.g = pixels[px_pos + 1];
px.rgba.b = pixels[px_pos + 2];

if (channels == 4) {
    px.rgba.a = pixels[px_pos + 3];
}

So I guess this would be the best solution after all.

Alcaro commented 2 years ago

Personally I'd remove the union and change it to typedef struct { unsigned char r, g, b, a; } qoi_rgba_t;, to simplify things a little, and to ensure there are no stray .v uses hiding anywhere.

I'm hesitant to use memcpy, since some users want to compile without stdlib (https://github.com/phoboslab/qoi/pull/67).

Those users are in for a surprise if they try compiling anything with Clang. https://godbolt.org/z/zvGznjYPo

phoboslab commented 2 years ago

Those users are in for a surprise if they try compiling anything with Clang. https://godbolt.org/z/zvGznjYPo

This is the reason there's a QOI_ZEROARR(index); macro, instead of qoi_rgba_t index[64] = {0};. I guess this was the only place where the compiler automatically inserted a memset. memcpy doesn't seem to be inserted - at least for current compilers.

Personally I'd remove the union

It's still used in comparisons with the previous pixel and the index px.v == px_prev.v.