samtools / htslib

C library for high-throughput sequencing data formats
Other
784 stars 447 forks source link

Apply the packed attribute to uint*_u types for Clang #1667

Closed MaskRay closed 10 months ago

MaskRay commented 10 months ago

... so that the following code

static inline void u32_to_le(uint32_t val, uint8_t *buf) {
    *((uint32_u *) buf) = val;
...

will not cause -fsanitize=alignment failures when building with Clang.

daviesrob commented 10 months ago

clang seems happy enough with this, so I guess it's OK.

This hack was actually put in to prevent certain versions of gcc from using SIMD instructions that expected aligned data. If you want to use sanitizers, I'd recommend building with -DHTS_ALLOW_UNALIGNED=0 in CPPFLAGS which enables the properly standards-conforming version of the code. This will keep -fsanitize=undefined happy.

One day the dodgy typecasts will be replaced by calls to memcpy(), which is the proper way to do unaligned loads. At least these days most compilers manage to replace this with the intended register load (given sufficient optimisation).

MaskRay commented 10 months ago

Thank you!

If you want to use sanitizers, I'd recommend building with -DHTS_ALLOW_UNALIGNED=0 in CPPFLAGS which enables the properly standards-conforming version of the code. This will keep -fsanitize=undefined happy.

I think it'd be nice if users don't additionally specify -DHTS_ALLOW_UNALIGNED=0.

For further cleanup, functions like the following can just be replaced with memcpy. See https://github.com/Blosc/c-blosc2/pull/550#issuecomment-1688867655 that ((p)[0] | (p)[1]<<8 | (p)[2]<<16 | (p)[3]<<24) may not be optimized well while memcpy is always good:

static inline uint16_t le_to_u16(const uint8_t *buf) {
#if defined(HTS_LITTLE_ENDIAN) && HTS_ALLOW_UNALIGNED != 0
    return *((uint16_u *) buf);
#else
    return (uint16_t) buf[0] | ((uint16_t) buf[1] << 8);
#endif
}