iqbal-lab-org / cobs

COBS - Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
https://panthema.net/cobs
MIT License

Fix undefined behaviour of char bit shifting when combining classic indices #6

Closed. Zhicheng-Liu closed this 3 years ago.

Zhicheng-Liu commented 3 years ago

Previously, when multiple classic indices were combined into a single classic index, the contents of the source indices were read in as char. During interleaving, depending on the current bit position in the destination index, both a left and a right shift could be applied to the next char.

However, this relies on behaviour that is implementation-defined or undefined, so the results can differ between platforms:

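  1. Whether plain char is signed or unsigned is implementation-defined (it is typically signed on x86 and unsigned on ARM Linux). Before a shift, the char undergoes integral promotion to int, so the same byte, e.g. 0xF0, becomes either -16 or 240 depending on the platform.
  2. If char is signed, any byte with its high bit set promotes to a negative int. Before C++20, left-shifting a negative value is undefined behaviour and right-shifting one is implementation-defined (GCC and Clang sign-extend, filling the vacated high bits with 1s). The results are therefore platform-dependent.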

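This change fixes the issue by declaring the contents read from the source indices as unsigned char, so every byte promotes to a non-negative int (0 to 255) and the shifts behave the same on every platform.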