Bit-sized integer decoding invalid results

kaitai-io / kaitai_struct_javascript_runtime

Kaitai Struct: runtime for JavaScript

Apache License 2.0

37 stars 22 forks source link

Hey, recently, I encountered a bug in reading bit-sized integers. In the below example, there are 3 fields:

prefix: 4-bit integer,
value: 32-bit integer,
suffix: 4-bit integer.

Here is the binary representation of a frame (5 bytes):

prefix                 value                suffix
[1000][0111 01110101 00001001 10010001 0001][0000]

Expected values for fields are as follows:

prefix: 8,
value: 2001770769,
suffix: 0.

When I tried to decode the frame using Kaitai (javascript library kaitai-struct@0.10.0-SNAPSHOT.1), results were not correct (value field is decoded as 122722577).

Here is the Kaitai template I’m using for an example:

meta:
  id: example
  endian: be
seq:
  - id: prefix
    type: prefix
  - id: value
    type: value
  - id: suffix
    type: suffix
types:
  prefix:
    seq:
      - id: value
        type: b4
  value:
    seq:
      - id: value
        type: b32
  suffix:
    seq:
      - id: value
        type: b4

I also tried to use Kaitai Web IDE, and the results are the same (invalid).

After debugging the code of the library, I think the problem is here (KaitaiStream.prototype.readBitsIntBe method):

When reading value field (the 2nd field in the frame), there are still 4 bits left in this.bits from the previous byte, and it reads 4 next bytes from the buffer to get required 32 bits in total.
Within the for loop, it writes each byte into this.bits.
First 3 iterations works ok, where it increases this.bits size up to 28 bits (starting from 4 bits, it adds 8 bits of every next byte).
Then, when 4th byte is read, its increases this.bits size up to 36 bits.

It seems, like when this.bits is increased over 32 bits, it looses most significant bits (in that case 4 first bits from the left).

There is the corresponding code with comments:

KaitaiStream.prototype.readBitsIntBe = function(n) {
  // JS only supports bit operations on 32 bits
  if (n > 32) {
    throw new Error(`readBitsIntBe: the maximum supported bit length is 32 (tried to read ${n} bits)`);
  }
  var bitsNeeded = n - this.bitsLeft;
  if (bitsNeeded > 0) {
    // 1 bit  => 1 byte
    // 8 bits => 1 byte
    // 9 bits => 2 bytes
    var bytesNeeded = Math.ceil(bitsNeeded / 8);
    var buf = this.readBytes(bytesNeeded);          // [1 point]
    for (var i = 0; i < bytesNeeded; i++) {
      this.bits <<= 8;                              // [2-4 points]
      this.bits |= buf[i];                          // [2-4 points]
      this.bitsLeft += 8;
    }
  }

    ... // the rest is not relevant
};

At 4th iteration of the loop, it should put 4 first bits of the last byte, so instead of this.bits <<= 8, it should do this.bits <<= 4, as left capacity is 4 bits, so we won’t exceed 32 bits.

Here is this.bits value, that is held during the for loop iteration, when we decode value field:

00000000 00000000 00000000 00000111 [initial - before entering the loop - left part from previous field]
00000000 00000000 00000111 01110101 [after 1st iteration]
00000000 00000111 01110101 00001001 [after 2nd iteration]
00000111 01110101 00001001 10010001 [after 3rd iteration]
01110101 00001001 10010001 00010000 [after 4th iteration] (but should be 01110111 01010000 10011001 00010001).

Im not a javascript expert, but it seems like this.bits is able to hold only 32 bits max (not 64 bits).

Hi @kbudka, thanks for the detailed report.

4. Then, when 4th byte is read, its increases this.bits size up to 36 bits.

It seems, like when this.bits is increased over 32 bits, it looses most significant bits (in that case 4 first bits from the left).

Yes, this is the correct conclusion. This is the limitation of bitwise operators in JavaScript (https://web.archive.org/web/20200705081541/https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators#Signed_32-bit_integers):

The operands of all bitwise operators are converted to signed 32-bit integers in two's complement format, except for zero-fill right shift which results in an unsigned 32-bit integer.

At 4th iteration of the loop, it should put 4 first bits of the last byte, so instead of this.bits <<= 8, it should do this.bits <<= 4, as left capacity is 4 bits, so we won’t exceed 32 bits.

Thanks for the debugging and suggesting the solution. Such a brilliant and lightweight fix! I actually came across this issue in the past, but I was a bit lazy to think about it thoroughly and this simple solution didn't occur to me. I was virtually presuming that this would be tricky to do with just one working integer variable this.bits only capable of storing and manipulating 32 bits (and we would have to resort to having a second 32-bit integer for the remainder, eventually a growing array of them, which is exactly the principle of how BigInt are implemented - see https://github.com/kaitai-io/kaitai_struct/issues/183), but the trick is that you're safe to discard (shift left out of the 32-bit window) bits that were already parsed and are no longer part of the value that is being parsed right now (as you correctly stated).

So now we just need to think about the algorithm a bit more to figure out what needs to be changed in the readBitsIntBe implementation. I presume that the only lines that need to be changes are these:

    for (var i = 0; i < bytesNeeded; i++) {
      this.bits <<= 8;
      this.bits |= buf[i];
      this.bitsLeft += 8;
    }

In theory, if we change 8 to something else, the rest of the method should accept that without any problems.

Also, we should then apply an analogic fix to the readBitsIntLe method, which is the "little-endian" version of readBitsIntBe.

BTW, the issue that you described actually applies to nearly all languages, because they all use the exact same readBitsInt{Be,Le} algorithm and have limited this.bits integer size (maybe except Python and Ruby, which support arbitrary-precision arithmetic with integers) - although usually 64 bits, not 32. So anything like this will almost certainly yield incorrect readings in C++, C#, Go, Java, Lua (depends on the Lua build), Nim, Perl (depends on the Perl build), PHP and Ruby:

meta:
  id: bits_b64_unaligned
seq:
  - id: one
    type: b4
  - id: two
    type: b64
  - id: three
    type: b4

So, we should first create a test format in our test suite for both 32-bit and 64-bit integers and also for both readBitsInt{Be,Le} methods. Then figure out how to transform the idea into the algorithm, modify a runtime library of one language, test it and then do the same modification for the rest of runtimes (for other languages).

kaitai-io / kaitai_struct_javascript_runtime

Bit-sized integer decoding invalid results #20