keichi / binary-parser

A blazing-fast declarative parser builder for binary data
MIT License

Confused about big-endian vs little-endian for bitfields #46

Open rmd6502 opened 7 years ago

rmd6502 commented 7 years ago

I'm parsing a bitfield-encoded date: bits 15-12 = month, 11-7 = day, 6-0 = (year - 2000). The twist is that the bytes are in little-endian order. Looking at the code and tests for bitfields, it looks like the algorithm is:

read bytes in big-endian order
retrieve bitfields starting from the LSB and working toward the MSB

The 16-bit value for 01/07/2017 (US order) is 0x1391, stored little-endian as the bytes 0x91 0x13. Reading those bytes as BE gives 0x9113, which puts the LSB of the day at the topmost bit 15 and the MSBs of the day at bits 3-0.
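The arithmetic above can be checked with plain Node.js `Buffer` calls. This is only a sketch of the numbers in the question; `packDate` is a hypothetical helper and nothing here uses binary-parser itself.

```javascript
// Hypothetical helper: pack the date using the layout from the question,
// bits 15-12 = month, 11-7 = day, 6-0 = year - 2000.
function packDate(month, day, year) {
  return ((month & 0x0f) << 12) | ((day & 0x1f) << 7) | ((year - 2000) & 0x7f);
}

const packed = packDate(1, 7, 2017); // 0x1391

// Store the value little-endian, as it appears in the data being parsed.
const wire = Buffer.alloc(2);
wire.writeUInt16LE(packed, 0); // bytes: 0x91 0x13

// Reading the same two bytes big-endian swaps them.
const misread = wire.readUInt16BE(0); // 0x9113

// Decoding the byte-swapped word with the intended layout gives a wrong date.
const wrong = {
  month: (misread >> 12) & 0x0f,
  day: (misread >> 7) & 0x1f,
  year: 2000 + (misread & 0x7f),
};

console.log(packed.toString(16), wire.toString('hex'), misread.toString(16), wrong);
```

The byte swap moves the low byte, which holds the day's LSB in bit 7, into the high half of the word, so every field ends up reading bits that belong to a neighbour.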

How can I properly parse this date using BIT directives?
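One possible fallback, sketched here without the BIT directives, is to read the 16-bit word as little-endian and mask the fields by hand. `parsePackedDate` is a hypothetical name and is not part of binary-parser; the layout is the one given in the question.

```javascript
// Hypothetical helper, not part of binary-parser: decode the packed date
// from a little-endian 16-bit word at `offset`.
function parsePackedDate(buffer, offset = 0) {
  const word = buffer.readUInt16LE(offset);
  return {
    month: (word >> 12) & 0x0f,    // bits 15-12
    day: (word >> 7) & 0x1f,       // bits 11-7
    year: 2000 + (word & 0x7f),    // bits 6-0 hold year - 2000
  };
}

// The bytes from the question: 0x1391 stored little-endian.
const date = parsePackedDate(Buffer.from([0x91, 0x13]));
console.log(date);
```

This sidesteps the bit field ordering entirely, at the cost of doing the masking outside the declarative parser definition.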

bodgybrothers commented 7 years ago

The code doesn't work as expected for LE. Make these changes to binary_parser.js starting at line 361. (Note that this breaks BE bitfields.)

        if (sum <= 8) {
            ctx.pushCode('var {0} = buffer.readUInt8(offset);', val);
            sum = 8;
        } else if (sum <= 16) {
            ctx.pushCode('var {0} = buffer.readUInt16LE(offset);', val);
            sum = 16;
        } else if (sum <= 24) {
            ctx.pushCode('var {0} = buffer.readUInt8(offset) | (buffer.readUInt16LE(offset + 1) << 8);', val);
            sum = 24;
        } else if (sum <= 32) {
            ctx.pushCode('var {0} = buffer.readUInt32LE(offset);', val);
            sum = 32;
        } else {
            throw new Error('Currently, bit field sequence longer than 4-bytes is not supported.');
        }
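The 24-bit branch is the least obvious of the reads in this patch. As a quick sanity check, assuming a plain Node.js `Buffer`, the composed expression can be compared against Node's built-in variable-width little-endian read. The sample bytes are made up for illustration.

```javascript
// Three little-endian bytes holding the value 0x030201 (illustrative data).
const buffer = Buffer.from([0x01, 0x02, 0x03]);
const offset = 0;

// The expression generated by the 24-bit branch: the first byte is the
// least significant, and the following 16-bit LE word supplies bits 23-8.
const composed = buffer.readUInt8(offset) | (buffer.readUInt16LE(offset + 1) << 8);

// Node's built-in read of 3 bytes as a little-endian unsigned integer.
const builtin = buffer.readUIntLE(offset, 3);

console.log(composed.toString(16), builtin.toString(16));
```

Both reads agree, so the branch does assemble a little-endian 24-bit word. Whether the rest of the generated bit extraction then matches a given field layout is a separate question that this sketch does not cover.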

lff5 commented 3 years ago

I insert seek(0) at each byte boundary as a workaround, so that multi-byte bit fields are parsed as little-endian.

var readout_date = new Parser().endianess('little')
  .nest("readout_date", {
    type: new Parser().endianess('little')
      .bit5("day")
      .bit3("year_lower")
      .seek(0) // workaround for Big endianness parser bug
      .bit4("month")
      .bit4("year_upper")
  });
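A rough model of why splitting the sequence at byte boundaries could help, assuming the behaviour described in the first comment (bytes read big-endian, fields taken from the LSB upward). `readGroup` is a hypothetical stand-in for illustration, not the library's actual code, and the bit positions in the sample data are what that model implies rather than something confirmed in this thread.

```javascript
// Assumed model, not binary-parser's real implementation: read `byteCount`
// bytes big-endian, then peel fields off starting at the least significant bit.
function readGroup(buffer, offset, byteCount, fields) {
  let word = buffer.readUIntBE(offset, byteCount);
  const out = {};
  for (const [name, width] of fields) {
    out[name] = word & ((1 << width) - 1);
    word >>>= width;
  }
  return out;
}

// Sample data: byte 0 holds day (bits 4-0) and year_lower (bits 7-5),
// byte 1 holds month (bits 3-0) and year_upper (bits 7-4).
// Intended values: day = 7, year_lower = 5, month = 1, year_upper = 2.
const data = Buffer.from([(5 << 5) | 7, (2 << 4) | 1]);

// All four fields in one 16-bit group: the byte order is swapped, so the
// fields are taken from the wrong bytes.
const oneGroup = readGroup(data, 0, 2, [
  ['day', 5], ['year_lower', 3], ['month', 4], ['year_upper', 4],
]);

// Two independent 8-bit groups, as a break between the bytes would produce:
// a single byte has no byte order, so the fields land where intended.
const perByte = {
  ...readGroup(data, 0, 1, [['day', 5], ['year_lower', 3]]),
  ...readGroup(data, 1, 1, [['month', 4], ['year_upper', 4]]),
};

console.log(oneGroup, perByte);
```

Under this model the single 16-bit group decodes every field incorrectly, while the per-byte groups recover the intended values, which is consistent with the workaround reported above.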

polarstoat commented 1 year ago

I'm still experiencing this issue in v2.2.1. I'm parsing a 16-bit bit field and was getting very confusing results until I found this issue. @lff5's solution above, adding .seek(0) at the boundary between bytes, fixed it for me.

Could this be looked at? As it stands, bit fields aren't reliable for many users.