Hi,
I think there is an issue here: https://github.com/ironSource/parquetjs/blob/07fb2fd8fc03bf2b57243531eaf91f2d60f5e460/lib/codec/rle.js#L114
I worked on a Parquet file where `decodeRunRepeated` was supposed to decode the buffer `[18, 1]` into `274` as the repeated value, but it returned `19` instead (18 + 1, i.e. the two bytes simply added together).

`[18, 1]` is little-endian, so it should be interpreted as $18 \cdot 2^{8 \cdot 0} + 1 \cdot 2^{8 \cdot 1} = 18 + 256 = 274$, which would lead to something like this inside the loop: `value += (cursor.buffer[cursor.offset] << 8*i)`
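To make the difference concrete, here is a minimal sketch. `decodeRepeatedValueBuggy` and `decodeRepeatedValue` are hypothetical helpers, not parquetjs functions: the first is a simplified model of the current loop, the second a fixed version, and both read a little-endian value of `byteWidth` bytes starting at `offset`.

```javascript
// Hypothetical helpers for illustration (not the parquetjs API). Both read the
// repeated value of an RLE run, stored as `byteWidth` little-endian bytes.

// Simplified model of the current loop: `value << 8` is evaluated and thrown
// away, so the bytes end up simply being added together.
function decodeRepeatedValueBuggy(buffer, offset, byteWidth) {
  let value = 0;
  for (let i = 0; i < byteWidth; ++i) {
    value << 8; // no effect: the shifted result is never assigned
    value += buffer[offset + i];
  }
  return value;
}

// Fixed version: byte i contributes buffer[offset + i] * 2^(8*i).
function decodeRepeatedValue(buffer, offset, byteWidth) {
  let value = 0;
  for (let i = 0; i < byteWidth; ++i) {
    // Multiplication instead of `<<` keeps 4-byte values with the top bit set
    // positive, since `<<` works on signed 32-bit integers.
    value += buffer[offset + i] * 2 ** (8 * i);
  }
  return value;
}

const run = Buffer.from([18, 1]);
console.log(decodeRepeatedValueBuggy(run, 0, 2)); // 19
console.log(decodeRepeatedValue(run, 0, 2)); // 274
```

For widths up to 6 bytes, Node's `buf.readUIntLE(offset, byteLength)` performs the same little-endian read.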
The current code returns the correct result when the second byte is zero: `[18, 0]` yields `18`, as expected.
The issue is only visible if the Parquet file contains repeated values of 256 or more, since those need more than one byte to encode, and the current code decodes them incorrectly.
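The 256 boundary can be checked exhaustively for a 2-byte width. With two bytes `lo` and `hi`, the summed result `lo + hi` equals the real value `lo + 256 * hi` only when `hi` is 0, so the first value that comes back wrong should be 256. This sketch again uses a hypothetical `decodeRepeatedValueBuggy` as a simplified model of the current loop, not the actual parquetjs function.

```javascript
// Hypothetical model of the current loop (not the parquetjs API):
// `value << 8` is evaluated and thrown away, so the bytes are just summed.
function decodeRepeatedValueBuggy(buffer, offset, byteWidth) {
  let value = 0;
  for (let i = 0; i < byteWidth; ++i) {
    value << 8; // no effect: the shifted result is never assigned
    value += buffer[offset + i];
  }
  return value;
}

// Encode v as a 2-byte little-endian value and check whether the buggy
// decode returns it unchanged.
function survivesBuggyDecode(v) {
  const buf = Buffer.alloc(2);
  buf.writeUInt16LE(v, 0);
  return decodeRepeatedValueBuggy(buf, 0, 2) === v;
}

// Find the smallest 16-bit value that the buggy decode gets wrong.
let firstWrong = -1;
for (let v = 0; v <= 0xffff; ++v) {
  if (!survivesBuggyDecode(v)) {
    firstWrong = v;
    break;
  }
}
console.log(firstWrong); // 256
```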
I think the statement `value << 8` has no effect, because its result is never assigned to anything. There might be a similar problem in the encoding function, but I haven't used it so far: https://github.com/ironSource/parquetjs/blob/07fb2fd8fc03bf2b57243531eaf91f2d60f5e460/lib/codec/rle.js#L26
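For the encoding side, this is only a sketch of what a correct little-endian write of the repeated value could look like, assuming the same byte layout the decoder expects. `encodeRepeatedValue` and `decodeRepeatedValue` are hypothetical helpers, not parquetjs functions; the point is that the shift has to be assigned back to `value`.

```javascript
// Hypothetical helpers for illustration (not the parquetjs API).

// Write `value` (an unsigned integer below 2^32) as `byteWidth`
// little-endian bytes, lowest byte first.
function encodeRepeatedValue(value, byteWidth) {
  const buf = Buffer.alloc(byteWidth);
  for (let i = 0; i < byteWidth; ++i) {
    buf[i] = value & 0xff; // lowest remaining byte
    value >>>= 8; // assigned shift: a bare `value >> 8` would change nothing
  }
  return buf;
}

// Matching little-endian read, used here to check the round trip.
function decodeRepeatedValue(buffer, offset, byteWidth) {
  let value = 0;
  for (let i = 0; i < byteWidth; ++i) {
    value += buffer[offset + i] * 2 ** (8 * i);
  }
  return value;
}

const encoded = encodeRepeatedValue(274, 2);
console.log([...encoded]); // [ 18, 1 ]
console.log(decodeRepeatedValue(encoded, 0, encoded.length)); // 274
```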