regular / unbzip2-stream

Streaming unbzip2 implementation in pure JavaScript for Node and browsers
Other

"rawr i'm a dinosaur" error #10

Closed: imrehg closed this issue 7 years ago

imrehg commented 7 years ago

I have a Raspberry Pi disk image, recompressed to bz2, that triggers the dinosaur error above. The test image can be downloaded from https://www.dropbox.com/s/pzqbw4p985gn0xm/2016-05-27-raspbian-jessie.img.bz2?dl=0 (warning: the compressed file is about 1.3 GB). The error is thrown quite near the end of streaming the file. Running bzip2 -t on the file reports no errors (bzip2 1.0.6, Arch Linux x64).

Used the following code to test:

var bz2 = require('unbzip2-stream');
var fs = require('fs');

fs.createReadStream('./2016-05-27-raspbian-jessie.img.bz2').
  pipe(bz2()).
  on('error', function (err) {
    console.log(err); 
  });

Error thrown:

$ node index.js 
{ Error
    at Object.<anonymous> (/tmp/bz2/node_modules/unbzip2-stream/lib/bzip2.js:33:24)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/tmp/bz2/node_modules/unbzip2-stream/index.js:2:11)
    at Module._compile (module.js:556:32) name: 'Bzip2Error', message: 'rawr i\'m a dinosaur' }
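The stack trace above only contains module-loading frames (Module._compile, require), so it says nothing about where in the 1.3 GB input the failure happened. One way to narrow that down is to count the compressed bytes handed to the decompressor and attach the count to the error. The sketch below is hypothetical: byteCounter and decompressWithProgress are illustrative names, not part of unbzip2-stream, and it assumes the decompressor behaves like a regular Node duplex stream inside Node's stream.pipeline.

```javascript
'use strict';
const { Transform, pipeline } = require('stream');

// Hypothetical helper: a pass-through Transform that counts the bytes
// that have flowed through it so far.
function byteCounter() {
  const counter = new Transform({
    transform(chunk, _encoding, callback) {
      counter.bytes += chunk.length;
      callback(null, chunk); // forward the chunk unchanged
    }
  });
  counter.bytes = 0;
  return counter;
}

// Pipe `input` -> counter -> `decompressor` -> `output`. On failure, the
// error gets a `compressedBytesRead` property with the counter's value.
function decompressWithProgress(input, decompressor, output, done) {
  const counter = byteCounter();
  pipeline(input, counter, decompressor, output, (err) => {
    if (err) {
      err.compressedBytesRead = counter.bytes;
    }
    done(err || null, counter.bytes);
  });
}

module.exports = { byteCounter, decompressWithProgress };
```

With unbzip2-stream this would be called as decompressWithProgress(fs.createReadStream('./2016-05-27-raspbian-jessie.img.bz2'), bz2(), someWritable, callback). Because streams buffer ahead, the reported count is an upper bound: the decoder failed somewhere at or before that many compressed bytes.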
regular commented 7 years ago

which version are you using?

imrehg commented 7 years ago

@regular using 1.0.10.

Earlier versions seem to break on the same file as well, but with the error message below (I tried 1.0.9 and 1.0.0). Not sure if the cause is the same; just putting it here for reference.

TypeError: Cannot read property '0' of undefined
    at f (/tmp/node_modules/unbzip2-stream/lib/bit_iterator.js:24:34)
    at Object.bzip2.decompress (/tmp/node_modules/unbzip2-stream/lib/bzip2.js:272:13)
    at decompressBlock (/tmp/node_modules/unbzip2-stream/index.js:29:28)
    at decompressAndQueue (/tmp/node_modules/unbzip2-stream/index.js:46:20)
    at Stream.write (/tmp/node_modules/unbzip2-stream/index.js:75:36)
    at Stream.stream.write (/tmp/node_modules/through/index.js:26:11)
    at ReadStream.ondata (_stream_readable.js:555:20)
    at emitOne (events.js:96:13)
    at ReadStream.emit (events.js:188:7)
    at readableAddChunk (_stream_readable.js:176:18)
lurch commented 7 years ago

I've spent a bit of time digging into this, this evening...

Turns out that unbzip2-stream got its bzip2.js from https://github.com/s-macke/jor1k , and jor1k in turn got its bzip2.js from https://github.com/antimatter15/bzip2.js

That in turn is "Based on micro-bunzip by Rob Landley", so I went to http://www.landley.net/code/ and downloaded and compiled both version 3.0 (micro-bunzip.c) and version 4.1 (bunzip-4.1.c). Both reported a "Data error" when trying to uncompress the 2016-05-27-raspbian-jessie.img.bz2 mentioned above. I then manually applied the first fix mentioned on the landley.net page and recompiled both, and both still errored. But after also applying the second fix (it took me ages to realise there were two separate fixes!), both micro-bunzip and bunzip-4.1 were able to uncompress your img.bz2 :smile:

However, I'm not familiar with bzip2 or with porting C algorithms to JavaScript, so I can't suggest how bzip2.js would need to be modified. ... oooh! I did some poking around in other "bzip2 in JavaScript" projects and found https://github.com/cscott/compressjs/commit/f744c804132cd850e5ef507d95d08f287b5fb290 and https://github.com/cscott/compressjs/commit/e400f5156317bb77cc78eb0d2dbcd460b0087348

@imrehg could you try applying those patches to the bzip2.js in this repo and seeing if that fixes the problem?

@regular It seems like https://github.com/cscott/compressjs/commits/master/lib/Bzip2.js has had much more love and attention than https://github.com/antimatter15/bzip2.js/commits/master/bzip2.js - I wonder if it'd be worth switching? (But I dunno if the APIs and / or licenses are still compatible)

regular commented 7 years ago

Thanks for the research, @lurch! Yes, it is ridiculous, all this code copying. There really should be one canonical bzip2 on npm, so we can all join forces to make it flawless. Anyway, it seems like @cscott has done a very good job maintaining compressjs's copy of the bzip2 code.

Judging from what I saw at first glance, it should not be hard to use that version of bzip2.js here as well. It is already using a BitStream abstraction to read the compressed data, much like I do. So, feel free to give it a try! I'd certainly merge if the tests pass.
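For readers unfamiliar with the BitStream idea: bzip2 packs its data most-significant bit first, and blocks after the first are not byte-aligned, so a decoder needs a reader that tracks a bit position rather than a byte position. The sketch below is a hypothetical illustration (BitReader, readStreamHeader and readMagic are made-up names, not the compressjs or unbzip2-stream API). It reads the 4-byte "BZh1".."BZh9" stream header and the 48-bit magic that starts each block (0x314159265359) or marks the end of the stream (0x177245385090).

```javascript
'use strict';

// Hypothetical MSB-first bit reader over a Uint8Array.
class BitReader {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0; // absolute bit position in `bytes`
  }

  // Read `n` bits (n up to 53 stays exact), most significant bit first.
  readBits(n) {
    let value = 0;
    for (let i = 0; i < n; i++) {
      const byteIndex = this.pos >>> 3;
      if (byteIndex >= this.bytes.length) {
        // Explicit bounds check instead of indexing past the end.
        throw new RangeError('unexpected end of input');
      }
      const bit = (this.bytes[byteIndex] >>> (7 - (this.pos & 7))) & 1;
      value = value * 2 + bit;
      this.pos++;
    }
    return value;
  }
}

const BLOCK_MAGIC = 0x314159265359; // block header (BCD digits of pi)
const EOS_MAGIC = 0x177245385090;   // end of stream (BCD digits of sqrt(pi))

// Parse "BZh" plus the level digit; return the maximum block size in bytes.
function readStreamHeader(reader) {
  const b = reader.readBits(8), z = reader.readBits(8), h = reader.readBits(8);
  if (b !== 0x42 || z !== 0x5a || h !== 0x68) throw new Error('not a bzip2 stream');
  const level = reader.readBits(8) - 0x30; // ASCII '1'..'9'
  if (level < 1 || level > 9) throw new Error('bad block size');
  return level * 100000;
}

// Classify the next 48-bit magic number.
function readMagic(reader) {
  const magic = reader.readBits(48);
  if (magic === BLOCK_MAGIC) return 'block';
  if (magic === EOS_MAGIC) return 'eos';
  throw new Error('bad block magic');
}

module.exports = { BitReader, readStreamHeader, readMagic };
```

A real decoder would go on to read the 32-bit block CRC and the Huffman tables from the same reader; this sketch stops at the framing.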

But there might be an even easier solution for etcher. It is a little bit hidden in the docs, but compressjs does actually support streaming, too.

Quote from the docs:

The second exported method is a function accepting one or two parameters:

cmp.decompressFile = function(input, [output])

The input parameter is as above.

If you omit the second argument, decompressFile will return a Uint8Array, Buffer or JavaScript array with the decompressed data, depending on what your platform supports. For most modern platforms (modern browsers, recent node.js releases) the returned value will be a Uint8Array.

If you provide the second argument, it must be a "stream", implementing the writeByte method.
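Per the quoted docs, the output "stream" only needs a writeByte method. A minimal sketch of such an object, assuming you want to hand decompressed data on in Buffer chunks rather than one byte at a time (ChunkedByteSink and its onChunk callback are hypothetical names, not part of compressjs):

```javascript
'use strict';

// Hypothetical output object for decompressFile(input, output): collects
// bytes into a fixed-size Buffer and passes each full chunk to `onChunk`.
class ChunkedByteSink {
  constructor(chunkSize, onChunk) {
    this.chunk = Buffer.alloc(chunkSize);
    this.length = 0;
    this.onChunk = onChunk;
  }

  // The only method the quoted docs require of an output stream.
  writeByte(byte) {
    this.chunk[this.length++] = byte;
    if (this.length === this.chunk.length) this.flush();
  }

  // Emit any buffered bytes as a copy; call once more after decompression ends.
  flush() {
    if (this.length === 0) return;
    this.onChunk(Buffer.from(this.chunk.subarray(0, this.length)));
    this.length = 0;
  }
}

module.exports = { ChunkedByteSink };
```

Assuming compressjs exposes its bzip2 codec as compressjs.Bzip2, usage would look like compressjs.Bzip2.decompressFile(input, sink) followed by sink.flush(). Batching keeps per-byte overhead out of whatever consumes onChunk; since the documented call returns a value rather than taking a callback, there is presumably no backpressure, so onChunk should not block.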

lurch commented 7 years ago

But there might be an even easier solution for etcher.

ping @jviotti ;)

jviotti commented 7 years ago

Awesome work here! I'll give compressJS a go!

cscott commented 7 years ago

Let me know how you like it. --scott

On Thu, Oct 20, 2016 at 9:17 AM, Juan Cruz Viotti notifications@github.com wrote:

Awesome work here! I'll give compressJS a go!

                     ( http://cscott.net/ )
regular commented 7 years ago

I applied the two toybox patches @lurch mentioned above and published unbzip2-stream@1.0.11. @imrehg @lurch, can you confirm that this solves your problems?

lurch commented 7 years ago

Confirmed fixed by unbzip2-stream@1.0.11 :tada:

regular commented 7 years ago

Nice! Thanks again for your research @lurch, wouldn't have happened without it.