satazor / js-spark-md5

Lightning fast normal and incremental md5 for javascript
Do What The F*ck You Want To Public License

error checksum for files 500 MB + #3

Closed · xtchild closed this issue 11 years ago

xtchild commented 12 years ago

I get wrong checksums for all files bigger than 500 MB in Firefox and Chrome.

satazor commented 12 years ago

Hmm, how are you reading the files? Could you paste your code here, or is it similar to the example in the README?

xtchild commented 12 years ago

I used your demo scripts in the /test folder (file_reader.html and file_reader-amd.html) in incremental mode. In normal mode I got a memory error alert; in incremental mode I got the same checksum from both .html files, but it is wrong. Can you test a big file (600+ MB) and compare the resulting hash with any compiled tool or PHP's md5_file()?
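For reference, here is a minimal sketch of the incremental approach those demos take (the variable names and the 2 MB chunk size are mine, not from the demo, and `file` is assumed to come from an input type="file" element):

    var CHUNK_SIZE = 2 * 1024 * 1024; // 2 MB per chunk (the demos use a similar size)
    var chunks = Math.ceil(file.size / CHUNK_SIZE);
    var currentChunk = 0;
    var spark = new SparkMD5.ArrayBuffer();
    var reader = new FileReader();
    var slice = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;

    reader.onload = function (e) {
        spark.append(e.target.result); // feed this chunk's ArrayBuffer into the hash
        currentChunk += 1;

        if (currentChunk < chunks) {
            loadNext();
        } else {
            console.log('md5:', spark.end()); // hex digest of the whole file
        }
    };

    function loadNext() {
        var start = currentChunk * CHUNK_SIZE;
        var end = Math.min(start + CHUNK_SIZE, file.size);

        reader.readAsArrayBuffer(slice.call(file, start, end));
    }

    loadNext();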

satazor commented 12 years ago

OK, I will test it tomorrow at home, but I suspect that the browser is not reading the whole file. Can you please check if the number of chunks * chunk size is equal to the file size? If not, then it means that I am right.
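For reference, a quick way to do that check, reusing the names from the sketch above (the last chunk is typically smaller than the chunk size, so comparing a rounded-up chunk count is the safer test):

    // Hypothetical sanity check, not part of the demo scripts.
    console.log('expected chunks:', Math.ceil(file.size / CHUNK_SIZE), 'file size:', file.size);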

xtchild commented 12 years ago

For a 6.58 GB file, both scripts produced 3371 chunks. Everything processed successfully, taking about 12 minutes each (very fast for JS!), but I don't understand why the checksums are wrong :(

satazor commented 12 years ago

I am able to reproduce the issue. Give me some time to check what's going on... maybe some calculations are overflowing, which would explain why it only happens with large files.

xtchild commented 12 years ago

Ok, thanks!

burgalon commented 11 years ago

Any news on this?

satazor commented 11 years ago

I still haven't had enough time to figure it out. I tried to find the cause but had no success. I will try again this week. Any help would be welcome.

satazor commented 11 years ago

@burgalon @xtchild this is now fixed. I finally managed to spot the issue: it was a 32-bit overflow when performing the trailing (length) computation. Can you test it and report back?

Also, I managed to get a 10% performance increase. @xtchild, can you test the 6 GB file again and report the time?

satazor commented 11 years ago

Will reopen if necessary

burgalon commented 11 years ago

Seems to be working. How did you track this bug down? How did you debug it? I was looking for that same overflow yesterday but had no idea where to start or how to track it.

satazor commented 11 years ago

@burgalon I took a closer look at the specification and compared it to what was being done. All the bitwise operations defined in the spec are 32-bit (it's OK if an overflow happens there). But at the end, we must produce a final computation based on the total length of the file, and for bigger files 32 bits is not enough. That was the cause of the error.

    // Beware that the final length might not fit in 32 bits, so we take care of that
    tmp = n * 8;                                    // total length in bits
    tmp = tmp.toString(16).match(/(.*?)(.{0,8})$/); // split the hex string into high/low parts
    lo = parseInt(tmp[2], 16);                      // low 32 bits
    hi = parseInt(tmp[1], 16) || 0;                 // high 32 bits (0 for small files)

    tail[14] = lo;
    tail[15] = hi;

Before, it was just:

    tail[14] = n * 8;
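To make the overflow concrete: MD5 padding stores the message length in bits as a 64-bit little-endian value, split across tail[14] (low 32 bits) and tail[15] (high 32 bits). n * 8 stops fitting in 32 bits once the file reaches 2^29 bytes (512 MB), which lines up with the roughly 500 MB threshold reported here, so the old line effectively dropped the high bits. A rough illustration (the 600 MB figure is just an example, not from this thread):

    // Hypothetical illustration of the overflow, not part of the library.
    var n = 600 * 1024 * 1024;               // a 600 MB file
    var bits = n * 8;                        // 5033164800 -> needs more than 32 bits
    var lo = bits >>> 0;                     // 738197504, all the 32-bit math effectively kept
    var hi = Math.floor(bits / 4294967296);  // 1, the part that was lost

    console.log(bits, lo, hi);               // 5033164800 738197504 1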