satazor / js-spark-md5

Lightning fast normal and incremental md5 for javascript

Incorrect hash on chunks whereas merging into Blob produces correct hash #31

Closed · silverbucket closed this issue 8 years ago

silverbucket commented 8 years ago

I'm having really weird behavior trying to perform an md5 checksum on data stored in IndexedDB.

Here's an example of my code, assuming I've already collected all of the chunks from IndexedDB and placed them, in order, in the collectedArray:

        var spark = new SparkMD5.ArrayBuffer();

        collectedArray.forEach(function (data, i) {
          spark.append(data);
        });

        console.log('MD5: ' + original_md5 + ' GEN: ' + spark.end()); // here, new md5 is incorrect

        var blob = new Blob(collectedArray);

        var fileReader = new FileReader();
        fileReader.onload = function() {
            var spark2 = new SparkMD5.ArrayBuffer();
            var md52 = spark2.append(this.result);
            console.log("NEW MD5: " + spark2.end()); // here the new md5 is correct
        };
        fileReader.readAsArrayBuffer(blob);

I think this is probably some sort of user error on my part, but I have been stumped for a couple of days. I'm hoping your expertise in this area might point me in the right direction on how I could even go about debugging this.

It's really important because, in the long run, I need to perform the incremental md5sum as the data is coming into IndexedDB, so I will need to use the append feature and will not have all of the pieces until the very end.
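
For reference, this is roughly the flow I'm aiming for (just a sketch; handleChunk and finishHash are placeholder names, not code I actually have):

    var spark = new SparkMD5.ArrayBuffer();

    // Called for each chunk (an ArrayBuffer) as it is written to IndexedDB, in order.
    function handleChunk(arrayBuffer) {
      spark.append(arrayBuffer);
    }

    // Called once the last chunk has been appended; end() returns the hex digest.
    function finishHash() {
      return spark.end();
    }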

silverbucket commented 8 years ago

And just to cover my bases, I also tried this with a for loop instead of a forEach, on the off chance that somehow the items were not being appended in order:

for (var i = 0, len = collectedArray.length; i < len; i += 1) {
  spark.append(collectedArray[i]);
}

satazor commented 8 years ago

I can't see anything wrong from the usage perspective. Can you create a small reproducible fiddle so I can play around and identify the issue? A zip file would also suffice.

silverbucket commented 8 years ago

Hi @satazor, I created a small demo that reproduces the issue.

You can check it out, run a local HTTP server in the directory, and view index.html. Be sure to open the console, as a lot of info is logged there.

As you'll see, the original md5 is displayed first (static), then the one from the series of appends (incorrect), and then another from a merged ArrayBuffer (correct).

I tried to keep it very simple; give it a shot and let me know what you think. https://github.com/silverbucket/md5example

satazor commented 8 years ago

Thanks @silverbucket, I will look at it later today, after work.

satazor commented 8 years ago

There was indeed a bug in SparkMD5.ArrayBuffer, but it was fixed in https://github.com/satazor/SparkMD5/commit/9f54482997c207d7fadcbe42dd2cbe82afcd3a78. Your example is using an outdated version of SparkMD5. I've tested your example with versions 2.0.0 and 1.0.1 and it works correctly!
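
On a fixed version, a quick check along these lines should pass (just a sketch; chunks stands in for an array of ArrayBuffers, like your collectedArray):

    // Hash the chunks incrementally.
    var incremental = new SparkMD5.ArrayBuffer();
    chunks.forEach(function (chunk) {
      incremental.append(chunk);
    });

    // Concatenate the same chunks into a single buffer and hash that instead.
    var total = chunks.reduce(function (sum, c) { return sum + c.byteLength; }, 0);
    var merged = new Uint8Array(total);
    var offset = 0;
    chunks.forEach(function (chunk) {
      merged.set(new Uint8Array(chunk), offset);
      offset += chunk.byteLength;
    });
    var whole = new SparkMD5.ArrayBuffer();
    whole.append(merged.buffer);

    console.log(incremental.end() === whole.end()); // true on the fixed versions (1.0.1 / 2.0.0)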

satazor commented 8 years ago

[Screenshot: screen shot 2015-11-02 at 12 27 55]

silverbucket commented 8 years ago

Oh, I missed that update. Since I use Node as my build environment (with browserify), I cannot simply include a file from the SparkMD5 repository; I have to manually copy it into the project whenever I see an update.

Any chance of adding a build step that creates a file using the Node module system (module.exports)? I know it's not meant to run in Node, but given the npm+node build environment, it would still have a lot of benefits.

silverbucket commented 8 years ago

i.e. so that there's a third file, node-sparkmd5.js or something?

silverbucket commented 8 years ago

Are you sure you published the latest version(s) of SparkMD5 to npm? I'm still getting v1.0.0.

satazor commented 8 years ago

Sorry, it should be published now. Totally forgot.

satazor commented 8 years ago

Regarding the npm/Node usage, it should work out of the box; see https://github.com/satazor/SparkMD5/blob/master/spark-md5.js#L4 and https://github.com/satazor/SparkMD5/blob/master/package.json#L5
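
In other words, after installing it from npm, something like this should work (a minimal sketch, assuming the package is installed as spark-md5):

    // Minimal Node/browserify usage sketch, assuming `npm install spark-md5`.
    // The UMD wrapper in spark-md5.js exports the constructor via module.exports.
    var SparkMD5 = require('spark-md5');

    var spark = new SparkMD5.ArrayBuffer();
    spark.append(new Uint8Array([104, 105]).buffer); // append the bytes of "hi"
    console.log(spark.end()); // hex MD5 of everything appended so far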

silverbucket commented 8 years ago

Ah, right, not sure why I decided that wouldn't work before. Works fine for me now :) Thanks!

db2190 commented 8 years ago

I want to upload a 1 GB file with a chunk size of 1 MB each, and I wish to calculate and validate a checksum for each chunk on the server. Can someone point me in the right direction?
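
Something along these lines is roughly what I have in mind (just a sketch; uploadChunk is a placeholder for the actual upload call, not real code I have):

    // Rough sketch: slice the file into 1 MB chunks, hash each chunk with
    // SparkMD5.ArrayBuffer, and send the chunk plus its MD5 so the server can
    // recompute the checksum and compare. `uploadChunk` is a placeholder for
    // whatever transport is used (e.g. XHR/fetch).
    var CHUNK_SIZE = 1024 * 1024; // 1 MB

    function hashAndUpload(file, index) {
      if (index * CHUNK_SIZE >= file.size) {
        return; // all chunks processed
      }
      var chunk = file.slice(index * CHUNK_SIZE, (index + 1) * CHUNK_SIZE);
      var reader = new FileReader();
      reader.onload = function () {
        var spark = new SparkMD5.ArrayBuffer();
        spark.append(reader.result);
        var chunkMd5 = spark.end();
        uploadChunk(chunk, index, chunkMd5); // server re-hashes the chunk and compares
        hashAndUpload(file, index + 1);      // then continue with the next chunk
      };
      reader.readAsArrayBuffer(chunk);
    }

    // usage: hashAndUpload(someFile, 0);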