Closed vjustin closed 7 years ago
I think there's room for improvement, but it's not immediate. It's a process of change + bench, change + bench. Feel free to run some experiments and report back; I don't have the time to do it myself.
For the ArrayBuffer mode (FileReader.readAsArrayBuffer), why does the append method need to do the UTF-8 conversion with `if (/[\u0080-\uFFFF]/.test(str)) { str = unescape(encodeURIComponent(str)); }`? If I understand correctly, this will convert any byte that is not in the readable ASCII range into something else, which looks like it consumes a lot of CPU. Can the appendBinary method be called directly instead of append? However, the hash calculation calls substring, so does the hash rely on the input being a string? Thanks in advance for shedding some light on this.
For the toUtf8 method, the comment says the input will be converted if it is a string. However, the implementation shown below decides based on content, not type: it converts the value only if it contains characters outside the ASCII range.
if (/[\u0080-\uFFFF]/.test(str)) {
    str = unescape(encodeURIComponent(str));
}
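For context, the snippet above is the classic trick for turning a JavaScript (UTF-16) string into a "binary string" whose character codes are the UTF-8 bytes of the input: encodeURIComponent emits %XX escapes for the UTF-8 bytes, and unescape turns each %XX back into a single character in the 0–255 range. A minimal sketch (toUtf8Bytes is a hypothetical name, not SparkMD5's API):

```javascript
// Convert a JS (UTF-16) string to a binary string whose char codes
// are the UTF-8 bytes of the input.
function toUtf8Bytes(str) {
  // Only strings containing code points above U+007F need converting;
  // pure ASCII is already valid UTF-8 byte-for-byte.
  if (/[\u0080-\uFFFF]/.test(str)) {
    return unescape(encodeURIComponent(str));
  }
  return str;
}

console.log(toUtf8Bytes('é').length); // 'é' (U+00E9) becomes 2 bytes: 0xC3 0xA9
```

The regex test is what makes the common ASCII-only case cheap: only strings with non-ASCII characters pay for the double conversion.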
If you are using FileReader.readAsArrayBuffer, you should be using https://github.com/satazor/SparkMD5#sparkmd5arraybufferappendarr which does not do any conversion.
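A rough sketch of that pattern: read the file in slices, feed each ArrayBuffer straight to SparkMD5.ArrayBuffer's append, and call end for the digest. The chunkRanges helper and the 2 MiB chunk size are my own assumptions for illustration; only the SparkMD5.ArrayBuffer append/end calls come from the README linked above, and the FileReader part is browser-only.

```javascript
// Hypothetical helper: compute the [start, end) byte ranges to slice.
function chunkRanges(fileSize, chunkSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, fileSize)]);
  }
  return ranges;
}

// Browser-only sketch (assumes SparkMD5 is loaded on the page).
function hashFile(file, done) {
  const spark = new SparkMD5.ArrayBuffer();
  const ranges = chunkRanges(file.size, 2 * 1024 * 1024); // assumed 2 MiB chunks
  const reader = new FileReader();
  let i = 0;

  reader.onload = function (e) {
    spark.append(e.target.result); // raw bytes, no string conversion
    i += 1;
    if (i < ranges.length) {
      readNext();
    } else {
      done(spark.end()); // hex MD5 digest
    }
  };

  function readNext() {
    const [start, end] = ranges[i];
    reader.readAsArrayBuffer(file.slice(start, end));
  }

  readNext();
}
```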
Thank you for the explanation. I had missed the SparkMD5.ArrayBuffer.prototype.append method.
Performance was improved by #41. Feel free to reopen this if necessary.
Hi,
Right now, calculating the MD5 of a 2 GB file with SparkMD5 takes about 1 minute.
If I try to introduce multithreading with a 10-second delay between each thread reading a file chunk and appending it to the MD5 in onloadend, it still takes the same time. If I reduce the delay to 5 seconds or less, it calculates wrong MD5 values.
Is there any way to speed up the MD5 calculation for bigger files?
Thanks a lot.
Regards, Viji