Closed vjustin closed 7 years ago
I think there's room for improvement, but it's not immediate. It's a process of change + bench, change + bench. Feel free to run some experiments and report back; I don't have the time to do it myself.
For the ArrayBuffer mode (FileReader.readAsArrayBuffer), why does the append method need to do the UTF-8 conversion with `if (/[\u0080-\uFFFF]/.test(str)) { str = unescape(encodeURIComponent(str)); }`? If I understand correctly, this will convert any byte that is not in the readable ASCII range into something else, which looks like it consumes a lot of CPU. Can the appendBinary method be called directly instead of append? However, the hash calculation calls substring, so does the hash rely on the input being a string? Thanks in advance for shedding some light on this.
For the toUtf8 method, the comment says the input will be converted if it is a string. However, the implementation shown below decides based on content, not type: it converts the value only if it contains characters outside the ASCII range.
if (/[\u0080-\uFFFF]/.test(str)) {
    str = unescape(encodeURIComponent(str));
}
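For context, the snippet above is the classic trick for turning a JavaScript (UTF-16) string into a "binary string" whose character codes are the UTF-8 bytes of the input: encodeURIComponent emits %XX escapes for the UTF-8 bytes, and unescape turns each %XX back into a single character in the 0–255 range. A minimal sketch (toUtf8Bytes is a hypothetical name, not SparkMD5's API):

```javascript
// Convert a JS (UTF-16) string to a binary string whose char codes
// are the UTF-8 bytes of the input.
function toUtf8Bytes(str) {
  // Only strings containing code points above U+007F need converting;
  // pure ASCII is already valid UTF-8 byte-for-byte.
  if (/[\u0080-\uFFFF]/.test(str)) {
    return unescape(encodeURIComponent(str));
  }
  return str;
}

console.log(toUtf8Bytes('é').length); // 'é' (U+00E9) becomes 2 bytes: 0xC3 0xA9
```

The regex test is what makes the common ASCII-only case cheap: only strings with non-ASCII characters pay for the double conversion.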
If you are using FileReader.readAsArrayBuffer, you should be using https://github.com/satazor/SparkMD5#sparkmd5arraybufferappendarr which does not do any conversion.
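A rough sketch of that pattern: read the file in slices, feed each ArrayBuffer straight to SparkMD5.ArrayBuffer's append, and call end for the digest. The chunkRanges helper and the 2 MiB chunk size are my own assumptions for illustration; only the SparkMD5.ArrayBuffer append/end calls come from the README linked above, and the FileReader part is browser-only.

```javascript
// Hypothetical helper: compute the [start, end) byte ranges to slice.
function chunkRanges(fileSize, chunkSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, fileSize)]);
  }
  return ranges;
}

// Browser-only sketch (assumes SparkMD5 is loaded on the page).
function hashFile(file, done) {
  const spark = new SparkMD5.ArrayBuffer();
  const ranges = chunkRanges(file.size, 2 * 1024 * 1024); // assumed 2 MiB chunks
  const reader = new FileReader();
  let i = 0;

  reader.onload = function (e) {
    spark.append(e.target.result); // raw bytes, no string conversion
    i += 1;
    if (i < ranges.length) {
      readNext();
    } else {
      done(spark.end()); // hex MD5 digest
    }
  };

  function readNext() {
    const [start, end] = ranges[i];
    reader.readAsArrayBuffer(file.slice(start, end));
  }

  readNext();
}
```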
Thank you for the explanation. I had missed the SparkMD5.ArrayBuffer.prototype.append method.
Performance was improved by #41. Feel free to reopen this if necessary.
Hi,
Right now, calculating the MD5 of a 2 GB file with SparkMD5 takes about 1 minute.
If I try to introduce multithreading with a 10-second delay between each thread reading a file chunk and appending it to the MD5 in onloadend, it still takes the same time. If I reduce the delay to 5 seconds or less, it calculates wrong MD5 values.
Is there any way to speed up the MD5 calculation for bigger files?
Thanks a lot.
Regards, Viji