satazor / js-spark-md5

Lightning fast normal and incremental MD5 for JavaScript

Streams #40

Closed · jimmywarting closed 7 years ago

jimmywarting commented 7 years ago

Do you know what would be cool? Hashing a large file with streams!

The Streams spec is coming together, and I just thought: "hey, it would be cool to use it in spark-md5!"

There are two ways you could do it: either as a WritableStream or as a ReadableStream. If you gave a ReadableStream to the API, spark would be in control and could handle the buffer allocation with BYOB.
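For illustration, the ReadableStream/BYOB path could look roughly like this. A sketch only: `hashStream()` is a hypothetical helper, not part of spark-md5, and BYOB readers only work on byte streams (e.g. fetch bodies); the point is that a single scratch buffer gets reused for every read.

```js
// Hypothetical helper, not spark-md5 API: hash a ReadableStream with a
// BYOB reader so one scratch buffer is reused instead of allocating per chunk.
async function hashStream (rs) {
  const spark = new SparkMD5.ArrayBuffer()
  const reader = rs.getReader({ mode: 'byob' })
  let buffer = new ArrayBuffer(1024 * 1024) // 1 MiB scratch buffer

  while (true) {
    const { done, value } = await reader.read(new Uint8Array(buffer))
    if (done) break
    // append() expects an ArrayBuffer, so copy out just the filled region
    spark.append(value.buffer.slice(value.byteOffset, value.byteOffset + value.byteLength))
    buffer = value.buffer // reclaim the (transferred) buffer for the next read
  }

  return spark.end() // hex digest
}
```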

However, append() behaves more like a write stream, so you would need to provide a way to create a write stream that is connected to the core.

Currently you can get a ReadableStream from the fetch API in Blink. You can also construct a ReadableStream in Blink now, or you could use the web-streams-polyfill.

So it would make sense to just hand it over to spark in some way. I have also created a way to get a ReadableStream from blobs/files with Screw-FileReader.

One way you could do it is:

```js
const spark = new SparkMD5.ArrayBuffer()
const ws = spark.createWriteStream() // proposed API, doesn't exist yet
blob.stream().pipeTo(ws).then(() => console.log(spark.end()))
```
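For reference, such a createWriteStream() could probably be built on top of the existing append()/end() with the WHATWG WritableStream constructor (native in Blink eventually, or via the web-streams-polyfill). A sketch of the idea, not the real library API:

```js
// Hypothetical sketch: wrap the existing append()/end() in a WritableStream.
SparkMD5.ArrayBuffer.prototype.createWriteStream = function () {
  const spark = this
  return new WritableStream({
    write (chunk) {
      // chunks from blob.stream() are Uint8Arrays; append() wants an ArrayBuffer
      spark.append(chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength))
    }
  })
}
```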

Or just hand the ReadableStream over to spark in some way, because right now the fetch ReadableStream doesn't have pipeTo yet, since WritableStream is not implemented.
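Until pipeTo lands, the fetch body can be pumped by hand with a default reader. A sketch using only APIs that already exist in Blink:

```js
// Workaround sketch while pipeTo/WritableStream are missing: pump the
// fetch body manually and feed each chunk to spark-md5.
async function md5FromUrl (url) {
  const res = await fetch(url)
  const reader = res.body.getReader()
  const spark = new SparkMD5.ArrayBuffer()

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    spark.append(value.buffer.slice(value.byteOffset, value.byteOffset + value.byteLength))
  }

  return spark.end()
}
```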

Another possibility would be to hash and upload at the same time, as in the tee() example.
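Something along these lines (a sketch: the `/upload` endpoint and the streaming fetch body are assumptions for illustration, `createWriteStream()` is the hypothetical API from above, and the writable side still needs the polyfill):

```js
// Sketch: tee() the stream so one branch is hashed while the other uploads.
const spark = new SparkMD5.ArrayBuffer()
const [forHash, forUpload] = blob.stream().tee()

const hashed = forHash
  .pipeTo(spark.createWriteStream()) // hypothetical API sketched above
  .then(() => spark.end())
const uploaded = fetch('/upload', { method: 'PUT', body: forUpload })

Promise.all([hashed, uploaded]).then(([hash]) => console.log(hash))
```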

satazor commented 7 years ago

Handling streams is a great idea. The way I see it, spark.stream() would return a transform stream that just consumes data from a readable stream via piping. Internally it would call append() for each chunk and finish with end().

See https://github.com/sindresorhus/hasha for an example.
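A rough sketch of that shape, assuming the WHATWG TransformStream (or the polyfill); the factory placement and the `digest` promise are inventions for illustration, not the library's API:

```js
// Sketch: a pass-through transform that hashes every chunk and resolves a
// hypothetical `digest` promise once the input ends.
SparkMD5.ArrayBuffer.stream = function () {
  const spark = new SparkMD5.ArrayBuffer()
  let resolveDigest
  const digest = new Promise(resolve => { resolveDigest = resolve })

  const ts = new TransformStream({
    transform (chunk, controller) {
      spark.append(chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength))
      controller.enqueue(chunk) // pass the data through untouched
    },
    flush () {
      resolveDigest(spark.end()) // hex digest when the input ends
    }
  })

  ts.digest = digest
  return ts
}

// Usage: hash while the data streams onward to some sink
// blob.stream().pipeThrough(SparkMD5.ArrayBuffer.stream())
```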

jimmywarting commented 7 years ago

Cool lib, time to port it to Web streams 😉

satazor commented 7 years ago

Closing this for now. Feel free to work on a PR adding streams support :)