jimmywarting / StreamSaver.js

StreamSaver writes streams to the filesystem directly and asynchronously
https://jimmywarting.github.io/StreamSaver.js/example.html
MIT License

How to split, send each part, merge splits into one file? #101

Closed tumatanquang closed 5 years ago

tumatanquang commented 5 years ago

Does it work in Android browsers (e.g. UC Browser, Google Chrome)?

I also have a small question. Say I have a .zip file with the download link https://domain.tld/myfile.zip and a size of 1048576 bytes (1 MB). I want to split it into x parts and request each byte range in turn, sending the next request only after the previous range has finished downloading. For example: first request the range 0 - x; once that has downloaded completely, request x+1 - x+n, and so on, until the final range x+n+1 - 1048576. Then join all the downloaded parts into one complete file to finish the download. It is quite similar to downloading a torrent, but it's not a torrent! How do I do it? Thank you.

jimmywarting commented 5 years ago

Yes, it's possible to download segments of a file if the server supports byte ranges, as long as you write the segments in order. I wrote something similar a long time ago that uses xhr and/or fetch + streams to download segments and then writes all the pieces into a pre-allocated final file using seeking (unordered download, similar to torrents).

It's perhaps a little off from what you want to do, since you need to write the segments in order, but you can copy some logic from this: https://gist.github.com/jimmywarting/e1f2dd369e2098b4881b

I would recommend using fetch and its ReadableStream. When you pipe segments to the WritableStream, you have the option to prevent closing the writable stream, which could be useful to you.

const fileStream = streamSaver.createWriteStream('filename.txt')

for (let i = 0; i < 10; i++) {
  const res = await fetch(...)  // download one segment
  // write the segment to streamsaver; res.body is the response's ReadableStream
  await res.body.pipeTo(fileStream, { preventClose: true })
}

and when everything is done then you need to close the stream
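
As a rough sketch of the whole sequential flow, assuming the server honours Range requests and the file size is already known: `byteRanges` and `downloadInOrder` are hypothetical helper names, and `fetchFn` is an illustrative parameter (defaulting to the global fetch) that only exists so the function can be tried without a network.

```javascript
// Build inclusive HTTP Range header values that together cover `size` bytes.
function byteRanges (size, chunkSize) {
  const ranges = []
  for (let start = 0; start < size; start += chunkSize) {
    // Range ends are inclusive, hence the -1
    const end = Math.min(start + chunkSize, size) - 1
    ranges.push(`bytes=${start}-${end}`)
  }
  return ranges
}

// Download the ranges one after another and append each one to `writable`.
async function downloadInOrder (url, size, chunkSize, writable, fetchFn = fetch) {
  for (const Range of byteRanges(size, chunkSize)) {
    const res = await fetchFn(url, { headers: { Range } })
    // preventClose keeps `writable` open for the next segment
    await res.body.pipeTo(writable, { preventClose: true })
  }
  // Every segment has been written: close the stream to finish the download
  await writable.getWriter().close()
}
```

With the numbers from the question, this might be called as `downloadInOrder('https://domain.tld/myfile.zip', 1048576, 262144, streamSaver.createWriteStream('myfile.zip'))`, which would issue four requests, one after the other.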


Regarding mobile support: I haven't tested many mobile browsers, but Chrome on Android works at least. The newest addition, which fixed downloads in Safari, makes this more compatible through progressive degradation (by not using a service worker and instead creating a blob and saving it with a[download]).

Other browsers that use the Blink engine should also work well, and many other browsers are based on it.

tumatanquang commented 5 years ago

I read your article (download multi thread.js) not long ago, and I have the same question as someone who commented below it: "the fss.cwd.getFile? any module used? looked up fss, and couldn't find one." Can you explain what "fss.cwd" is and what it relies on? With only jQuery loaded, "fss.cwd" is undefined. I can roughly follow the code; however, I think it would be better to "chunkDownload" a maximum number of parts. Is the code in the comment (see photo) separate code? And it would be legendary if you integrated "download multi thread.js" into "StreamSaver.js"!

I look forward to your feedback.

jimmywarting commented 5 years ago

fss.cwd was something that came from using filer.js but you don't really need that. Here is the equivalent code:

webkitRequestFileSystem(TEMPORARY, 0, fs => {
  // fs.root === fss.cwd
  // fs.root.getFile() === fss.cwd.getFile()
})

The gist was just for personal use, somewhere to store the code for later (it wasn't really meant for public use).

It's a Blink-only thing...

But if you give me a few minutes, I can make a fiddle example that replaces the sandboxed filesystem with the real deal.


That multi threaded download thingy will not be integrated into streamsaver...

tumatanquang commented 5 years ago

Or can you write an example of a function like download multi thread.js, but using StreamSaver.js? For example:

function Downloader (url, maxParts (or chunksize), filename <probably not needed because it can be taken at server>, filesize <optional because it can be taken in Content-Length header>) {
Your example code
}

jimmywarting commented 5 years ago

Ok, so i experimented a bit to get to a similar multithreaded download. was looking at

And I came to the conclusion that if you want to play ball with StreamSaver, the best way to solve it is to make another ReadableStream whose high water mark limits how many requests are in flight.

var url = '...'

// Allows us to make 6 simultaneous pulls (requests)
var concurrency = new CountQueuingStrategy({ highWaterMark: 5 })

var size = 0
// 5MiB ( 5 * ( highWaterMark + 1) === 30 MiB that can be in memory at once )
var chunkDownload = 1024 * 1024 * 5 
var requested = 0

// Transform stream that resolves a promise and passes the resolved value onwards
var ts = new TransformStream({
  async transform (chunk, ctrl) {
    ctrl.enqueue(await chunk)
  }
}) 

// A ReadableStream that fetches partial responses
var rs = new ReadableStream({
  // First figure out how large the file is
  async start () {
    const res = await fetch(url, { method: 'head' })
    res.text()
    size = parseInt(res.headers.get('content-length'))
  },
  pull (ctrl) {
    // Advance to the next chunk request. `requested` is the offset of the
    // next byte to ask for; Range ends are inclusive, hence end - 1.
    var end = Math.min(requested + chunkDownload, size)
    var Range = `bytes=${requested}-${end - 1}`
    requested = end

    var req = fetch(url, { 
      cache: 'no-store', 
      headers: { Range }
    })
      .then(r => r.arrayBuffer())
      .then(ab => new Uint8Array(ab))

    ctrl.enqueue(req)

    // Close once every byte of the file has been requested
    if (requested >= size) {
      ctrl.close()
    }
  }
}, concurrency).pipeThrough(ts)

then you can do:

rs.pipeTo(streamSaver.createWriteStream('file.zip', { size }))
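
One thing to watch in the line above: `{ size }` is evaluated as soon as createWriteStream is called, which is before the asynchronous start() has received the HEAD response, so `size` is still 0 at that point. A possible workaround, sketched below, is to look up the size first and only then start piping. `fetchSize` and `parseContentLength` are hypothetical helper names, and `fetchFn` is an illustrative parameter that defaults to the global fetch.

```javascript
// Turn a Content-Length header value into a number, or undefined if unusable.
function parseContentLength (value) {
  const n = parseInt(value, 10)
  return Number.isFinite(n) && n >= 0 ? n : undefined
}

// Ask the server for the file size with a HEAD request.
async function fetchSize (url, fetchFn = fetch) {
  const res = await fetchFn(url, { method: 'HEAD' })
  return parseContentLength(res.headers.get('content-length'))
}

// Possible usage (inside an async function):
//   const size = await fetchSize(url)
//   rs.pipeTo(streamSaver.createWriteStream('file.zip', { size }))
```

If the header is missing or not exposed to the page, `fetchSize` resolves to undefined and the stream is created without a known size.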

jimmywarting commented 5 years ago

Using pipeTo(dest, { preventClose }) wasn't such a good idea: the other downloads wouldn't fetch any data, since nothing was consuming their streams, so each request's queue filled up to its highWaterMark and stalled. It was therefore better to get the arrayBuffer.

tumatanquang commented 5 years ago

I tried the code above and the browser reported "Uncaught SyntaxError: Unexpected identifier" on line 22 (see the image). Am I missing something? I added Blob.js and ponyfill.min.js, although I don't know whether they are necessary. I have rewritten it here; can you take a look and fix it?

File: https://js.do/code/324482

jimmywarting commented 5 years ago

Maybe your browser doesn't support async/await? I didn't get an error. What browser (and version) are you using?

Blob.js is not necessary; the ponyfill is.

tumatanquang commented 5 years ago

I tried it on Chrome 49 and it failed; after updating to Chrome 72 it works fine! You said "ponyfill is". Is what?

However, there are a few shortcomings. As a test download I used Sintel.mp4 from the torrent example. Its size is 129241752 bytes (123.25 MB). Compare MEGA's download method: first it sends a request for the file to get the "content-length" and cancels that request once the value is received. Then it requests the data piece by piece, with different sizes (8 MB, 16 MB, 32 MB, ...), sending the next request only after the previous one has finished downloading (instead of sending them all at the same time).

The example you sent me is great! It is easy to understand, but it does not work in all browsers, especially older ones (I have not tested it on mobile browsers 😂). I don't think you can write a download method like MEGA's (I can't; I'm not good at JavaScript, I only know PHP), but I want to ask: did you use StreamSaver.js in that code? And how do I join all the downloaded parts into a complete file, redirect to the link of that file, and automatically save it to the computer? (Browsers with "Ask where to save each file before downloading" enabled will ask for a location before saving.)

I changed the code a bit so that the "Download" button is disabled as soon as I click it, but since I don't have answers to the questions above, I still haven't managed to re-enable the button after the file is saved. This is the code after my edits; could you go there, comment on it (adding code if something is missing) to answer the questions above, and save it for me? I look forward to your reply! Code: https://playcode.io/334779

jimmywarting commented 5 years ago

The download felt clunky, saving chunks in batches instead of being piped...

There was one problem with that Sintel movie: it didn't expose the Content-Length header to foreign origins.

https://jsfiddle.net/08fmukz7/

jimmywarting commented 5 years ago

I think you will see very little improvement with this solution unless the content is available on multiple servers or the server you are downloading from throttles the download speed.

Many random files I tried seemed to be just as fast. HTTP/2 and SPDY use techniques to send multiple documents over the same connection, so making more parallel downloads won't make much difference.

tumatanquang commented 5 years ago

I have tried it, but why is the file download link not "blob:https://domain/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"? https://ibb.co/9Gk30y0

I tried changing it to 16 MB per chunk and 2 + 1 simultaneous requests. However, after the request for the last range (x+1 - file size), it sends 3 more requests; all 3 return status 500 and cause the file download to be "canceled"! Code: https://playcode.io/335175?tabs=index.html

The special thing is that the file appears as soon as the first range (0 - x) has been received, and its downloaded size gradually grows as subsequent requests complete, instead of all the data being fetched first and the parts then being merged into one file and saved to the computer! 😂

In addition, async/await is only supported in new browsers; can you replace them with Promises? That should support all browsers (I think)!

jimmywarting commented 5 years ago

but why the file download link is not "blob:https://domain/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"?

Because it uses a service worker to emulate how a server responds when you want to save a file. This lets you stream a file to disk, as opposed to URL.createObjectURL(blob), where you need to have all the data before you can save it.
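
To make the difference concrete, here is a minimal browser-side sketch of the two approaches. The function names (`bytesBuffered`, `saveViaBlob`, `saveViaStream`) are made up for illustration, and the one-second delay before revoking the URL is an arbitrary assumption.

```javascript
// How many bytes the blob approach has to hold in memory at once.
function bytesBuffered (chunks) {
  return chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0)
}

// Blob approach: every chunk must already be in memory before saving starts.
// The link that gets clicked is a blob: URL.
function saveViaBlob (chunks, filename) {
  const url = URL.createObjectURL(new Blob(chunks))
  const a = document.createElement('a')
  a.href = url
  a.download = filename
  document.body.appendChild(a)
  a.click()
  a.remove()
  // Revoke a little later so the browser has time to start the download
  setTimeout(() => URL.revokeObjectURL(url), 1000)
}

// Streaming approach: chunks are written as they arrive, so the file appears
// in the browser's download list right away and grows over time.
function saveViaStream (readable, filename) {
  return readable.pipeTo(streamSaver.createWriteStream(filename))
}
```

For the 129241752-byte Sintel file mentioned earlier, `saveViaBlob` would need all of those bytes buffered before the save dialog appears, while `saveViaStream` only holds whatever is currently queued.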


I was skimming through the webcrypto API today and found this: webcrypto#73. It's similar to the original idea I had about pipeTo + preventClose. Maybe it will work better (I don't know, I haven't tested it again), or maybe I did it wrong the first time.


About 90% of users worldwide are on browsers that support async/await: https://caniuse.com/#search=async%20await

Most frontend developers use Babel to transpile new syntax to support old browsers...

No offence, but I don't feel like spending all my time helping you and writing all the code for you. This isn't really an issue with StreamSaver, so it probably belongs on StackOverflow; you may get more and better answers there. If you do ask there, leave out the part where you mention StreamSaver and explain that you want to turn an unordered, limited, sequential partial download into an ordered ReadableStream.

Here is an example of how to convert async/await to a promise:

// before: 
const obj = {
  async foo() {
    const res = await fetch(...)
    return 123
  }
}

// after:
const obj = {
  foo() {
    return fetch(...).then(res => {
      return 123
    })
  }
}
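
Applied to the transform step of the earlier multi-request example, the same conversion could look like the sketch below. `promiseTransformer` is a name chosen here for illustration, and the sketch avoids arrow functions as well, on the assumption that the target browsers are old.

```javascript
// before:
//   async transform (chunk, ctrl) {
//     ctrl.enqueue(await chunk)
//   }

// after: return the promise chain so the stream waits until it settles
var promiseTransformer = {
  transform: function (chunk, ctrl) {
    return Promise.resolve(chunk).then(function (value) {
      ctrl.enqueue(value)
    })
  }
}

// Possible usage: rs.pipeThrough(new TransformStream(promiseTransformer))
```

Returning the chain matters: if the promise were not returned, the stream would treat the transform as finished before the chunk was enqueued.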

tumatanquang commented 5 years ago

I use Babel according to your instructions and get the error: Uncaught ReferenceError: CountQueuingStrategy is not defined. Is there any solution to fix it?

Going back to StreamSaver.js, under "How does it work?" you say:

link.href = URL.createObjectURL(stream) // DOES NOT WORK

Does that mean there is no way to create "blob:https://domain/xxx" links, and that the link will always be "https://jimmywarting.github.io/StreamSaver.js/mydomain/filename.ext"? Is there any way to customize it? (Link to the code if you need it: https://playcode.io/335175)

jimmywarting commented 5 years ago

I use Babel according to your instructions and get the error: "Uncaught ReferenceError: CountQueuingStrategy is not defined". Is there any solution to fix it?

There is a way to fix it, but I don't know what your setup looks like. All new browsers have CountQueuingStrategy https://developer.mozilla.org/en-US/docs/Web/API/CountQueuingStrategy#Browser_Compatibility; otherwise you can get it from ponyfill.CountQueuingStrategy
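
A small sketch of that fallback, assuming the ponyfill script exposes an object carrying the stream classes: `resolveStreamClass` is a hypothetical helper, and the `ponyfill` argument stands for whatever object your ponyfill build provides.

```javascript
// Prefer the browser's native class and fall back to the ponyfill's copy.
function resolveStreamClass (name, ponyfill) {
  var root = typeof globalThis !== 'undefined' ? globalThis
    : typeof self !== 'undefined' ? self
      : {}
  return root[name] || (ponyfill && ponyfill[name])
}

// Possible usage:
//   var Strategy = resolveStreamClass('CountQueuingStrategy', ponyfill)
//   var concurrency = new Strategy({ highWaterMark: 5 })
```

The same helper could be used for ReadableStream and TransformStream, so one code path serves both old and new browsers.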


The error I got from that playground was "Uncaught ReferenceError: regeneratorRuntime is not defined". Maybe you meant regeneratorRuntime? If possible, use a setting that transforms async/await to generators instead, or simply convert them to promises.


... and that the link will always be "https://jimmywarting.github.io/StreamSaver.js/mydomain/filename.ext"? Is there any way to customize it?

Yes, but it's slightly more complicated: you have to host sw.js + mitm.html yourself on your own domain. You also need SSL (https) on that domain (for the service worker to work).

Then you have to set streamSaver.mitm = 'https://domain/mitm.html'

The alternative is to wait for the WICG/native-filesystem API to ship in all browsers, which lets you write a stream to disk without service workers. When it becomes available, I will start integrating it into StreamSaver.