206 Partial Content - Githubissues

speedmax commented 9 years ago

@markevans Do you have any new thoughts on this topic?

With the rise of HTML5 media support such as Audio/Video, it is becoming increasingly difficult to stream the right bytes of output back to the client. Adding "206 Partial Content" support would enable the following features:

File resume download #311
HTML5 media elements (iOS likes to send GET Range requests) #197

The current workaround is memory intensive, since it returns the requested bytes from an entire object: https://gist.github.com/fritzsche/2758045 . I think the new DataStore API could follow IO#read (also Socket, File), which is the most logical place to do this.

Dragonfly::DataStore#read(uid, [length [, offset]])

Here is a list of popular DataStores I could find. This shouldn't break any existing datastore, because the second and third params are always optional.

Dragonfly::FileDataStore
Dragonfly::MemoryDataStore
Dragonfly::S3DataStore
Dragonfly::MongoDataStore
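
To make the proposed read(uid, [length [, offset]]) signature concrete, here is a minimal sketch of a file-backed store that follows IO#read semantics. The class name RangeAwareFileStore and its write helper are hypothetical and are not Dragonfly's actual FileDataStore API; the point is only that callers who pass just a uid keep getting the whole object.

```ruby
require "tmpdir"

# Hypothetical store for illustration; not Dragonfly's FileDataStore.
class RangeAwareFileStore
  def initialize(root)
    @root = root
  end

  # Stores content under the given uid and returns the uid.
  def write(uid, content)
    File.binwrite(File.join(@root, uid), content)
    uid
  end

  # Mirrors the proposed read(uid, [length [, offset]]).
  # With no length, everything from offset to the end is returned,
  # so existing callers that pass only a uid are unaffected.
  def read(uid, length = nil, offset = 0)
    File.open(File.join(@root, uid), "rb") do |file|
      file.seek(offset)
      file.read(length) # a nil length reads to end of file
    end
  end
end

Dir.mktmpdir do |dir|
  store = RangeAwareFileStore.new(dir)
  store.write("example", "0123456789")
  store.read("example")       # => "0123456789"
  store.read("example", 3, 2) # => "234"
end
```

Only the requested bytes are read from disk, which is what avoids the memory cost of the current workaround.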

/cc @fritzsche @idyll @khoan to comment

markevans commented 9 years ago

I quite like the idea of the datastore following the IO API.

My only thought is how to handle processing etc. (most processors need the whole file to process it).

Also there are currently a number of ways to get content:

fetch (get from datastore)
fetch_file (get from file)
fetch_url (get from remote url)
generate (use a library to create content)

I'd need to decide how to handle all of these if a range parameter is given.

idyll commented 9 years ago

I would think that you wouldn't want a processor to be aware of RANGE. RANGE should be applied to the result of the process. This would allow caching the processor's result so that it could serve multiple different RANGE requests. Worst case, you process the whole thing each time and then send the range, which at least saves the bandwidth of sending the full file and is no worse than what happens now.

"Get the first 100 bytes of this image and then return the thumbnail" is completely different from "get the thumbnail of this image and then return the first 100 bytes".
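
A rough sketch of the "process first, then apply the range" ordering with caching might look like the following. ProcessedRangeServer and its bytes method are hypothetical names, and the in-memory Hash stands in for whatever cache would really be used; the processing block itself stays unaware of ranges.

```ruby
# Hypothetical helper: caches the full processed output per job key,
# then answers any byte range from the cached bytes.
class ProcessedRangeServer
  def initialize
    @cache = {}
  end

  # The block performs the range-unaware processing and only runs on a cache miss.
  # The range is applied afterwards, to the processed result.
  def bytes(job_key, offset, length)
    processed = (@cache[job_key] ||= yield)
    processed.byteslice(offset, length)
  end
end

server = ProcessedRangeServer.new
make_thumb = -> { "THUMBNAIL-BYTES" } # stand-in for a real processor
server.bytes("image/thumb", 0, 5, &make_thumb) # => "THUMB"
server.bytes("image/thumb", 5, 4, &make_thumb) # => "NAIL" (served from cache)
```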

speedmax commented 9 years ago

@idyll 100 percent, a processor shouldn't be aware of RANGE.

I believe most use cases for partial content involve large files (documents and media), and it's rare to transform documents or media like audio/video. Downsampling these files on the fly is possible, but it is better left to a specialised service (background job, Zencoder).

This also allows us to reduce memory consumption in the analyse stage by fetching only the first few bytes of an image, video or audio file to get its metadata, as the FastImage gem does.

DataStores: implementing an IO#read-like method should not break any existing code
Analysers: range awareness is optional, to analyse data from partial content (FastImage gem)
Processors: these transform data, so they should work with the whole object (crop, shrink, thumb)
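
As an illustration of the analyser point, here is a small sketch that gets a PNG's dimensions from only its first 24 bytes (the 8-byte signature followed by the start of the IHDR chunk). The png_dimensions helper is a hypothetical name, and this is not how FastImage is implemented; it only shows why a partial read is enough for this kind of metadata.

```ruby
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Returns [width, height] from the first 24 bytes of a PNG,
# or nil if the bytes do not look like a PNG header.
def png_dimensions(header)
  header = header.b
  return nil if header.bytesize < 24
  return nil unless header.byteslice(0, 8) == PNG_SIGNATURE
  return nil unless header.byteslice(12, 4) == "IHDR".b

  # Width and height are two big-endian 32-bit integers at bytes 16..23.
  header.byteslice(16, 8).unpack("NN")
end

# Fake header built by hand for the example: signature, chunk length, type, width, height.
sample = PNG_SIGNATURE + [13].pack("N") + "IHDR".b + [640, 480].pack("NN")
png_dimensions(sample) # => [640, 480]
```

With a range-aware datastore, the header argument could come from something like read(uid, 24) instead of the whole file.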

References:

Extract Video Data from first few MB on S3
rstreamor: stream files using request headers (works with any Ruby object that responds to #data or #content_type)
fastimage: analyse image size/type by fetching as little as needed.

We should get the design direction right before we touch code, yeah?

I think rstreamor and PR:331 by @idyll have clean code bases to kick this off.

khoan commented 9 years ago

Changes are not limited to just datastores; the server endpoint will need to be augmented to become aware of more than just the job uid.
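
To give an idea of what the endpoint side involves, here is a simplified sketch that parses a single-range Range header and builds a Rack-style [status, headers, body] triple. The helper names parse_byte_range and range_response are hypothetical and not part of Dragonfly. It handles only one range per request, and it falls back to a full 200 response for anything it cannot satisfy, whereas a complete implementation would answer unsatisfiable ranges with 416.

```ruby
# Parses "bytes=0-99", "bytes=100-" or "bytes=-50" against a known size.
# Returns [first, last] (inclusive) or nil if absent, malformed or unsatisfiable.
def parse_byte_range(header, size)
  match = /\Abytes=(\d*)-(\d*)\z/.match(header.to_s)
  return nil unless match

  first_s, last_s = match[1], match[2]
  return nil if first_s.empty? && last_s.empty?

  if first_s.empty?
    # Suffix form: the last N bytes of the content.
    suffix = last_s.to_i
    return nil if suffix.zero?
    first = [size - suffix, 0].max
    last = size - 1
  else
    first = first_s.to_i
    last = last_s.empty? ? size - 1 : [last_s.to_i, size - 1].min
  end
  return nil if first > last || first >= size

  [first, last]
end

# Builds a Rack-style response: 206 with Content-Range when a range applies, else 200.
def range_response(content, range_header)
  size = content.bytesize
  range = parse_byte_range(range_header, size)
  if range.nil?
    return [200, { "Content-Length" => size.to_s, "Accept-Ranges" => "bytes" }, [content]]
  end

  first, last = range
  chunk = content.byteslice(first, last - first + 1)
  headers = {
    "Content-Range" => "bytes #{first}-#{last}/#{size}",
    "Content-Length" => chunk.bytesize.to_s,
    "Accept-Ranges" => "bytes"
  }
  [206, headers, [chunk]]
end

range_response("0123456789", "bytes=2-5") # status 206, body ["2345"]
```

Here the content is already in memory for brevity; combined with a range-aware datastore, the endpoint would only need the object's size and the requested slice.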

Best bet at the moment is to serve the S3 remote url if you're using the S3 datastore. Other datastores are unsupported :(

markevans / dragonfly

206 Partial Content #408