kiwix / kiwix-apple

Kiwix for offline access on iOS and macOS
https://apple.kiwix.org
GNU Lesser General Public License v3.0

Streaming uncompressed data (with direct access) #773

Closed rgaudin closed 2 weeks ago

rgaudin commented 3 weeks ago

What we do on Android for videos doesn't seem directly possible here. It relies on a combination of tricks that are missing on iOS (at least I did not find them easily). First, AssetFileDescriptor lets us mimic an asset (that is, a clearly bounded piece of data) from a file path, an offset and a size. Then there is the integration between Android's MediaPlayer and the WebChromeClient renderer (with some hacks to transition smoothly).

I found other people asking about AVPlayer and WKWebView integration, but without answers. Maybe the wording or the keywords were wrong…

Despite this, I had some success with an easy tweak to the code: streaming the data to the client. Currently, we request the content from libzim, store it in a variable and put it on the response. On the renderer side, this content is read (sometimes not completely) and used.

This is OK for relatively small content, or maybe even large content that is consumed entirely by the client (RAM will be required), but in the video use case we know that the client won't even try to read the whole thing and will display it piece by piece.

Doing so is as easy as repeatedly calling urlSchemeTask.didReceive(additionalData)

let response = HTTPURLResponse(
    url: url,
    statusCode: statusCode,
    httpVersion: "HTTP/1.1",
    headerFields: headers)
urlSchemeTask.didReceive(response!)

...

// Stream the entry in fixed-size chunks instead of loading it whole.
// 0..<nbStreams is safe when nbStreams is 0 (1...0 would trap at runtime).
for _ in 0..<nbStreams {
    partEnd = partStart + streamThreshold
    content = ZimFileService.shared.getURLContent(url: url, start: partStart, end: partEnd)
    urlSchemeTask.didReceive(content!.data)
    partStart = partEnd
}
// Send the remainder that does not fill a whole chunk.
if finalBytes > 0 {
    content = ZimFileService.shared.getURLContent(url: url, start: partStart, end: partStart + finalBytes)
    urlSchemeTask.didReceive(content!.data)
}
urlSchemeTask.didFinish()

This is very efficient in keeping RAM usage under control on very large videos.

I don't know exactly how it works internally but simply looping on writing 2MB chunks does the trick so I suppose renderer-reading is synced somehow.
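
The snippet above does not show how nbStreams and finalBytes are obtained. A minimal sketch, assuming they come from dividing the total content size by streamThreshold; the helper name chunkRanges and its parameters are hypothetical and not part of the Kiwix code:

```swift
// Hypothetical helper: split `totalSize` bytes into consecutive half-open
// ranges of at most `chunkSize` bytes each. The last range holds the
// remainder (what the snippet above calls finalBytes).
func chunkRanges(totalSize: Int, chunkSize: Int) -> [Range<Int>] {
    guard totalSize > 0, chunkSize > 0 else { return [] }
    var ranges: [Range<Int>] = []
    var start = 0
    while start < totalSize {
        let end = min(start + chunkSize, totalSize)
        ranges.append(start..<end)
        start = end
    }
    return ranges
}

// Example: a 5 MiB entry streamed in 2 MiB chunks gives two full chunks
// followed by a 1 MiB remainder.
let mib = 1024 * 1024
let ranges = chunkRanges(totalSize: 5 * mib, chunkSize: 2 * mib)
```

Computing the ranges up front keeps the full chunks and the remainder in one loop, so there is no separate final-bytes branch to keep in sync.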


Another improvement, independent of this one, is reading video files directly from the filesystem. Using item.getDirectAccessInformation(), which returns the ZIM path on the filesystem and the offset at which the content starts, we can easily read the video data from it (we already know its size).

WARN ⚠️: We can't pass the file handle directly to the webview because FileHandle has no size parameter, so it would not stop reading at the end of the content. The streaming experiment above shows we might not need this, but we could still reimplement a FileHandle that stops after a defined size.

WARN ⚠️: getDirectAccessInformation only works on raw (uncompressed) entries which is something that's decided at ZIM-write time.
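
A FileHandle-like reader that stops after a defined size could look roughly like the sketch below. It assumes the path and offset come from getDirectAccessInformation() and the size is already known; the class name BoundedFileReader and its methods are hypothetical, and the throwing FileHandle calls it uses require a recent enough deployment target:

```swift
import Foundation

// Hypothetical wrapper: reads at most `size` bytes starting at `offset`,
// so reading stops at the end of the entry rather than the end of the file.
final class BoundedFileReader {
    private let handle: FileHandle
    private var remaining: Int

    init(path: String, offset: UInt64, size: Int) throws {
        let fileHandle = try FileHandle(forReadingFrom: URL(fileURLWithPath: path))
        try fileHandle.seek(toOffset: offset)
        handle = fileHandle
        remaining = size
    }

    // Returns the next chunk (up to `maxLength` bytes), or nil once the
    // bounded region is exhausted or the file ends early.
    func readChunk(maxLength: Int) throws -> Data? {
        guard remaining > 0, maxLength > 0 else { return nil }
        let count = min(maxLength, remaining)
        guard let data = try handle.read(upToCount: count), !data.isEmpty else {
            return nil
        }
        remaining -= data.count
        return data
    }

    func close() throws {
        try handle.close()
    }
}
```

Each chunk returned by readChunk could then be handed to urlSchemeTask.didReceive, which combines direct access with the streaming approach.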

In my experiment, I used it on non-text/ mimetypes because I know that libzim currently only compresses text/ types. Downloads (un-handled formats, as you call them) would similarly benefit from it, I suppose.

In a real implementation, we might check whether the entry is compressed (is libzim telling us this?) or use a fallback in case the function returns empty data (it doesn't fail…).
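
The fallback idea can be sketched as follows. Everything here is an assumption for illustration: the DirectAccessInfo type, the choice to treat an empty path as "no direct access", and the two reader closures are hypothetical and do not reflect the actual libzim or Kiwix API:

```swift
import Foundation

// Hypothetical mirror of what getDirectAccessInformation() reports:
// the ZIM file path on disk and the offset where the entry's bytes start.
struct DirectAccessInfo {
    let path: String
    let offset: UInt64
}

// Assumption: "empty data" shows up as an empty path; treat that as
// "direct access unavailable" (for example, a compressed entry).
func usableDirectAccess(_ info: DirectAccessInfo?) -> DirectAccessInfo? {
    guard let info = info, !info.path.isEmpty else { return nil }
    return info
}

// Prefer the direct read; fall back to the libzim-backed read when direct
// access is unavailable or the direct read fails.
func loadEntry(
    directAccess: DirectAccessInfo?,
    readDirect: (DirectAccessInfo) throws -> Data,
    readViaLibzim: () throws -> Data
) throws -> Data {
    if let info = usableDirectAccess(directAccess),
       let data = try? readDirect(info) {
        return data
    }
    return try readViaLibzim()
}
```

With this shape, the caller does not need to know in advance whether an entry is compressed; the empty result simply routes the request through libzim.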

On whether we should use it or not, I don't know.

@mgautierfr, what do you think of using getDirectAccessInformation() and reading from the filesystem instead of reading from libzim? Is it worth the separate implementation code? What about other non-compressed content like PDF?

kelson42 commented 3 weeks ago

@BPerlakiH Any chance you can implement this for video files and get a chance to fix #744?

BPerlakiH commented 3 weeks ago

@kelson42 @rgaudin I think this is a very good direction, I also had a look at the:

urlSchemeTask.didReceive(data)

to be used on partial chunks, I just need to wrap that into some nicer error handling (as theoretically reading any given chunk can fail).
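
One possible shape for that error handling, as a sketch only: the ChunkSink protocol below is a hypothetical stand-in that mirrors the didReceive, didFinish and didFailWithError calls of WKURLSchemeTask so the loop can be exercised without WebKit, and the readChunk closure stands for whatever reads a byte range from the ZIM file:

```swift
import Foundation

// Hypothetical sink mirroring the three WKURLSchemeTask calls used for the
// response body; a real adapter would forward to the task itself.
protocol ChunkSink {
    func didReceive(_ data: Data)
    func didFinish()
    func didFailWithError(_ error: Error)
}

// Stream each range in order; stop at the first failed read and report it
// instead of finishing with a truncated body.
func stream(
    ranges: [Range<Int>],
    to sink: ChunkSink,
    readChunk: (Range<Int>) throws -> Data
) {
    do {
        for range in ranges {
            sink.didReceive(try readChunk(range))
        }
        sink.didFinish()
    } catch {
        sink.didFailWithError(error)
    }
}
```

Failing the task on the first bad chunk means the renderer sees an error rather than a response that ends early but looks complete.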

I also had a look at AVPlayer earlier, which can be started with an AVAsset/AVPlayerItem. Unfortunately, it does not support webm directly at this stage. It's also possible to have our own AVAssetReader, but it doesn't get close enough to file reading, so I couldn't find a way to inject our ZIM file reading mechanism somewhere "in between".

I am setting up a PR for this reading optimisation as a standalone improvement for video files (without the HTTP range requests).

mgautierfr commented 3 weeks ago

@mgautierfr, what do you think of using getDirectAccessInformation() and reading from the filesystem instead of reading from libzim? Is it worth the separate implementation code?

I can't really speak to the technical difficulties of implementing that with "Apple technologies". But getDirectAccessInformation is there to allow user code to bypass libzim and read the content directly by reopening the file, seeking and reading (mmap is also a solution). So I would say yes.

What about other non-compressed content like PDF?

getDirectAccessInformation works equally well for any non-compressed content (as long as the content is not split between two file parts). I'm not sure it's worth it, as PDF content is pretty small compared to video, but it would work too.

BPerlakiH commented 2 weeks ago

@mgautierfr I have found an issue related to this in libzim 9.2.0, please have a look if you can re-create it: https://github.com/openzim/libzim/issues/886

kelson42 commented 2 weeks ago

@BPerlakiH I thought we decided to do the read operation directly, without using libzim?!

BPerlakiH commented 2 weeks ago

I've created a PR for this, currently it is in draft but can be tested, and reviewed, to see if it makes sense: https://github.com/kiwix/kiwix-apple/pull/778

BPerlakiH commented 2 weeks ago

As discussed, I am narrowing this issue down to uncompressed data (with direct access). The follow-up ticket for compressed data is here: https://github.com/kiwix/kiwix-apple/issues/784