kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0
310 stars 135 forks source link

Add cache on CSS contents in serviceworker mode #362

Closed mossroy closed 5 years ago

mossroy commented 6 years ago

It is currently only implemented in jQuery mode.

Jaifroid commented 6 years ago

BTW, in Kiwix-JS-Windows, I now cache the HTML of the last-visited page using the localStorage API. It works really well, and remembers the page between sessions, even in browsers where you have to reload the ZIM. I set it to forget the page when you load a different ZIM of course. While it's not that useful in browser context, because of the need to reload the ZIM, I'm thinking it would be a really simple way of speeding up CSS and even JavaScript extraction, since we could remember the CSS and JavaScript semi-permanently for any given ZIM file, since ZIM filenames are unique.

The API is really easy too. Here's how you store a file in Storage:

https://github.com/kiwix/kiwix-js-windows/commit/33ac6fe5cd88ed177e7c4b0e5c9199bdb66c460d#diff-bca463ed8ccf5e2058dfe307a95e8501R1848

And here's how you get a file from Storage:

https://github.com/kiwix/kiwix-js-windows/commit/33ac6fe5cd88ed177e7c4b0e5c9199bdb66c460d#diff-bca463ed8ccf5e2058dfe307a95e8501R1735

We could have some kind of time-limit to avoid potentially filling up a storage in case someone accesses dozens and dozens of different ZIMs, but as we're talking about at most 6 CSS and maybe 2 JavaScript files per ZIM, it's not going to break the Storage bank (but we would need to implement a mechanism for clearing out / resetting -- a button on the settings page would do the trick).

Jaifroid commented 6 years ago

Sorry, I should have said, each new page visited overwrites the storage in the case of the code I've implemented, so it's just a way of opening the app at the last visited page "instantaneously". It gives a better user experience -- makes the app feel really snappy. But I think the mechanism could be really useful for CSS and JavaScript caching.

Jaifroid commented 6 years ago

Looking into this issue with regard to #371 , I believe enabling CSS caching in Service Worker mode will require a different caching model.

All files except the article's html file are sent to the Service Worker as BLOB arrays in binary uint8 format, according to this line. The browser then reads the binary file using the content types that are set in this section of the Service Worker code, if I've understood correctly. Therefore, the binary files are never transformed to strings by JavaScript, since the browser is capable of doing that itself with the correct content type/mime type.

Currently our cache is storing the UTF8-encoded strings, but these can't be sent to the Service Worker at least the way it is currently coded. I don't know if in theory it can read a string instead of a binary file?

We currently use the JavaScript Map type to store cached strings, and it only supports key-value strings, not objects. Assuming we don't want to use JSON with its slow encoding/decoding, it seems to me that we have these alternatives:

  1. We could probably store the uint8 arrays as a normal JavaScript object (containing nested objects) instead. If we can do this, then we could implement a low-level cache in the zimArchive.readBinaryFile() function, which checks for the existence of the cached object, and if it exists substitutes it for the call to the decompressor.

  2. We could store the binary arrays as BLOBs using the Blob Constructor, cache only the URL, and then pass the Blob URL to the Service Worker. It should then just fetch these URLs as any other URL (in theory) and then you might be able to do away with the server response patching. Theoretically it should work in Service Worker, HOWEVER, in jQuery mode, unfortunately, FFOS cannot receive a BLOB URL as a the href of a CSS <link> element (I tried this last year, it's blocked by CSP), though I use this method in Kiwix JS Windows.

  3. We could transform the binary to UTF8 string to store in the cache as a string, as we currently do, and then re-convert it to uint8 array to send to the Service Worker. This doesn't seem very elegant..., but if conversion is fast maybe that doesn't matter. See https://stackoverflow.com/questions/34946642/convert-string-to-uint8array-in-javascript for a method. EDIT: this one looks better - https://stackoverflow.com/a/24562290/9727685

Any thoughts on what system would be preferable?

Jaifroid commented 6 years ago

Looking into the Fetch API, I see that the Response body can be any of the following types:

Both requests and responses may contain body data. A body is an instance of any of the following types:

  • ArrayBuffer
  • ArrayBufferView (Uint8Array and friends)
  • Blob/File
  • String
  • URLSearchParams
  • FormData

From: https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch

So it does look as if we could respond with a UTF8-encoded string to the Service Worker. instead of the Uint8Array we currently respond with. This means we could keep the current caching system in the outer app and use it to provide string-based responses to the Service Worker. If so, it would be relatively simple to implement.

mossroy commented 6 years ago

Before trying to implement a cache ourself, It would be worth trying to use the built-in cache feature of browsers, instead. Ideally, if we put the right headers in the HTTP response (sent by the service-worker), the browser would keep the content in its regular cache. And we would let the browser handle it properly (expiration, maximum size etc) as it does for regular web pages : it would certainly be more efficient than a custom implementation. And it could work exactly the same for CSS, javascript, images etc.

Regarding expiration, a first approach would be to set a timeout (with a max-age in https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control). But we would have no way to reset the cache if the user switches to another ZIM file. So it might be better to use for example https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag or https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified and keep inside the ServiceWorker the ETags or last-modified values of each resource we send (inside a javascript Map for example, at least for a beginning. But we might use other ways if we want to keep them after closing/re-opening the app)

In any case, it's just an idea : I did not start testing it.

Jaifroid commented 6 years ago

I agree that the above solution sounds optimal. I don't think I have enough knowledge of HTTP headers to have a go at enabling it, so I'll leave this one to you, @mossroy !

However, I've experimented with sending cached UTF8 strings to the Service Worker, and I can report that it is equally happy receiving data in that format as it is with uint8 binary arrays. I thought it worth documenting this, as it answers my own post above.

mossroy commented 6 years ago

I worked on using the ETag and If-None-Match HTTP headers to make the browser use its built-in cache. But I did not manage to make it work so far, because I can not get the If-None-Match request header in the ServiceWorker. It looks like the event.request.header object does not have all the headers, but only a few of them. I've created https://stackoverflow.com/questions/51696766/how-to-make-browsers-pass-all-http-headers-to-a-serviceworker in the hope to have some help. The code of my initial (non-working) tests is in branch use-browser-cache-in-serviceworker-mode

mossroy commented 6 years ago

I've created #411 as a separate issue, regarding using the HTTP browser cache, to make it more easy to have some help on it.

Jaifroid commented 6 years ago

[I moved this comment from #411 , where I'd accidentally posted it first.]

I don't have the knowledge to contribute to an httpd solution, though it certainly sounds like the best way to do it if it works with localhost. I wonder if some browsers disable caching for localhost?

Without wishing at all to distract you from doing it this way, I did develop back in May (intending to use this cleaner code to replace my current filesystem caching in Kiwix JS Windows) a caching solution that uses indexeddb, localStorage or memory cache, according to the capabilities of the browser. It can persist (if the user wishes) the cache across browsing sessions (if the browser can use indexeddb or localStorage). I applied it to Service Worker, but don't know if I'm catching Service Worker requests the right way. Maybe something like my solution could be used for jQuery mode, alongside an httpd version for Service Worker?

For what it's worth, the code is in https://github.com/kiwix/kiwix-js/tree/Persist-cache-in-localStorage , and a screenshot of the caching controls I added is below. I've rebased this on master so it uses the latest xzdec.

image

mossroy commented 6 years ago

As there seems to be different technical possibilities to add cache for the ServiceWorker mode, I opened new issues for each of them :

I think we need to implement a generic cache, that could handle any type of content (and not focus only on CSS)

mossroy commented 6 years ago

I close this issue, as it should be replaced by one of the 3 mentioned above (or maybe canceled if emscripten compilation of libzim is fast enough)

mossroy commented 5 years ago

(it looks like I forgot to close this issue when I said I would do it)