Expose (kind of) `InternalServer` in the public API ?

mgautierfr commented 2 years ago

The server feature in libkiwix does several things:

Listen for http request (using libmicrohttpd) and convert it to a RequestContext
Parse the request to understand what user want.
Find the correct books/Archive/Searcher in the library and "resolve" the user request.
Build a Response
Convert the Response to a libmicrohttpd's response and send back the data to the user.

The first and last step is pretty tied to a http server but the other points are pretty common and it is somehow what is made in all applications. In kiwix-desktop, we span some kind of internal server and answer to QWebEngine requests (https://github.com/kiwix/kiwix-desktop/blob/master/src/urlschemehandler.cpp) This is probably what is done in kiwix-android or macos application.

It would be a nice thing to move all the handling in InternalServer as a first class citizen (libmicrohttpd neutral) in libkiwix API. Then we could :

Unit test the "request handler" without launching a temporary server
Create the server on top of this "request handler" (and keep the current testing system here)
Use the "request handler" in kiwix-desktop

Some open questions or points to keep in mind:

Should we enforce a url schema or be configurable (kiwix-desktop uses a different schema than kiwix-serve)
I'm convinced (but you can counter-argument) that we should move to architecture where we serve "raw" content in a zim file and we add chrome/gui around using different technologies (Qt Widgets in kiwix-desktop, iframe/js in kiwix-serve, ....) With this idea, the "request handler" should handle "raw" content (I include search and co) and the "http server" should add the Chrome.

At your comments :)

kelson42 commented 2 years ago

First impression on my side: sounds a good idea, but I have a really superficial understanding of all what this concretly means. Forcing URL schema does not seem a bad idea either. Probably the feedback from @veloman-yunkan here is far more valuable than mine :)

mossroy commented 2 years ago

This looks interesting, but would most probably not be usable in kiwix-js.

In kiwix-js, we do such thing in a ServiceWorker (which traps the user HTTP requests). But a ServiceWorker is very isolated, and notably can not access the ZIM file. So we would not be able to use it as a one-chunk emscripten-compiled C code, that handles the requests from A to Z.

mgautierfr commented 2 years ago

But a ServiceWorker is very isolated, and notably can not access the ZIM file.

How do you respond to the user request then ? You "transfer" the request to something that can access the ZIM file ?

The idea of this issue is to separate the front request handling (parsing the incoming request to know what to do) from the back (doing what is requested). If you transfer the request for the SW (the front) to something that can access zim files (the back), you could transfer something like a kiwix::Request (created in the SW by parsing the HTTP request).

mossroy commented 2 years ago

How do you respond to the user request then ? You "transfer" the request to something that can access the ZIM file ?

Exactly. We transfer it through a MessageChannel to the main thread, and transfer the result back to the ServiceWorker.

The idea of this issue is to separate the front request handling (parsing the incoming request to know what to do) from the back (doing what is requested). If you transfer the request for the SW (the front) to something that can access zim files (the back), you could transfer something like a kiwix::Request (created in the SW by parsing the HTTP request).

Unfortunately not. There are big restrictions on what can be transferred: https://developer.mozilla.org/en-US/docs/Glossary/Transferable_objects

On the other hand, I discover that recent browsers now allow to transfer a https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream , which could be obtained from the File object we have (representing the ZIM file): https://developer.mozilla.org/en-US/docs/Web/API/Blob/stream That might allow us to read the ZIM file from the ServiceWorker (I did not test), but only in recent browsers.

mossroy commented 2 years ago

On kiwix-js, the protocol can not be zim, and the zim_id can not be the host. The links are relative links, and use the same protocol/host as the app itself: http://localhost, moz-extension://extensionid, chrome-extension://extensionid etc.

We currently put the zim_id as the first part of the path. Example (under Chromium extension): chrome-extension://donaljnlmapmngakoipdmehbfcioahhk/wikipedia_fr_all_novid_2018-03.zim/A/Logiciel.html (here, donaljnlmapmngakoipdmehbfcioahhk is the id of the extension)

mgautierfr commented 2 years ago

We currently put the zim_id as the first part of the path. Example (under Chromium extension): chrome-extension://donaljnlmapmngakoipdmehbfcioahhk/wikipedia_fr_all_novid_2018-03.zim/A/Logiciel.html (here, donaljnlmapmngakoipdmehbfcioahhk is the id of the extension)

This is pretty close from what is used in kiwix-serve.

Unfortunately not. There are big restrictions on what can be transferred: developer.mozilla.org/en-US/docs/Glossary/Transferable_objects

You don't have to transfert a kiwix::Request. You can parse the url yourself (or serialize kiwix::Request) and transfert a dictionnary (and it seems it is already what you do)

The main thread would handled the request by :

convert you object/json to kiwix::Request)
use the new api to get the content -> which return a zim::Entry or a error (to be defined)
transfer the content back (the same you already do it)

You will have to handle the request yourself (and you keep the hand on the schema you want to use) but you give to libkiwix all the work to locate the zim file, read the content and so.

It would handle code like:

(And it is only for the "get content" part, libkiwix would also handle random, main page, search for all applications)

This way, application developpers concentrate on the application code (UI, understand what user want, display information to the user, ...) and let libkiwix handle the zims and access to content (and internal cache). It would be the same "API", whatever you are reading one or several zims, getting the content of a specific article or searching in zim files, or get a random article, .... Only the content of the request would change

mossroy commented 2 years ago

That looks really interesting! It's a bit too early for me to say if it's feasible or not in kiwix-js. But it's definitely worth trying.

kelson42 commented 2 years ago

@veloman-yunkan Feel free to say something about the idea :)

veloman-yunkan commented 2 years ago

@kelson42 I took this proposal seriously even though it was formulated on April Fools' Day and it is on my to-think list since then. This ticket came to my mind during one of our conversations at the Hackathon. I will provide feedback next week.

veloman-yunkan commented 2 years ago

To the best of my understanding, this proposal seeks to reduce or eliminate some code duplication across several kiwix reader projects. It absolutely makes sense as long as we consider only the total code size of those projects.

Nevertheless, I would like to list below some considerations that must be taken into account when assessing the usefulness of an implementation of this proposal from the pragmatic point of view.

kiwix-serve has to support multiple simultaneous clients. The other readers are single-user applications.
kiwix-serve serves content over a network connection. The other readers access local content.
kiwix-serve and kiwix-desktop don't care about power usage. Mobile readers should try to not waste the battery.

Above differences between various readers can pose difficulties to coming up with a single large but simple shareable component that can be configured for optimal mode of operation in each case. An alternative might be providing smaller reusable components from which the most optimal implementation can be assembled for every case with minimal or no duplication.

mgautierfr commented 2 years ago

Agree with you. With few comments :)

this proposal seeks to reduce or eliminate some code duplication across several kiwix reader projects

Yes. Especially as it is the purpose of libkiwix

kiwix-serve has to support multiple simultaneous clients. The other readers are single-user applications.

Yes. But I'm not sure how it impacts us. kiwix-desktop allow on user to access several zim files in the same time using different tabs. It is somehow equivalent to several users browsing zim files.

kiwix-serve serves content over a network connection. The other readers access local content.

This is the main idea of the kind of InternelServer. Move the "routing" (Find the correct books/Archive/Searcher in the library and "resolve" the user request.) in something independent of the connection.

kiwix-serve and kiwix-desktop don't care about power usage. Mobile readers should try to not waste the battery.

It is a valid remark. But I don't see what to do here. We already mostly care about cpu and memory usage. As long as we don't have background process running and that everything we do is piloted by the user code, we should not waste battery (at least no more than needed)

Above differences between various readers can pose difficulties to coming up with a single large but simple shareable component that can be configured for optimal mode of operation in each case. An alternative might be providing smaller reusable components from which the most optimal implementation can be assembled for every case with minimal or no duplication.

Yet again I agree. My initial idea is to have:

A kiwix::Router/kiwix::Handler (not sure at all about the name) which, constructed on a kiwix::Library, answers to "request" asked by the user.
A kiwix::Request which represent such request of the user. It would be not connected to a specific network feature (but would be able to express any kind of request we want to handle)
A kiwix::Answer which represent a response to the request. Here also, not related to a network feature. Is some case, the answer may simply be a zim::Entry/zim::Item/zim::SearchResultSet (kiwix::Answer would probably be some kind of enhanced union)
Tools to parse http (libmicrohttp) requests into kiwix::Request and send kiwix::Answer over a http connection.

But I'm open to new way to split those objects.

Jaifroid commented 9 months ago

The other readers are single-user applications.

Very late response to this thread (only just saw it)! First, just to note that Kiwix JS apps (both the PWA and the Browser Extension) support different ZIMs open in different windows (obviously only one user, but different requests to different ZIMs can be made more or less randomly, and even simultaneously), all dealt with by a single Service Worker (we can only have one registered per domain).

Overall response to proposal:

(Skip to end for TLDR!)

I like the creative thinking! However, It's currently unclear to me whether this proposal implies a change to the ZIM format/spec itself, or only to the way requests are made to libzim. If it's the latter, then it's probably do-able from a client perspective. If it's the former (ZIM format change), then I am more sceptical.

In any case, I would urge that we continue to expose a standard URL-based interface alongside new functions. This would be necessary, IMHO, both for Kiwix Serve and for Kiwix JS, with the difference that Kiwix JS is limited in terms of what it can run as its backend: i.e., everything has to be in JS or in ASM/WASM. We can probably do fairly easily a change of lookup type from a string (C/path/requested_resource) to an Object / JSON, though I think we'd need to compile that Object ourselves based on http requests. It's more or less what we do with the experimental libzim support now being added to the Browser Extension (https://github.com/kiwix/kiwix-js/pull/1160).

I would agree with @veloman-yunkan that the CPU / memory usage / supported browser options, are a real consideration. To give an example: the Kiwix JS current backend is fast, whereas the libzim backend when compiled with WASM is pretty slow in comparison and also pretty large (NB, I'm talking about the necessary WASM port, not C++ code, which is clearly much faster). The libzim WASM runs like a dog on Android, even when using fast OPFS storage (and is completely unusable if not using that storage -- slow storage access slows libzim to a crawl, as it's doing a massive amount of I/O when the ZIM is first loaded, see https://github.com/openzim/javascript-libzim/issues/42).

I should point out that the current libzim port is incomplete as it is. For every new method we need to expose to JS, we have to write our own glue in both C++ and in JS (so that they talk to each other). I would need help with that if we're replacing the current file access method, rather than simply adding to it.

TLDR; I guess what I'm saying is that there is a lot of (functional and maintained) "technical debt" invested in the current API. Not just Kiwix JS, but also various Kiwix Tools and ZIM Tools, which might -- depending on the extent of change -- need to have rewrites with what could be a breaking change in access method (??).

The above came out more disheartening than I intended it to be. While I'm cautious, I should stress that if we're talking about basically the same ZIM format with added routing methods, maybe it would be (relatively) easy to support.

kiwix / libkiwix

Expose (kind of) `InternalServer` in the public API ? #740

Overall response to proposal: