kiwix-serve: support passing socket as stdin (inetd-style)

opk12 commented 1 year ago

Traditionally on *nix, a service manager spawns the daemon and duplicates the socket to its standard input and standard output. The socket can even be AF_UNIX, bridged to the network interface by the service manager.

This approach is common nowadays (and supported by systemd), but it's called inetd-style for historical reasons. More info at systemd "Converting inetd services".

Why it's better than binding to the network interface directly:

The system service manager centralizes the interface and port mappings for all daemons, so the config is cleaner.
TLS, logging, access control, and proxies can be added between the daemon and the network interface.
Containerizing the daemon into its own network and IPC namespaces, which blocks its access to the network and to other daemons, is now the standard way on Linux to limit the damage from a compromised daemon. This requires ugly workarounds if the daemon wants to bind to the network interface directly.
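
To make the inetd-style handoff concrete, here is a minimal Python sketch (not kiwix-serve code, and *nix only). The parent plays the service manager and passes a listening socket to a child as its standard input; the child wraps fd 0 as a socket. The names `CHILD` and `spawn_with_socket_on_stdin` are made up for this illustration.

```python
import socket
import subprocess
import sys

# Child program: treats fd 0 (stdin) as an already listening socket.
CHILD = r"""
import socket
listener = socket.socket(fileno=0)   # family and type are detected from the fd
print(listener.getsockname()[1])     # report the port it inherited
listener.detach()                    # we only inspected it; leave fd 0 open
"""


def spawn_with_socket_on_stdin(listener: socket.socket) -> str:
    """Play the service manager: run the child with `listener` as its stdin."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=listener.fileno(),     # duplicated onto fd 0 in the child
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen()
        print("parent port:", listener.getsockname()[1])
        print("child sees: ", spawn_with_socket_on_stdin(listener))
```

The child never calls bind() itself, which is the point: the interface and port are decided entirely outside the daemon.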

kelson42 commented 1 year ago

@rgaudin @mgautierfr @veloman-yunkan Do you validate the feature request?

mgautierfr commented 1 year ago

It seems good to me. As a bonus, it may also be a good way to test the server, as we would not have to run it on a different port or things like that. (But I am not sure our HTTP client can do it.)

rgaudin commented 1 year ago

Thanks for the link; I've recently experimented with this and found it under-documented and somewhat difficult.

The main question would be which mode(s) to support: single server that gets instantiated on first request or per-connection instances.

From my limited understanding of how libkiwix works, per-connection instances would be terrible. I had difficulties with the singleton mode (couldn't get hold of the initial request), but I wasn't ready to invest much time in this and relied on third parties.

I believe Apache has a module that allows this, so it might be a useful implementation sample.

opk12 commented 1 year ago

Generally, a (modern) daemon does the multiplexing itself, so that initialization only happens once. This corresponds to Accept=false in systemd and is what the Accept= section of systemd.socket advises. Multiple daemon instances can still be started from the outside if necessary, as in this example for QGIS Server.
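
For reference, with Accept=false systemd hands the listening socket(s) to the single daemon as file descriptors starting at 3 and announces them through the LISTEN_PID and LISTEN_FDS environment variables (see sd_listen_fds(3)). A minimal Python sketch of the receiving side, assuming only that protocol; `inherited_listen_fds` is a hypothetical helper name, not an existing API:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited fd, as documented in sd_listen_fds(3)


def inherited_listen_fds(environ=os.environ, pid=None):
    """Return the fds passed by a systemd-style manager, or [] if there are none."""
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


if __name__ == "__main__":
    fds = inherited_listen_fds()
    if fds:
        # One long-lived process owns the listening socket and accepts
        # every connection itself, so initialization happens only once.
        listener = socket.socket(fileno=fds[0])
        print("inherited listening socket on", listener.getsockname())
    else:
        print("no socket passed; the daemon would bind by itself here")
```

Checking LISTEN_PID matters because the variables can leak into unrelated child processes through the environment.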

veloman-yunkan commented 1 year ago

If kiwix-serve gives up binding on the network interface directly, I don't see how it can serve concurrent requests efficiently (i.e. having a single shared cache of open ZIM files and search sessions). @opk12 Do you mean supporting operation via stdin/stdout as an option?

I think my response somewhat repeats @rgaudin's comment above, but I thought the concern had better be stated plainly.

opk12 commented 1 year ago

That the format spec is free to be painted on the walls does not change the vendor lock-in effects if I can't access ZIM with favorite third-party tools; or I must use a web server that parses downloaded material, but also requires network access; or have to wait ages because compliance with the big redistributors is not a top priority (no LTS branch for Debian, app non-freeness caused by non-free build deps for F-Droid). So the very general context is adhering to proven, widespread, basic design patterns and integrating into the free software community / ecosystem, in favor of the technical or privacy-minded WMF projects user.

This specific feature request just changes where kiwix takes the socket from. Everyone says so far that it's better that kiwix makes connections from a socket ("single server that gets instantiated on first request" from above). A daemon can take a socket and forget where it's taken from, then make connections from it, regardless of how it was constructed. I've amended the title to remove the reference to stdout.
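
A small Python sketch of that idea, under the assumption that the daemon either adopts an inherited fd or falls back to binding by itself. `obtain_listener` and `accept_one` are hypothetical names for this illustration, not kiwix code:

```python
import socket
from typing import Optional


def obtain_listener(inherited_fd: Optional[int],
                    host: str = "127.0.0.1",
                    port: int = 0) -> socket.socket:
    """Return a listening socket, adopted from `inherited_fd` or freshly bound."""
    if inherited_fd is not None:
        # Adopt the socket as it is: no bind(), no listen(), no address logic.
        return socket.socket(fileno=inherited_fd)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    return listener


def accept_one(listener: socket.socket) -> bytes:
    """From here on the code is identical, whatever the socket's origin."""
    conn, _addr = listener.accept()
    with conn:
        return conn.recv(1024)
```

Only `obtain_listener` knows where the socket came from; everything that accepts and serves connections stays unchanged.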

mgautierfr commented 1 year ago

There are a few different things in your last comment.

> That the format spec is free to be painted on the walls does not change the vendor lock-in effects if I can't access ZIM with favorite third-party tools; or I must use a web server that parses downloaded material, but also requires network access; or have to wait ages because compliance with the big redistributors is not a top priority (no LTS branch for Debian, app non-freeness caused by non-free build deps for F-Droid). So the very general context is adhering to proven, widespread, basic design patterns and integrating into the free software community / ecosystem, in favor of the technical or privacy-minded WMF projects user.

This sounds like a rant that we don't do enough. Guess what, this is free software. You can do more than open issues; pull requests are welcome. You want a Debian LTS package? Please do. We had a contributor on Debian packaging, but he now lacks the time. You want a free build on F-Droid? Please do. Everything is free software, just package it.

We are a really small team, not working full-time on Kiwix. We do the best we can given our priorities and our capacities (we are the "only" ones who know the code, whereas a lot of people can do the packaging).

> This specific feature request just changes where kiwix takes the socket from. Everyone says so far that it's better that kiwix makes connections from a socket ("single server that gets instantiated on first request" from above). A daemon can take a socket and forget where it's taken from, then make connections from it, regardless of how it was constructed. I've amended the title to remove the reference to stdout.

Here I agree with you. We are just discussing the feature to properly understand it and implement it the correct way.

mgautierfr commented 1 year ago

> I've recently experimented with this and found it under-documented and somewhat difficult.

@rgaudin, the articles from Lennart are really interesting. See: https://0pointer.net/blog/projects/socket-activation.html https://0pointer.net/blog/projects/socket-activation2.html https://0pointer.net/blog/projects/inetd.html (already given in this discussion)

> The main question would be which mode(s) to support: single server that gets instantiated on first request or per-connection instances.

This is a valid question, but it is a question for the administrator, not for us. Accepting connections on a socket given to us as an fd allows us to do both without mode-specific code (we still have to change the code to accept an fd in the first place).

Administrators just have to configure the service correctly to use the "instantiate on first request" mode if they want something efficient.

> If kiwix-serve gives up binding on the network interface directly, I don't see how it can serve concurrent requests efficiently (i.e. having a single shared cache of open ZIM files and search sessions). @opk12 Do you mean supporting operation via stdin/stdout as an option?

@veloman-yunkan The idea is that systemd passes us the INET socket, so we have to use it instead of binding to INADDR_ANY or to the address given as a command-line argument. It seems the option in libmicrohttpd is MHD_OPTION_LISTEN_SOCKET.

The patch to CUPS (linked in the blog article) http://0pointer.de/public/cups-patch-core.txt shows us how we can get the socket from the fd.
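
To illustrate the "use the given socket instead of binding" idea without touching libmicrohttpd, here is a rough analogue using only Python's standard http.server. It is a sketch of the concept, not the libmicrohttpd API: the server is built with bind_and_activate=False and then handed a socket that is already bound and listening. `HelloHandler`, `server_on_existing_socket` and `demo` are names invented for this example.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from an inherited socket\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def server_on_existing_socket(listener: socket.socket) -> HTTPServer:
    """Build an HTTP server that uses `listener` instead of binding itself."""
    server = HTTPServer(listener.getsockname(), HelloHandler,
                        bind_and_activate=False)
    server.socket.close()      # discard the unbound socket the constructor made
    server.socket = listener   # adopt the already bound and listening one
    return server


def demo() -> bytes:
    # Stand-in for the service manager: bind and listen before the server exists.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    host, port = listener.getsockname()

    server = server_on_existing_socket(listener)
    worker = threading.Thread(target=server.handle_request)  # serve one request
    worker.start()

    client = http.client.HTTPConnection(host, port, timeout=5)
    client.request("GET", "/")
    body = client.getresponse().read()
    client.close()

    worker.join()
    server.server_close()
    return body


if __name__ == "__main__":
    print(demo().decode(), end="")
```

In a real deployment the listener would come from the inherited fd rather than from the bind() call in `demo`, and the server would keep running instead of handling a single request.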

opk12 commented 1 year ago

Sorry for the tone. You do too much already; it's just the mentioned lack of integration.

kelson42 commented 1 year ago

@opk12 Thank you for opening this issue. We all agree that this would be a good ticket to implement. I will work to integrate it in a future milestone, but for now it is not among the top priorities. So, you will need a bit of patience ;)

kiwix / kiwix-tools

kiwix-serve: support passing socket as stdin (inetd-style) #610