kiwix / kiwix-tools

Command line Kiwix tools: kiwix-serve, kiwix-manage, ...
https://download.kiwix.org/release/kiwix-tools/
GNU General Public License v3.0

Public vs private kiwix-serve's endpoints #593

Closed mgautierfr closed 1 year ago

mgautierfr commented 1 year ago

This issue follows up on the discussion started in #586.

#586 adds documentation for the endpoints of kiwix-serve (endpoints implemented in libkiwix).

I (@mgautierfr) have raised the question of documenting endpoints that can be considered private.

There are two orthogonal questions:

For the second question, I have identified the following endpoints (open to question):

mgautierfr commented 1 year ago

Following @kelson42's comment https://github.com/kiwix/kiwix-tools/pull/586#issuecomment-1365933340, here is my point of view:

We could then discuss if:

  • Keep them and make them public
  • Improve them and then make them public
  • Remove them

These three options seem good enough to me (but maybe the future discussion will show that I'm wrong).

There are two ways of seeing kiwix-serve and its viewer.

The first one (my preference) is to see kiwix-serve as serving two purposes:

In this case we must have a well-documented public API (this is the purpose of the second point), but we also need some "discussion channels" between the backend and the frontend that are totally private. Those private endpoints don't have to be made public, don't have to be (publicly) documented, and we don't want to remove them. Every language has private functions/methods. Not all functions are part of a public API (in fact, most are not). The problem here is that endpoints are always accessible by default (and so somehow "public"), but that is an implementation detail, not intended publicity. (We could protect those private endpoints with some kind of CSRF token. That is somewhat overkill and I don't argue for it, but it would still be logically valid, and it shows that those endpoints really are private.)

The second one is to see kiwix-serve only as an API provider and the viewer only as one user of the API among others. In this case, indeed, all endpoints must be (made) public and documented, or removed. But as in other languages, a public API must be handled with care. Every change must be audited for incompatibilities. The version of kiwix-serve must be properly updated. We potentially have to delay changes or add a compatibility layer for a while, and so on... I don't want to change the major version of libkiwix because we have added a "smart" change in /content on the server side, or because we have fixed an encoding problem in /suggest or removed an icon in /skin. And it would be even more difficult as endpoints are implemented in libkiwix while the end-user binary and documentation are in kiwix-tools.
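To make the versioning cost concrete, here is a minimal sketch (not kiwix-serve or libkiwix code) of how the required semantic-version bump depends on which endpoints are declared public. The `Change` type, the `required_bump` helper and the public/private split used below are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    endpoint: str   # endpoint touched by the change, e.g. "/suggest"
    breaking: bool  # True if existing clients of that endpoint could break


def required_bump(changes, public_endpoints):
    """Return the semver bump ("major", "minor" or "patch") for a release.

    Simplified rule (an assumption of this sketch): only changes to public
    endpoints count towards the API contract.
    """
    touched_public = [c for c in changes if c.endpoint in public_endpoints]
    if any(c.breaking for c in touched_public):
        return "major"
    if touched_public:
        return "minor"
    return "patch"


# The same two changes, under two hypothetical definitions of "public".
changes = [Change("/suggest", breaking=True), Change("/skin", breaking=True)]
everything_public = required_bump(changes, {"/content", "/suggest", "/skin"})
with_private_endpoints = required_bump(changes, {"/raw/content"})
```

With every endpoint treated as public, the encoding fix and the icon removal force a major bump; with /suggest and /skin treated as private, the same release is only a patch.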

rgaudin commented 1 year ago

Thank you; to ease the discussion here, I am copy-pasting my comment from there.


I don't have a strong opinion on whether we should have unsupported endpoints or not. I can see how it gives us flexibility, but I can also see how we could allow ourselves to break the API at any version, provided we document changes and respect semantic versioning.

I do have one on documentation though, as not documenting all the endpoints (i.e. hiding some) would not solve my use cases. I would definitely prefer documented-yet-marked-unsupported ones.

Documenting means being transparent, and transparency helps integration and maintenance. Kiwix-serve is critical to a number of our products and to those of other organizations. By documenting an endpoint (even as unsupported), we acknowledge that it exists and can mention its change or removal in the Changelog, and that will be beneficial to all.

Hiding it won't prevent users from using it: they might find it anyway because they need something that's not in the public API… but it will then break without notice, generate frustration, and discourage them from giving feedback.


An additional note: most (if not all) kiwix-serve integrations that we are aware of use our reader (/content now) and essentially rewrite/tweak the books-browsing (our new homepage). This may not be the way to do it and it may change in the future (the homepage improved recently), but that's how it currently is.

mgautierfr commented 1 year ago

An additional note: most (if not all) kiwix-serve integrations that we are aware of use our reader (/content now) and essentially rewrite/tweak the books-browsing (our new homepage). This may not be the way to do it and it may change in the future (the homepage improved recently), but that's how it currently is.

This is an interesting point. For now we have two endpoints to access content from a zim file: /raw/content and /content.

But what the "usable form" is has never been defined. The fact that /content is really close to /raw/content comes from a (long-standing) untold assertion: the content in the zim file must be directly usable in a web browser (an assertion broken by warc2zim).

So for all non-warc2zim files, kiwix-serve integrations could/should use /raw/content instead of /content. For warc2zim the question is still open, as no endpoint provides directly usable content. But the embedded replayer is using the raw content (with the raw mimetype), and a new replayer should logically use the /raw/content endpoint.

There are a few issues about what is served on /content. Some of them tend to say we should keep /content internal, others lean more towards the idea of an API endpoint (which we can decide to move to another endpoint):

All of this makes me think that 1. we should answer all those questions about the purpose and privacy of /content; 2. until then, we should keep it private.


Please note that I differentiate private from hidden. A private endpoint may be hidden or not (and it can depend on the actual endpoint).


We can also document everything but mark everything as not stable. It would provide more information for our users but without locking us to the current API that has been constructed over time and never correctly designed from scratch.

rgaudin commented 1 year ago

But what the "usable form" is has never been defined. The fact that /content is really close to /raw/content comes from a (long-standing) untold assertion: the content in the zim file must be directly usable in a web browser (an assertion broken by warc2zim).

So for all non-warc2zim files, kiwix-serve integrations could/should use /raw/content instead of /content. For warc2zim the question is still open, as no endpoint provides directly usable content. But the embedded replayer is using the raw content (with the raw mimetype), and a new replayer should logically use the /raw/content endpoint.

This is not exactly right. I'm not entirely sure what you meant by directly usable in a web browser but:

The main difference with warc2zim is that it requires recent browser features that are not available on all webview platforms and that have additional (and not completely standardized) requirements, such as HTTPS or localhost. Other ZIM files include content with specific browser features too (<video />). It's just that the Service Worker constraints are not settled and go beyond a platform upgrade, as they intersect with infrastructure/deployment.

But… we may follow the JS-API road and move the subreader to the reader, definitely breaking that untold promise.


Please note that none of the above matters for the current discussion. I just wanted to clarify this, as it will feed into another discussion in the future.

mgautierfr commented 1 year ago

I agree. My point was not to discuss warc2zim but to add an argument about long-standing untold (and not really specified) requirements. "Directly usable in a web browser" was one of them. warc2zim breaks it in a way that impacts us a lot, but it is not related to warc2zim only.

But to answer your question, I would say that "directly usable in a web browser" means: a web view and a way (HTTP only?) to put (static) content in the webview, with the content coming directly from the zim file.

warc2zim files break the requirement, as we now need a browser supporting service workers to change the content. We are now not only tied to a webview to display the content, but also tied to a web technology to locate, load and adapt this content. (And yes, other zim files have also broken it, as they require browsers with JavaScript.) One could argue that we are simply using a browser feature we had never used until now, and so we don't break the requirement of a browser; but then it means that ALL our clients except kiwix-serve do not respect the requirement. But anyway, it is indeed off-topic and it is fuel for another discussion.

However, it is related to this in one respect: we have built the Kiwix ecosystem the organic way. We have untold (and sometimes unknown) assertions that we rely on without really thinking about them. The same goes for our current endpoints. Documenting things creates a de facto contract on them, and we have to be careful about that. I want to avoid the organically designed API, not designed to be used externally, becoming our new standard. (It doesn't mean we should not document things. Untold requirements/specifications are definitely not a good situation.)

kelson42 commented 1 year ago

After a discussion with @veloman-yunkan @mgautierfr @rgaudin, it has been decided:

The documentation should be updated to reflect the decisions above. Tagging private endpoints as "private" is the most important thing to do here.
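As a rough illustration of what tagging could look like on the documentation side, here is a minimal sketch (not the actual kiwix-tools documentation tooling). The `Visibility` enum, the `ENDPOINTS` table, the `render_docs` helper, the descriptions and above all the public/private assignments are hypothetical; they do not reproduce the decisions taken in the discussion.

```python
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


# Hypothetical classification, for illustration only.
ENDPOINTS = {
    "/raw/content": (Visibility.PUBLIC, "raw content from a ZIM file"),
    "/content": (Visibility.PRIVATE, "content as served to the viewer"),
    "/suggest": (Visibility.PRIVATE, "suggestions for the search box"),
    "/skin": (Visibility.PRIVATE, "static resources of the viewer"),
}


def render_docs(endpoints):
    """Render one documentation line per endpoint, tagging private ones."""
    lines = []
    for path in sorted(endpoints):
        visibility, description = endpoints[path]
        tag = " [private]" if visibility is Visibility.PRIVATE else ""
        lines.append(f"{path}{tag}: {description}")
    return "\n".join(lines)


docs = render_docs(ENDPOINTS)
```

Every endpoint stays documented, so its changes can still be listed in the Changelog, while the tag tells integrators that it carries no stability guarantee.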