openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Fix support of non-GET (POST, PUT, ...) requests rewriting #308

Open benoit74 opened 2 weeks ago

benoit74 commented 2 weeks ago

Currently, non-GET (POST, PUT, ...) requests returning an HTML document are supposed to work, based on what has been transferred from wabac.js, but they are not tested at all.

We need to:

rgaudin commented 2 weeks ago

I'd like to stress that it sorta works by accident, because most readers simply ignore the HTTP method of the request and return content from the ZIM using the request's path.

At the moment, it is not harmonized. For instance, the Apple reader ignores the method, so even DELETE would work, but kiwix-serve (libkiwix) only replies to GET, HEAD and POST (so PUT doesn't work there).
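To make the difference concrete, here is a minimal sketch of the two behaviors described above. It is not the code of any actual reader: `ZIM_ENTRIES`, `method_agnostic_lookup` and `allowlist_lookup` are hypothetical names made up for illustration, and the ZIM is reduced to a plain dict keyed by path.

```python
from typing import Optional

# Hypothetical stand-in for ZIM content: entries are keyed by path only,
# so the HTTP method of the original request is not part of the key.
ZIM_ENTRIES = {"/api/form": b"<html>submitted</html>"}

# Methods that kiwix-serve replies to, as stated in this comment.
ALLOWED_METHODS = frozenset({"GET", "HEAD", "POST"})


def method_agnostic_lookup(method: str, path: str) -> Optional[bytes]:
    """Reader that ignores the method and serves whatever is at the path."""
    return ZIM_ENTRIES.get(path)


def allowlist_lookup(method: str, path: str) -> Optional[bytes]:
    """Reader that refuses any method outside its allowlist."""
    if method.upper() not in ALLOWED_METHODS:
        return None  # assumption: modelled as "no content", not a specific HTTP status
    return ZIM_ENTRIES.get(path)
```

With the same entry, `method_agnostic_lookup("DELETE", "/api/form")` returns the stored content, while `allowlist_lookup("PUT", "/api/form")` returns nothing, which is the inconsistency between readers I am talking about.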

I advocated for reader implementation guidelines, but this was considered superfluous, so I think this ticket will need to map the needs and create upstream tickets on all concerned readers.

benoit74 commented 2 weeks ago

I strongly agree. We should probably wait for a real use case, which would help make the issue much more understandable and less hypothetical.

Maybe we should even split this issue, first considering only POST, HEAD and PUT (the most realistic ones to support) and then DELETE (not sure how we could support this).

rgaudin commented 2 weeks ago

I agree we should focus on use cases, but for the sake of the argument, I don't see any difference between a POST and a DELETE. In both cases, the source website probably managed data and returned something (even a 204), and as far as warc2zim is concerned, it's a request and a response. The fact that we lose any dynamic stuff is already accepted, even for GET with querystrings.