Closed by kelson42 1 year ago
@Jaifroid What exactly does "ability to execute JavaScript in the ZIM" mean? You talk about the "ZIM index", but what is that exactly (the welcome page listing the books, the URL index, the title index)? I have to admit that I do not really understand your ticket. Can you please open one ticket per problem? It looks like your ticket covers two problems (one about JS and another about the URLs of the articles)?
@kelson42 Sorry if it's poorly expressed, but I think there is one issue and some speculative solutions to the issue: `noscript` sections in the author pages - they should contain static versions of the links to titles by a given author, and not rely on JavaScript only to access those titles.

The proposed solutions are just speculation about how the problem might be worked around, but are not part of the core issue.
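To make the request concrete, here is a minimal sketch of what a static fallback for an author page could look like. It is only an illustration: the function name `author_noscript_block`, the input mapping, and the `Don_Quijote.html` path are hypothetical and are not taken from the Gutenberg scraper.

```python
from html import escape
from typing import Mapping


def author_noscript_block(titles_to_paths: Mapping[str, str]) -> str:
    """Render a <noscript> section with static links to an author's titles.

    `titles_to_paths` maps a book title to its (hypothetical) path in the ZIM.
    """
    items = "\n".join(
        # Escape both the link target and the visible title for safe HTML.
        f'    <li><a href="{escape(path, quote=True)}">{escape(title)}</a></li>'
        for title, path in sorted(titles_to_paths.items())
    )
    return f"<noscript>\n  <ul>\n{items}\n  </ul>\n</noscript>"


if __name__ == "__main__":
    # Hypothetical path; the real article URL scheme may differ.
    print(author_noscript_block({"Don Quijote": "Don_Quijote.html"}))
```

A client that cannot run the page's JavaScript would then still show a plain list of links, while clients that can run it would ignore the `noscript` content.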
@Jaifroid @mossroy I still do not understand how that JavaScript differs from, for example, the JavaScript in the Wikipedia ZIM files. Can you explain it technically?
I'll leave technical explanations to @mossroy.
Non-technically, in a nutshell, we do not run the JS in Wikimedia articles. But it doesn't matter, as the contents are perfectly accessible without doing that (the only JS in the articles opens and closes headings).
However, in the Gutenberg ZIMs, important pages (author pages) construct their content dynamically. It makes the ZIM inaccessible if we can't do the same, for the very simple reason that the author's surname is not in the title of each book page, so there's no way to search for it. See my answer in https://github.com/openzim/mwoffliner/issues/449 for more details about the difficulty of running JS in the ZIM.
@kelson42 I just posted some explanations on JavaScript support in https://github.com/openzim/mwoffliner/issues/449#issuecomment-442471173
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
I guess this issue could be closed in favour of #145. It's not really the same issue, but I suppose we're now committed to dynamic User Interfaces for ZIM archives with no static fallback. While I think it would be good to have a basic, static UI for accessing ZIM content, I guess that's not realistic now. So I recommend closing as won't fix / not planned and focusing on #145 instead.
I share this conclusion. I'd prefer more scrapers to work without JS, but it's hardly realistic. Some of them simply depend on JS, and others, like gutenberg, are built around JS to bring in valuable features like author/title search. Having a static fallback would mean extra work that can't be justified without supporting data (which we don't have).
From @Jaifroid on November 19, 2018 21:13
I am not sure if mwoffliner is used to produce Project Gutenberg ZIMs, or some other scraping system, so feel free to move this issue if it is not related to mwoffliner.
Recent Project Gutenberg ZIMs now come with a proprietary interface that requires the ability to execute JavaScript in the ZIM to access any of the texts in a meaningful way. Although texts are still accessible by title in the ZIM index, not enough information is provided to recognize a text by title unless it is very famous ("Don Quijote" is OK...). Books should be listed in the ZIM index by author surname. An entry should look something like: `Cervantes Saavedra, Miguel de - Don Quijote`. Currently all we have is `Don Quijote`. If the text is `Novelas y cuentos`, there is no way to tell who it's by unless I open it. This is the case at least for `gutenberg_es_all_2018-10.zim`.

Authors are listed in the index of this ZIM, but alphabetically by first name, which is not very useful. To find "Unamuno" I have to know his first name was "Miguel". However, there is no corresponding author page for Miguel de Unamuno in the ZIM, and the client tries to open a "page" that has to be rendered dynamically in JavaScript, which of course fails in a client that cannot run JavaScript in the ZIM.
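The index format suggested above could be produced along these lines. This is a rough sketch under stated assumptions: the `Book` record and the `index_title` helper are hypothetical names for illustration, not the scraper's actual data model or API, and the second title in the usage example is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical record; not the scraper's real data model."""
    title: str
    author_surname: str
    author_first_names: str = ""


def index_title(book: Book) -> str:
    """Build an index entry of the form 'Surname, First names - Title'."""
    surname = book.author_surname.strip()
    first = book.author_first_names.strip()
    title = book.title.strip()
    if not surname:
        # No author information: fall back to the bare title.
        return title
    author = f"{surname}, {first}" if first else surname
    return f"{author} - {title}"


if __name__ == "__main__":
    books = [
        Book("Placeholder title", "Unamuno", "Miguel de"),  # placeholder title
        Book("Don Quijote", "Cervantes Saavedra", "Miguel de"),
    ]
    # Sorting the generated entries orders the index by author surname.
    for entry in sorted((index_title(b) for b in books), key=str.casefold):
        print(entry)
```

Because the surname leads each entry, a plain title search for "Unamuno" or "Cervantes" would find the relevant books without any JavaScript.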
So, is it possible to have a more meaningful and usable ZIM index for these files? Ideally, we should also have `noscript` versions of author pages rather than relying on dynamic construction of them.

It would be a shame to lock out users on low-end devices. Currently, no Kiwix JS version running in an extension (Chrome or Firefox), for example, or Kiwix JS UWP, can run JS in the ZIM. We have support for JS in the ZIM only for clients that can run from a localhost or other server (not from the file protocol) in Kiwix JS in Service Worker mode, so it is currently very restricted. And it looks very difficult to support JS in the ZIM with mainstream file:// protocol access in Kiwix JS. JavaScript that constructs dynamic pages would need to be patched somehow to hook into the extraction engine, and most (all mainstream) browsers do not support `XMLHttpRequests` when running from the file:// protocol.

Copied from original issue: openzim/mwoffliner#445