openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

All main `<section>` tags are erroneously hidden with the new mobile html endpoint, and so inaccessible to readers that cannot run JS #2074

Open Jaifroid opened 1 month ago

Jaifroid commented 1 month ago

(Originally reported in https://github.com/openzim/zim-requests/issues/1129#issuecomment-2267448220)

Possibly as a remnant of the removal of details-summary tags for opening and closing articles (#1915), all main `<section>` tags have the attribute `style="display: none;"` (see screenshot at bottom) in ZIMs scraped with the new mobile html endpoint. There is probably a script included with the ZIM that is in charge of removing this attribute (the effect can be seen when loading a page: the lede loads first, and then after a pause the rest of the article is unhidden). However, this logic is the wrong way round: all sections should be visible by default, and then if the reader wishes to hide a section, it can run said script.

This has a serious consequence in readers that block JS from the ZIM, or that cannot run JS from the ZIM at all: most content in an article becomes inaccessible and cannot be unhidden. This is the case in Restricted Mode (aka JQuery Mode) in Kiwix JS and Kiwix PWA.

For relevant older issues, see https://github.com/openzim/mwoffliner/issues/962, https://github.com/openzim/mwoffliner/issues/838, https://github.com/openzim/mwoffliner/issues/952, (related) https://github.com/openzim/mwoffliner/issues/1033, and https://github.com/openzim/mwoffliner/issues/1915.

image

audiodude commented 1 month ago

@Jaifroid So I should be able to confirm this is not an issue with 1.14 by simply running a scraper and looking at the HTML in the ZIM? If I don't see the display: none it's fixed?

audiodude commented 1 month ago

> @Jaifroid So I should be able to confirm this is not an issue with 1.14 by simply running a scraper and looking at the HTML in the ZIM? If I don't see the display: none it's fixed?

@Jaifroid Sorry, I misread the linked issue. This is already confirmed to be happening in a ZIM created with 1.14, yes?
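The check described above (look at the stored HTML and see whether any `<section>` carries an inline `display: none`) could be scripted roughly as follows. This is only a sketch using the Python standard library, not anything from mwoffliner. It assumes the article HTML has already been extracted from the ZIM as a string; the sample markup and the names `HiddenSectionCounter` and `count_hidden_sections` are made up for illustration.

```python
from html.parser import HTMLParser
import re

# Matches an inline "display: none" declaration, tolerating whitespace and case.
HIDDEN_RE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class HiddenSectionCounter(HTMLParser):
    """Counts <section> start tags and how many carry an inline display:none."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.hidden = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "section":
            return
        self.total += 1
        style = dict(attrs).get("style") or ""
        if HIDDEN_RE.search(style):
            self.hidden += 1


def count_hidden_sections(html):
    """Return (hidden, total) counts of <section> tags in an HTML string."""
    parser = HiddenSectionCounter()
    parser.feed(html)
    parser.close()
    return parser.hidden, parser.total


# Hypothetical article fragment: a visible lede plus one force-hidden section.
sample = (
    '<section data-mw-section-id="0"><p>Lede</p></section>'
    '<section data-mw-section-id="1" style="display: none;"><h2>Get in</h2></section>'
)
hidden, total = count_hidden_sections(sample)
print(f"{hidden} of {total} sections are hidden inline")
```

A non-zero first count on an article page would suggest the ZIM is affected. The sketch only looks at inline `style` attributes, so sections hidden through a stylesheet rule would not be caught.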

Jaifroid commented 1 month ago

Yes, it was with a Wikivoyage ZIM @kelson42 ran for me yesterday. Let me check the metadata...

Jaifroid commented 1 month ago

Well, the metadata of this 2024-08 English Wikivoyage says 1.13, but it must be using the new endpoint because all the stylesheets are different, and it's a very different ZIM from the last available version (2024-06)... And having the sections all force-hidden was not an issue with any previous Wikivoyage scrape.

image

Jaifroid commented 1 month ago

The issue also manifests in the Chrome extension in ServiceWorkerLocal mode (as well as in Restricted mode). In SW-local mode, inline JS is blocked by the Chrome extension APIs, and so the following script in the HTML of each page is blocked:

image

(Not 100% sure that's the issue, but it's the only inline script I've found.) This inline script must be calling a function defined in an attached script, and that function probably traverses the DOM for hidden sections and unhides them. (Remember also that attached scripts won't run in the Browser Extensions in Restricted mode either.)

This behaviour will be in the code coming from the endpoint; I doubt it's been introduced by someone in mwOffliner, though it's possible (and has happened in the past). All that's needed is code in mwOffliner to remove the `display` style rule on these nodes before the HTML is stored in the ZIM, assuming we always want all sections to be unhidden by default, and assuming we don't want to replicate the old click action which opened and closed sections using details-summary tags.
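To make the proposed fix concrete, here is a minimal sketch of stripping the inline `display: none` from `<section>` start tags before the HTML is stored. It is written in Python for illustration only and is not mwoffliner code; a real fix in the scraper would more likely operate on the parsed DOM. The function name `unhide_sections` is hypothetical, and the regex approach assumes double-quoted `style` attributes and no `>` inside attribute values.

```python
import re

# Matches a whole <section ...> start tag (assumes no ">" inside attribute values).
SECTION_TAG_RE = re.compile(r"<section\b[^>]*>", re.IGNORECASE)
# Matches a double-quoted style attribute, including its leading whitespace.
STYLE_ATTR_RE = re.compile(r'\s+style\s*=\s*"([^"]*)"', re.IGNORECASE)
# Matches a single declaration that is exactly "display: none".
DISPLAY_NONE_RE = re.compile(r"^\s*display\s*:\s*none\s*$", re.IGNORECASE)


def _clean_style(match):
    """Rebuild a style attribute without any display:none declaration."""
    declarations = [d for d in match.group(1).split(";") if d.strip()]
    kept = [d.strip() for d in declarations if not DISPLAY_NONE_RE.match(d)]
    if not kept:
        return ""  # nothing left: drop the style attribute entirely
    return ' style="' + "; ".join(kept) + ';"'


def unhide_sections(html):
    """Remove inline display:none from <section> start tags only."""
    return SECTION_TAG_RE.sub(
        lambda tag: STYLE_ATTR_RE.sub(_clean_style, tag.group(0)), html
    )


# Hypothetical input resembling the hidden sections described in this issue.
before = '<section data-mw-section-id="1" style="display: none;"><p>Text</p></section>'
print(unhide_sections(before))
```

Other style declarations on the same tag are preserved, and elements other than `<section>` are left untouched, so anything deliberately hidden elsewhere in the page stays hidden.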