openzim / nautilus

Turns a collection of documents into a browsable ZIM file
GNU General Public License v3.0
19 stars 14 forks source link

Create ZIM files without namespaces, with new illustrations, etc... #41

Closed Jaifroid closed 1 year ago

Jaifroid commented 1 year ago

The investigation done in #40 revealed that the dependency of the included JS zimwriterfs.js on DOM elements (and specifically on the presence of a favicon) can have unexpected effects. While that issue turned out to be a problem in the reader, it would be good to remove this dependency on the favicon, as @rgaudin stated in #40. By upgrading to the Type 1, aka "no namespace", ZIM type, there would no longer be any need to find the namespace and process it, as the JS currently does, by checking the path of the favicon.

kelson42 commented 1 year ago

Seems indeed a good move forward.

Jaifroid commented 1 year ago

While @kelson42 has broadened the title of this issue, I don't want to lose sight of the need to refactor the code that still depends on DOM elements (in particular the favicon). In the updated no-namespace TED Talks ZIMs, this code was left in place and is still causing issues for me (because a workaround I had implemented for it broke when Type 1 ZIMs were issued). This is exactly the same issue as https://github.com/openzim/ted/issues/139. In this repo, the problematic code is here.

kelson42 commented 1 year ago

@rgaudin Is that a bug? Have a bit of difficulties to follow the history and the goal of that piece of code.

Jaifroid commented 1 year ago

@kelson42 The stages to fix this issue are: 1. Update the scraper to no-namespace; 2. Remove the code that depends on finding a namespace here: https://github.com/openzim/nautilus/blob/master/nautiluszim/templates/zimwriterfs.js#L2 .

Number 2 is basically the same fix as was just done for TED (openzim/ted/issues/139).

rgaudin commented 1 year ago

@rgaudin Is that a bug? Have a bit of difficulties to follow the history and the goal of that piece of code.

No. nautilus never produced no-namespace ZIM.

@Jaifroid had a race condition in #40 that was due to kiwix-js. He did found a bug in TED (https://github.com/openzim/ted/issues/139, fixed) and assumed we'd make the same mistake twice ๐Ÿ˜‰

I plan on upgrading this to no-ns today anyway as we have a new sponsored nautilus ZIM request in the pipe.

Jaifroid commented 1 year ago

and assumed we'd make the same mistake twice ๐Ÿ˜‰

To be fair, I opened this issue before I discovered the TED issue.... But then I thought it was useful to have an issue for the different but closely related scrapers affected... (โยดโ—ก`โ) ๐Ÿ‘

rgaudin commented 1 year ago

@Jaifroid can you confirm https://mirror.download.kiwix.org/zim/other/bayardcuisine_fr_2022-09.zim is OK ?

Jaifroid commented 1 year ago

@rgaudin While I can confirm it fixes this issue (it is a C-namespace ZIM, and it removes zimwriterfs.js), it seems to have a regression with regard to the use of inline JavaScript on the landing page, i.e. it regresses #34. The offending file is C/home.html. Every video is launched through a piece of inline JS, so the Chrome Extension version cannot launch the videos due to the CSP. It shows these errors in console when clicking:

image

The actual JS looks like this:

image

It would be best to replace this with a function in one of the loaded scripts that searches for each document and adds an onclick event listener. This could be dealt with by re-opening #34 and transferring this comment if you want to close this issue.