openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

BCD tables in developer.mozilla.org are broken again #283

Closed: benoit74 closed this issue 4 months ago

benoit74 commented 4 months ago

See https://developer.mozilla.org/en-US/docs/Web/HTML/Element/section vs https://dev.library.kiwix.org/viewer#developer.mozilla.org_en_all_2024-05/developer.mozilla.org/en-US/docs/Web/HTML/Element/section

benoit74 commented 4 months ago

So what we get is:

[image]

This is not in the "raw" HTML but injected by MDN JS.

This seems to be an issue in the MDN JS code, which for some reason believes we are rendering on a server and not inside a browser. It does not even try to fetch the BCD data, which is available inside the ZIM (at http://135.181.181.97:8888/content/devmdn-bcd_2024-05/bcd.developer.mozilla.org/bcd/api/v0/current/html.elements.section.json)

Corresponding code seems to be available at https://github.com/mdn/yari/blob/00daf0f11a77b0fb76ca873f1f4bbc9108d50237/client/src/document/lazy-bcd-table.tsx#L73

I have no clue so far why this JS believes we are rendering on a server and not in a browser. Probably an interaction linked to the injection of wombat?

benoit74 commented 4 months ago

This is however a regression compared to zimit1, where the BCD tables are working. This is a bit weird.

benoit74 commented 4 months ago

The isServer property comes from a useIsServer hook, which follows a known pattern in React: https://stackoverflow.com/a/56145960

benoit74 commented 4 months ago

OK, forget about everything else: the JS was stopped by an error I missed in the console, and the component was hence not rendered. The rest of the JS assumed this was because we are running on a server... I got side-tracked.

[image]

The real problem is hence that JSON inside HTML is once again badly rewritten. This is a recurrence of https://github.com/openzim/warc2zim/issues/181.

The problem is that in the <script type=""> found in this document, the type is "application/json" instead of the "json" assumed in the code (does that value really exist in the wild?).

It also makes me realize that we default to "js-classic", while we should rather default to "unknown" and not rewrite at all (there are plenty of script types in the wild, and only "application/javascript", "text/javascript" and an empty type mean vanilla JavaScript).
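To make the proposal concrete, here is a minimal sketch of how the script type could be classified. It is not the actual warc2zim code: the names ScriptKind and classify_script_type are hypothetical, and treating a missing type attribute the same way as an empty one is an assumption. Only the three kinds mentioned above are modelled, so other kinds (module scripts, for instance) are left out of the sketch and would fall into "unknown" here.

```python
from enum import Enum
from typing import Optional


class ScriptKind(Enum):
    """Hypothetical labels, named after the values discussed in this issue."""

    JS_CLASSIC = "js-classic"
    JSON = "json"
    UNKNOWN = "unknown"


# Types listed in the comment above as meaning vanilla JavaScript.
_JS_CLASSIC_TYPES = {"", "application/javascript", "text/javascript"}

# "json" is what the code assumed; "application/json" is what MDN uses.
_JSON_TYPES = {"json", "application/json"}


def classify_script_type(type_attr: Optional[str]) -> ScriptKind:
    """Classify the value of a <script type="..."> attribute.

    Anything not explicitly recognised is UNKNOWN, meaning the script
    content should be left untouched rather than rewritten as JS.
    """
    if type_attr is None:
        # Assumption: no type attribute behaves like an empty one.
        return ScriptKind.JS_CLASSIC
    # Drop MIME parameters (e.g. "; charset=utf-8") and normalise case.
    normalized = type_attr.split(";", 1)[0].strip().lower()
    if normalized in _JS_CLASSIC_TYPES:
        return ScriptKind.JS_CLASSIC
    if normalized in _JSON_TYPES:
        return ScriptKind.JSON
    return ScriptKind.UNKNOWN
```

With this shape, the rewriter would only apply JS rewriting when the result is JS_CLASSIC, and a type such as "text/template" would be passed through unchanged.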