openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Deprecate tests for details and sections #1866

Open VadimKovalenkoSNF opened 1 year ago

VadimKovalenkoSNF commented 1 year ago

The current tests expect section and details elements to be removed when they contain only a summary tag and no other content, such as a paragraph. This removal is controlled by the keepEmptyParagraphs option: the elements are deleted only when the option is not set. The relevant source code is:

    /* Remove empty paragraphs */
    if (!dump.opts.keepEmptyParagraphs) {
      // Mobile view === details
      // Desktop view === section
      const sections: DominoElement[] = Array.from(parsoidDoc.querySelectorAll('details, section'))
      for (const section of sections) {
        if (
          section.children.length ===
          Array.from(section.children).filter((child: DominoElement) => {
            return child.matches('summary')
          }).length
        ) {
          DU.deleteNode(section)
        }
      }
    }

Consider this HTML as an example:

    <section>
        <summary>Section Summary</summary>
    </section>
    <details>
        <summary>Details Summary</summary>
    </details>
    <section>
        <summary>Section Summary</summary>
        <p>Some content</p>
    </section>
    <details>
        <summary>Details Summary</summary>
        <p>Some content</p>
    </details>

After transformation, the first section and details elements will be removed, as they only contain summary children. So the final output will look like this:

    <section>
        <summary>Section Summary</summary>
        <p>Some content</p>
    </section>
    <details>
        <summary>Details Summary</summary>
        <p>Some content</p>
    </details>
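
The check behind this transformation can be sketched without any DOM library. The snippet below is only an illustration, not mwoffliner code: `SketchNode`, `el`, `hasOnlySummaryChildren` and `removeSummaryOnlyWrappers` are hypothetical names, and the tree is a plain-object stand-in for the Domino elements used in the real implementation. It also handles only top-level nodes, whereas `querySelectorAll` in the original visits nested elements too.

```typescript
// Minimal stand-in for a DOM element (assumption: the real code uses Domino elements).
interface SketchNode {
  tag: string
  children: SketchNode[]
}

// Hypothetical helper: true when every child is a <summary>.
// `every` returns true for an empty array, which mirrors the original
// length comparison (0 === 0), so a wrapper with no children also matches.
function hasOnlySummaryChildren(node: SketchNode): boolean {
  return node.children.every((child) => child.tag === 'summary')
}

// Hypothetical helper: drop top-level <section>/<details> nodes that hold
// nothing besides <summary> children.
function removeSummaryOnlyWrappers(nodes: SketchNode[]): SketchNode[] {
  return nodes.filter((node) => {
    const isWrapper = node.tag === 'section' || node.tag === 'details'
    return !(isWrapper && hasOnlySummaryChildren(node))
  })
}

// Small builder to keep the example tree readable.
const el = (tag: string, ...children: SketchNode[]): SketchNode => ({ tag, children })

// Same shape as the HTML example: two summary-only wrappers, two with content.
const input: SketchNode[] = [
  el('section', el('summary')),
  el('details', el('summary')),
  el('section', el('summary'), el('p')),
  el('details', el('summary'), el('p')),
]

const output = removeSummaryOnlyWrappers(input)
console.log(output.map((node) => `${node.tag}(${node.children.map((c) => c.tag).join(',')})`))
```

Only the two wrappers that contain a paragraph survive. Note that, as in the original length comparison, a section or details element with no children at all satisfies the condition as well and would be removed.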

The current Parsoid version does not support details tags (see https://phabricator.wikimedia.org/T31118), so keepEmptyParagraphs and its related unit tests should be refactored, since we expect to use page/html and page/mobile-html for the desktop and mobile views respectively.

VadimKovalenkoSNF commented 1 year ago

Side note: an example page with details and summary elements can be found in the Kiwix library: http://library.kiwix.org/content/wikipedia_en_all_maxi/A/Perturbation_theory_(quantum_mechanics). See https://github.com/openzim/mwoffliner/issues/1501 for more details.

VadimKovalenkoSNF commented 1 year ago

Update: conversion to <details> and <section> requires a working mobile-sections endpoint and is done in mwoffliner using the res/templates/subsection_wrapper.html swig template.