openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0
275 stars, 72 forks

infobox formatting misaligned #1598

Open tim-moody opened 2 years ago

tim-moody commented 2 years ago

Under some circumstances mwoffliner seems to generate HTML for the image in an infobox that causes the text to wrap around the image. See https://github.com/kiwix/kiwix-js-windows/issues/232

This happens when the generated code contains <div class="thumbinner" style="width:252px"> (Widths of 252, 262, and 302 have been observed, with 302 sometimes preventing the problem.)
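
A minimal sketch of how one might scan generated article HTML for the wrapper described above. The helper name `findThumbinnerWidths` is hypothetical and not part of mwoffliner, and the regular expression assumes the attribute order and quoting shown in this report.

```typescript
// Hypothetical helper (not part of mwoffliner): collect the inline widths of
// every <div class="thumbinner" style="width:NNNpx"> found in an HTML string.
function findThumbinnerWidths(html: string): number[] {
  const pattern = /<div\s+class="thumbinner"\s+style="width:\s*(\d+)px;?"\s*>/g;
  const widths: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    widths.push(parseInt(match[1], 10));
  }
  return widths;
}

// Example input modelled on the markup quoted in this issue.
const sampleHtml =
  '<div class="thumb tright"><div class="thumbinner" style="width:252px">' +
  '<img src="../I/Gray591.png.webp"></div></div>';
console.log(findThumbinnerWidths(sampleHtml));
```

An empty result means the article does not contain the fixed-width wrapper in this exact form; other attribute orders would need a real HTML parser.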

This looks similar to mwoffliner/src/util/saveArticles.ts lines 644ff.

tim-moody commented 2 years ago

A ZIM can be produced with either of the following snippets for this section of the infobox. Is there a workaround that can force the generation of the first HTML instead of the second? I suspect that the servers for these two ZIMs were queried with different APIs.

From wikipedia_en_medicine_maxi_2022-02/A/Portal_vein, which formats properly:

<td colspan="2" class="infobox-image">
    <span class="mw-default-size">
        <img src="../I/Gray591.png.webp" decoding="async" data-file-width="491" data-file-height="750"
            data-file-type="bitmap" height="382" width="250" loading="lazy">
    </span>
    <div class="infobox-caption">The <b>portal vein</b> and its tributaries. It is formed by the <a
            href="Superior_mesenteric_vein" title="Superior mesenteric vein">superior mesenteric vein</a>,
        inferior mesenteric vein, and <a href="Splenic_vein" title="Splenic vein">splenic vein</a>.
        <i>Lienal vein</i> is an old term for <i>splenic vein</i>.
    </div>
</td>

From mdwiki_en_all_2022-02/A/Portal_vein, which doesn't:

<td colspan="2" class="infobox-image">
    <div class="thumb tright">
        <div class="thumbinner" style="width:252px"><img src="../I/Gray591.png.webp" decoding="async"
                data-file-width="491" data-file-height="750" data-file-type="bitmap" height="382"
                width="250" loading="lazy">
            <div class="thumbcaption" style="text-align: left"></div>
        </div>
    </div>
    <div class="infobox-caption">The <b>portal vein</b> and its tributaries. It is formed by the <a
            href="Superior_mesenteric_vein" title="Superior mesenteric vein">superior mesenteric vein</a>,
        inferior mesenteric vein, and <a href="Splenic_vein" title="Splenic vein">splenic vein</a>.
        <i>Lienal vein</i> is an old term for <i>splenic vein</i>.
    </div>
</td>
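
One possible workaround, sketched under assumptions: a post-processing step that rewrites the second form into the first. The function `unwrapInfoboxThumb` is hypothetical and is not mwoffliner's actual code. It uses a regular expression for brevity and only handles the case shown above, where the `thumbcaption` div is empty. A DOM-based version would be more robust.

```typescript
// Hypothetical post-processing step, not mwoffliner's actual code: replace the
// thumb/thumbinner wrapper around an image with the plain
// <span class="mw-default-size"> wrapper seen in the well-formatted ZIM.
function unwrapInfoboxThumb(html: string): string {
  const wrapper = new RegExp(
    '<div class="thumb[^"]*">\\s*' +                    // outer floated container
    '<div class="thumbinner"[^>]*>\\s*' +               // fixed-width inner box
    '(<img\\b[^>]*>)\\s*' +                             // the image itself (kept)
    '<div class="thumbcaption"[^>]*>\\s*</div>\\s*' +   // empty caption (dropped)
    '</div>\\s*</div>',                                 // close inner and outer
    'g'
  );
  return html.replace(wrapper, '<span class="mw-default-size">$1</span>');
}
```

The `infobox-caption` div is left untouched, so the caption text still appears below the image as in the first snippet.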

Jaifroid commented 2 years ago

Just to document here the in-app fix I have included in Kiwix JS Windows/Linux 1.9.3. It simply adds an overriding stylesheet rule like this:

    .content .thumb .thumbinner {
        margin: 0 auto;
        width: 320px !important;
    }

This could be added in inserted_style_mobile.css, or in one of the other inserted stylesheets included in the mdwiki ZIM. It corroborates that larger widths for the div make the problem go away. It may be necessary to replace an existing similar stylesheet rule if there is one for thumbinner. I'm not sure, as I was editing the override stylesheets I supply in the app (for transforming between Desktop and Mobile styles).
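
As a rough illustration of adding that rule at build time, the sketch below appends it to a stylesheet's text. The helper `withThumbinnerOverride` and the idea of patching the CSS as a string are assumptions for illustration; how mwoffliner actually assembles its inserted stylesheets is not shown in this thread.

```typescript
// The override rule quoted above, kept as a constant.
const THUMBINNER_OVERRIDE = [
  '.content .thumb .thumbinner {',
  '    margin: 0 auto;',
  '    width: 320px !important;',
  '}',
].join('\n');

// Hypothetical helper: append the rule to a stylesheet's text unless it is
// already present, so that running it twice does not duplicate the rule.
function withThumbinnerOverride(css: string): string {
  if (css.indexOf(THUMBINNER_OVERRIDE) !== -1) {
    return css;
  }
  const needsNewline = css.length > 0 && css.charAt(css.length - 1) !== '\n';
  return css + (needsNewline ? '\n' : '') + THUMBINNER_OVERRIDE + '\n';
}
```

Because the rule is appended last and uses `!important`, it should win over an earlier `thumbinner` width in the same stylesheet, though an inline `!important` declaration would still take precedence.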

Of course, it would be better to fix the API reading issue. @WikiDocJames, does mdwiki deliver content via Parsoid API? (I'm sorry if this is a very basic question.)

tim-moody commented 2 years ago

> does mdwiki deliver content via Parsoid API?

mdwiki (or the cacher) responds to:

https://mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Portal_vein

In fact, the cacher passes that query to EN WP, because that is where the page lives, and returns the results.
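
For reference, the query above can be assembled parameter by parameter. This is only a sketch: `buildVisualEditorUrl` is a hypothetical name, and the parameter set is copied from the URL quoted here rather than from mwoffliner's source.

```typescript
// Hypothetical helper: build the visualeditor "parse" request for a page,
// using the same query parameters as the URL quoted above.
function buildVisualEditorUrl(apiBase: string, page: string): string {
  const url = new URL(apiBase);
  url.searchParams.set('action', 'visualeditor');
  url.searchParams.set('mobileformat', 'html');
  url.searchParams.set('format', 'json');
  url.searchParams.set('paction', 'parse');
  url.searchParams.set('page', page);
  return url.toString();
}

console.log(buildVisualEditorUrl('https://mdwiki.org/w/api.php', 'Portal_vein'));
```

Pointing `apiBase` at a different wiki would make it easy to compare the HTML each server returns for the same page.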

tim-moody commented 2 years ago

Some of these styles also set float: right; removing it both centers the image and eliminates the superfluous text.
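
A small sketch of that removal, assuming the float is declared in stylesheet text or in an inline style string. The helper `removeFloatRight` is hypothetical and purely illustrative; it is deliberately blunt and would strip every right float it sees, not only those on infobox thumbnails.

```typescript
// Hypothetical helper: drop every "float: right" declaration (with an
// optional !important and trailing semicolon) from a CSS or style string.
function removeFloatRight(css: string): string {
  return css.replace(/float\s*:\s*right\s*(?:!important\s*)?;?\s*/gi, '');
}

console.log(removeFloatRight('div.tright { float: right; clear: right; }'));
```

In practice a scoped override rule may be safer than rewriting existing rules, since other page elements can rely on the float.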

WikiDocJames commented 2 years ago

Okay, so the problem appears to be coming from MWOffliner. Can we make a fix there?

Jaifroid commented 2 years ago

@WikiDocJames I personally don't have the ability to work on MWOffliner yet. What I can say is that with the fix we identified in https://github.com/kiwix/kiwix-js-windows/issues/232 (which is also in the WikiMed Electron/UWP/PWA reader), the latest ZIM is working pretty well, and I'll be able to release a version once the March mdwiki ZIM is out. I know this doesn't help the Android version, but the fix is quite easy, so if it can be included in MWOffliner, that will be the best of all worlds.

tim-moody commented 2 years ago

Given that the context of this ticket is a mobile app, with a smaller screen, and the fact that the misalignment really only happens on large screens, I'm wondering if the mdwiki_app that was just produced can be promoted to a mobile app.

WikiDocJames commented 2 years ago

Another issue with the mini version is that it contains the whole article, not just the leads... Not sure if we have a solution to that yet?

tim-moody commented 2 years ago

> Another issue with the mini version is that it contains the whole article not just the leads

I thought that was fixed.

Jaifroid commented 2 years ago

> Given that the context of this ticket is a mobile app, with a smaller screen, and the fact that the misalignment really only happens on large screens, I'm wondering if the mdwiki_app that was just produced can be promoted to a mobile app.

That's not the case for either the UWP or the Electron WikiMed apps. The Electron app in particular is not for mobiles (Electron doesn't run on any mobile OS I'm aware of); it is a PC-targeted WikiMed (Windows and Linux -- see https://kiwix.github.io/kiwix-js-windows/wikimed-electron.html ). The UWP app works on Windows Mobile, but that system is defunct, and its main target now is Windows 10/11 devices (any architecture that runs Windows 10/11). These devices typically have larger screens (tablets and laptops).

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.