openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Navbox Missing #1493

Open ghost opened 3 years ago

ghost commented 3 years ago

@metal450 commented on Jul 5, 2021, 4:00 PM UTC:

I'm not sure if this is the most appropriate place to report this, but I just updated my Wikipedia maxi content for the first time since wikipedia_en_all_novid_2018-10.zim (which also seemed to necessitate updating kiwix-serve - the previous build I'd been using would get stuck while trying to load wikivoyage_en_all_maxi_2021-06). However, now it seems that the 'related topics' tables that appear at the bottom of many articles are all missing. They are not missing from Wikipedia itself, so this is definitely not just a function of changing content. Here's an example:

2018 data: [screenshot: 2021-07-05 08 52 55]

vs now (the whole table at the bottom is missing): [screenshot: 2021-07-05 08 52 43]

These tables were very useful. Were they deliberately removed, or is it a bug?

Addendum: It looks like other tables are missing, too. Here's another example:

2018: [screenshot: 2021-07-05 09 02 34]

vs now - the whole 'Numeral Systems' table is gone. It is not gone on Wikipedia itself: [screenshot: 2021-07-05 09 02 49]

This issue was moved by kelson42 from kiwix/kiwix-tools#466.

ghost commented 3 years ago

@mgautierfr commented on Jul 6, 2021, 9:49 AM UTC:

This is more an issue for https://github.com/openzim/mwoffliner. But I'll let @kelson42 answer you; this may be intended behavior or something else.


which also seemed to necessitate updating kiwix-serve - the previous build I'd been using would get stuck while trying to load wikivoyage_en_all_maxi_2021-06)

Except for bugs (probably already fixed) or the new zstd compression algorithm, this should not be the case. What was your previous version of kiwix-serve?

metal450 commented 3 years ago

Except for bugs (probably already fixed) or the new zstd compression algorithm, this should not be the case. What was your previous version of kiwix-serve?

I still have the old binaries, but was not able to determine the version. kiwix-serve -v doesn't output any version information.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

Inbefortus commented 2 years ago

@metal450 The good news is that they are there, just hidden.

Simply do the following:

It looks like other tables are missing, too. Here's another example

Not the case in 2021-12 English Wikipedia:

[screenshot: Screenshot_20211223-104534_Samsung Internet]

Jaifroid commented 2 years ago

Just to note that I would suggest using https://pwa.kiwix.org/ instead of kiwix.github.io because the former has stable, tested releases, and the latter is a development server, and can have buggy code that changes rapidly. :-)

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 1 year ago

Maybe using the new mobile end-point will finally fix that. Otherwise we will probably close it as WONTFIX as this is the HTML delivered by the backend.

Jaifroid commented 1 year ago

A new mobile endpoint won't fix this, because the standard Wikipedia mobile style deliberately hides navboxes to keep mobile browsing clutter-free. There are some quite heated debates about this on Wikipedia discussion pages.

At base, the issue is that mwoffliner only includes the mobile CSS, irrespective of whether the user is browsing on a desktop/laptop or mobile device. There are two solutions:

The second option is easiest, but it probably needs implementing at reader-level, not scraper-level. The first option is a bit more satisfying, but requires a bit of processing of the page in order to re-arrange some page elements that have a different position when viewed in the mobile vs the desktop modes. The first option could be done as part of the proposed JS API.
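Since the navboxes are present in the HTML and only hidden by the mobile CSS, one way to picture a fix is a small style override added after the bundled stylesheet. The sketch below is only an illustration, not mwoffliner code: the function name `inject_navbox_override`, the `.navbox` selector, and the assumption that the hiding rule can be beaten by a later `display` rule with `!important` are all hypothetical.

```python
import re

# Assumption: navboxes carry the class "navbox" and the bundled mobile CSS
# hides them with a display rule. Neither detail is confirmed in this thread.
NAVBOX_OVERRIDE_CSS = ".navbox { display: block !important; }"


def inject_navbox_override(html: str, css: str = NAVBOX_OVERRIDE_CSS) -> str:
    """Insert a <style> element just before </head> so it comes after the bundled CSS."""
    style_tag = f'<style id="navbox-override">{css}</style>'
    # Match the closing head tag case-insensitively; only the first match is replaced.
    pattern = re.compile(r"</head\s*>", re.IGNORECASE)
    if pattern.search(html):
        # A function replacement avoids backslash handling in the CSS text.
        return pattern.sub(lambda m: style_tag + m.group(0), html, count=1)
    # No </head> found: prepend the style so the rule is still present.
    return style_tag + html
```

Whether such a step belongs in the scraper or in the reader is exactly the open question in this thread; the sketch takes no position on that.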

mgautierfr commented 1 year ago

@metal450 The good news is that they are there, just hidden.

It depends on the point of view 🙂. Including hidden content is a waste of memory. It would be better if hidden content were actually removed rather than hidden. (But here we want to show the hidden content, so ...).

The second option is easiest, but it probably needs implementing at reader-level, not scraper-level. The first option is a bit more satisfying, but requires a bit of processing of the page in order to re-arrange some page elements that have a different position when viewed in the mobile vs the desktop modes. The first option could be done as part of the proposed JS API

When designing a solution, keep in mind that readers may open ZIM files other than Wikipedia (and you cannot ask every reader to fully and properly implement DOM manipulation). We should not design something too specific to one kind of ZIM file or to one reader.

Jaifroid commented 1 year ago

Including hidden content is a waste of memory. It would be better if hidden content were actually removed rather than hidden. (But here we want to show the hidden content, so ...).

The thing is that these elements are in the HTML that is served by Wikipedia, but they are purposefully hidden by the mobile stylesheets, in order to save space on mobile screens. However, many of our users are accessing Wikimedia ZIMs on desktop or tablet devices, and if we want to mirror what Wikipedia provides, then we need a way to show this content on larger screens. The "best" solution is to offer some automatic switching of styles according to screen size (but possibly also make it optional). As mwOffliner controls and bundles the stylesheets, this could be done as part of the JS API within Mediawiki ZIMs, thus overcoming the objection that the reader is manipulating the DOM to offer such options.

I do offer such an option (switching between mobile and desktop stylesheets) in KJSWL (for a number of years now). The reader detects that it is dealing with a Wikimedia ZIM, and avoids doing this in relation to any other ZIM type (of course).
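The two ideas above (switch styles by screen size, and only do so for Wikimedia ZIMs) can be sketched together. This is a hedged illustration, not how KJSWL actually detects ZIM types, which the thread does not describe. It assumes the ZIM exposes a `Scraper` metadata entry whose value begins with `mwoffliner`; the function names and the 720 px threshold are hypothetical.

```python
from typing import Mapping, Optional


def is_mwoffliner_zim(metadata: Mapping[str, str]) -> bool:
    """Guess whether a ZIM came from mwoffliner, based on its 'Scraper' metadata."""
    # Assumption: the scraper name is recorded as e.g. "mwoffliner 1.13.0".
    scraper = metadata.get("Scraper", "")
    return scraper.strip().lower().startswith("mwoffliner")


def pick_style(
    metadata: Mapping[str, str],
    viewport_width_px: int,
    desktop_min_width_px: int = 720,  # hypothetical breakpoint
) -> Optional[str]:
    """Return 'desktop' or 'mobile' for mwoffliner ZIMs, or None to leave styles alone."""
    if not is_mwoffliner_zim(metadata):
        # Any other kind of ZIM is left untouched.
        return None
    return "desktop" if viewport_width_px >= desktop_min_width_px else "mobile"
```

Returning `None` for other ZIMs keeps the behaviour opt-in per content type, which is the concern raised earlier about not being specific to one kind of ZIM file.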

mgautierfr commented 1 year ago

Yes, I totally agree. It was a "joke" about the "good news". Good news depends on the point of view :) But I agree with the purpose of this issue, no problem.

My real comment was about taking care not to be too Wikipedia-centric in the designed solution.

uriesk commented 1 year ago

Responsive design, so showing them on certain resolutions, is not an option?

Jaifroid commented 1 year ago

Responsive design, so showing them on certain resolutions, is not an option?

Definitely an option. The slight issue is that Wikipedia mobile stylesheets are not responsive with regard to these hidden elements. You either must choose mobile, in which case they are hidden, or you must choose desktop styles, in which case they are displayed. So it is not done responsively. That complicates designing a responsive stylesheet, because it would have to be a custom design.
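A custom responsive rule of the kind described above could look roughly like the output of this helper, which wraps an override in a `min-width` media query. It is a minimal sketch under assumptions: the `.navbox` selector, the 720 px breakpoint, and the helper name `responsive_navbox_css` are illustrative and do not come from the Wikipedia or mwoffliner stylesheets.

```python
from typing import Sequence


def responsive_navbox_css(
    min_width_px: int = 720,  # hypothetical breakpoint
    selectors: Sequence[str] = (".navbox",),  # assumed class name
) -> str:
    """Build a CSS rule that shows the selected elements only on wide viewports."""
    if min_width_px <= 0:
        raise ValueError("min_width_px must be positive")
    selector_list = ", ".join(selectors)
    # Below the breakpoint nothing is overridden, so the mobile style still hides them.
    return (
        f"@media (min-width: {min_width_px}px) {{\n"
        f"  {selector_list} {{ display: block !important; }}\n"
        f"}}\n"
    )
```

For example, `responsive_navbox_css(1000, (".navbox", ".vertical-navbox"))` would produce a single rule covering both selectors at a 1000 px breakpoint. It would not handle the re-arranging of page elements between mobile and desktop layouts that was mentioned earlier in the thread.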

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.