kiwix / kiwix-js-pwa

Kiwix JS Offline Browser implemented as a Progressive Web App (PWA), and packaged as Electron, NWJS and UWP apps for Windows and Linux.
https://pwa.kiwix.org
GNU General Public License v3.0

Package new mdwiki ZIM with next WikiMed by Kiwix release #232

Closed kelson42 closed 2 years ago

kelson42 commented 2 years ago

The people behind WikiMed have decided to proceed differently to provide the best medical encyclopedia in English. They have forked Wikipedia and follow their own path. More info about the differences at https://mdwiki.org/wiki/Main_Page#Differences_from_Wikipedia.

Therefore we have started to provide new ZIM files based on mdwiki.org, see latest: http://download.kiwix.org/zim/.hidden/custom_apps/mdwiki_en_all-app_maxi_2022-01.zim

Please use these ZIM files in the future for the WikiMed app instead of the “classical” medicine selection used in the past, even though those ZIM files will continue to be provided.

Jaifroid commented 2 years ago

Thanks, @kelson42, I saw that when checking for the last one, and had meant to ask you what your opinion of these ZIMs was. I'm happy to switch to these ZIMs if it is Kiwix org policy, but I wonder whether some users won't like the move away from core Wikipedia principles. I'm interested to understand more of the background for the decision to fork.

kelson42 commented 2 years ago

@Jaifroid I would recommend to challenge @WikiDocJames who manages the WikiMed project.

Jaifroid commented 2 years ago

Thanks @kelson42 , though it's more a case of being interested to know the reasons, rather than challenging! I'll check it out.

WikiDocJames commented 2 years ago

@Jaifroid English Wikipedia does not permit a significant number of the details our app users have requested. For example, drug doses are not permitted. MDWiki has added these to more than a thousand articles. WP does not allow "how-to" advice; we do. EN WP is against video, while MDWiki is building a number of collaborations around this. Wikipedia doesn't permit discussion of the cost of medications in LMIC; we provide this for all essential medicines. Etc. Best

WikiDocJames commented 2 years ago

We are also working on a collaboration with Our World in Data...

Jaifroid commented 2 years ago

@WikiDocJames Thank you for the clarifications! I'm happy to go along with the Kiwix decision to use this new ZIM. I'll do so in the next WikiMed UWP and Electron versions later this month. It'll be interesting to see if there is any feedback from users. WikiMed is an excellent project, and the most popular of the apps I provide for Windows (and more recently for Linux as an Electron app). Latest release is here: http://kiwix.github.io/kiwix-js-windows/wikimed-electron.html (permalink).

WikiDocJames commented 2 years ago

Thanks. And please pass along the feedback you get. We are in a much better position to address requests than we were before.

Jaifroid commented 2 years ago

@kelson42 I've been testing out mdwiki_en_all-app_maxi_2022-02.zim as a candidate for release with the WikiMed UWP/Electron app, and there seem to be a number of teething problems with this ZIM. Most notably, many articles are hyperlinked that are not actually in the ZIM. As an example, go to the page for Chickenpox and click on the link for "polymerase chain reaction" in the second paragraph, and no article can be found. Of course searching for "polymerase chain reaction" also gives no result. I have come across several such examples.

I have also found a number of missing images, and some styling problems.

I could release this version today, but I wonder if I should do one more release with the WikiMed archive, and then test the new mdwiki for March. What do you think? Alternatively I could skip this month as it is nearly finished.

WikiDocJames commented 2 years ago

Not sure if we have progress on this ... But yes we need to fix the things you mentioned first

kelson42 commented 2 years ago

@Jaifroid Hmmm, I wonder how this could happen... assuming the backend works properly.

Jaifroid commented 2 years ago

I've tested on both Kiwix JS Windows and on Kiwix JS, and the same style issues are showing up. See example below (from Kiwix JS). Plus a number of hyperlinked articles are not working in either implementation. Of course both Kiwix JS flavours rely on essentially the same backend, so it would point to an issue with subtle changes, e.g. to stylesheets, that have been applied to the Mediawiki instance that mdwiki is running on.

PS I can override the stylesheet to correct the error shown below, but it would be better to understand where it is coming from at source. I suspect a change to some local stylesheets that the inserted_style_mobile.css doesn't account for.

image

WikiDocJames commented 2 years ago

Working through these. polymerase chain reaction:

https://mdwiki.org/wiki/Polymerase_chain_reaction mirrors https://en.wikipedia.org/wiki/Polymerase_chain_reaction

In order to get included in the zim a page must be in mdwiki.org (not mirrored) or it must be in http://download.openzim.org/wp1/enwiki/customs/medicine.tsv. This page is not in mdwiki.org, where it is mirrored.

If I look for it in medicine.tsv (in my case downloaded as enwp.tsv):

@.***:/srv2/mdwiki-cacher/data# grep Polymerase enwp.tsv
Polymerase_Chain_Reaction
Polymerase_chain
Polymerase_chain_reaction_test
Polymerase_chain_reacton
Polymerase_proofreading-associated_polyposis

It isn't actually there either. It is possible that we should not require an exact match. The mediawiki naming convention is first word upper and remainder lower case, but it is not followed strictly.

so when mwoffliner requests http://offline.mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Polymerase_chain_reaction

it gets 404 unknown page
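The idea above of not requiring an exact match could be sketched roughly as follows. This is only an illustration, not code from the cacher or from mwoffliner: the helper names `normalize_title` and `find_title` are hypothetical, and the list is simply the grep output quoted above.

```python
def normalize_title(title: str) -> str:
    """Fold a MediaWiki-style title to a case-insensitive key (illustrative only)."""
    return title.replace(" ", "_").casefold()


def find_title(wanted: str, listed: list[str]) -> list[str]:
    """Return every listed title that equals `wanted` once case is ignored."""
    key = normalize_title(wanted)
    return [t for t in listed if normalize_title(t) == key]


# Titles taken from the grep output quoted above.
titles = [
    "Polymerase_Chain_Reaction",
    "Polymerase_chain",
    "Polymerase_chain_reaction_test",
    "Polymerase_chain_reacton",
    "Polymerase_proofreading-associated_polyposis",
]

exact_hit = "Polymerase_chain_reaction" in titles
matches = find_title("Polymerase_chain_reaction", titles)
print(exact_hit)  # False: the exact spelling is not in the list
print(matches)    # ['Polymerase_Chain_Reaction']
```

Whether folding case is safe for every title in the list is an open question, since MediaWiki treats titles as case-sensitive after the first character.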

I have also found a number of missing images, and some styling problems.

Can you say what they are?

Jaifroid commented 2 years ago

Thank you for that investigation! We should probably open some dedicated issues on mwOffliner. But we can continue gathering them here for now. Regarding images, I wish I'd systematically noted missing images, I'll try to do so from now on when testing. Below is one example from the article SARS-CoV-2 Omicron variant. On the left is the mdwiki version, on the right is the latest Wikipedia medicine ZIM (both February 2022).

image

Jaifroid commented 2 years ago

IMHO the issue with the styling of infoboxes (above) is more glaring than the odd missing image, which we often get in Wikipedia ZIMs. In that case, as I mentioned, I believe there is some hard-coded styling that mwOffliner puts on <div class="thumbinner"> elements. It can be overridden in the stylesheet inserted-style-mobile.css which is a custom stylesheet mwOffliner injects to adapt ZIM articles so that they approximate the appearance of online Wikipedia mobile style.

Jaifroid commented 2 years ago

Several other style differences between the SARS-CoV-2 Omicron variant article on both ZIMs. Infobox missing in mdwiki version, placement and style of title Mutations. Different text in both versions despite same month (though not so surprising, as scraped on different days of month).

image

image

WikiDocJames commented 2 years ago

Here is the MDWiki version which is similar to the Wikipedia version.

Likely we just need to rebuild the cache to get the updates. This of course is an incredibly rapidly changing topic.

James

-- James Heilman MD, CCFP-EM, Wikipedian

Jaifroid commented 2 years ago

I agree we shouldn't pay attention to textual differences on such fast-moving topics. I couldn't see a screenshot attached to your message, but no matter: the most common style differences I've noticed across mdwiki are the error above where the thumbinner div is too narrow causing text to fill out the small space left of an image, and the lack of some infoboxes which are actually more important than the (too wide) navboxes that are visible by default.

Jaifroid commented 2 years ago

Just to say that the new WikiMed Windows/Linux release includes all necessary adaptations to run the mdwiki archive smoothly. If a user loads an mdwiki archive it will be recognized as a WikiMed-family archive, and the app will show the appropriate icons, etc. Users can also find and download mdwiki archives from within the app. However, for now it comes packaged with wikipedia_en_medicine_maxi_2022-02.zim as an interim measure while I await the new mdwiki version for testing. 😀

tim-moody commented 2 years ago

re: Portal_vein formatting here are html extracts from the two zims:

from wikipedia_en_medicine_maxi_2022-02/A/Portal_vein which formats properly

<td colspan="2" class="infobox-image">
    <span class="mw-default-size">
        <img src="../I/Gray591.png.webp" decoding="async" data-file-width="491" data-file-height="750"
            data-file-type="bitmap" height="382" width="250" loading="lazy">
    </span>
    <div class="infobox-caption">The <b>portal vein</b> and its tributaries. It is formed by the <a
            href="Superior_mesenteric_vein" title="Superior mesenteric vein">superior mesenteric vein</a>,
        inferior mesenteric vein, and <a href="Splenic_vein" title="Splenic vein">splenic vein</a>.
        <i>Lienal vein</i> is an old term for <i>splenic vein</i>.
    </div>
</td>

from mdwiki_en_all_2022-02/A/Portal_vein which doesn't

<td colspan="2" class="infobox-image">
    <div class="thumb tright">
        <div class="thumbinner" style="width:252px"><img src="../I/Gray591.png.webp" decoding="async"
                data-file-width="491" data-file-height="750" data-file-type="bitmap" height="382"
                width="250" loading="lazy">
            <div class="thumbcaption" style="text-align: left"></div>
        </div>
    </div>
    <div class="infobox-caption">The <b>portal vein</b> and its tributaries. It is formed by the <a
            href="Superior_mesenteric_vein" title="Superior mesenteric vein">superior mesenteric vein</a>,
        inferior mesenteric vein, and <a href="Splenic_vein" title="Splenic vein">splenic vein</a>.
        <i>Lienal vein</i> is an old term for <i>splenic vein</i>.
    </div>
</td>

tim-moody commented 2 years ago

I can think of two reasons for the difference. Maybe others have more ideas.

  1. I believe mwoffliner surveys the apis supported by the server at the beginning of the run. mdwiki does not support the rest api so it uses /w/api?action=visualeditor. With EN WP I think other apis are available and could return different results.

  2. There could have been an edit and timing that accounts for the difference; there was only one on Feb 8, but I don't see that it would affect this area of the page.

Note that the cacher answers queries about pages that are not on mdwiki directly from EN WP, though the api will already have been selected.

tim-moody commented 2 years ago

In the chrome browser if I edit this page on the mdwiki zim and replace the existing table html with the wikipedia_en_medicine_maxi_2022-02/A/Portal_vein html the formatting looks OK.

tim-moody commented 2 years ago

Well, now I'm confused. Both of these queries return the same html, with <span class="mw-default-size">

https://en.wikipedia.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Portal_vein
https://mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Portal_vein

requests.get('https://en.wikipedia.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Portal_vein').text[4000:8000]

has the same. So where did this html come from?

WikiDocJames commented 2 years ago

Is it the html that formats properly?

Or the html that does not format properly?

J

tim-moody commented 2 years ago

the one that formats properly

tim-moody commented 2 years ago

I removed caching for enwp and still get the faulty html. Seems like something tells enwp to use a different parser.

The log shows the same queries unless I am missing something:

[info] [2022-03-05T15:20:52.069Z] Getting article [Portal_vein] from http://offline.mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Portal_vein
[info] [2022-03-05T15:20:52.070Z] Getting JSON from [http://offline.mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=Portal_vein]
[info] [2022-03-05T15:20:52.502Z] Getting JSON from [http://offline.mdwiki.org/w/api.php?action=parse&format=json&prop=modules%7Cjsconfigvars%7Cheadhtml&page=Portal%20vein]

tim-moody commented 2 years ago

both of these give the correct result, so it is only the api

http://offline.mdwiki.org/w/index.php?search=Portal_vein&title=Special%3ASearch&go=Go
http://offline.mdwiki.org/wiki/Portal_vein

tim-moody commented 2 years ago

I modified the cacher so that if 'Portal' is in any response from enwp it writes the entire response to the log right before it is returned to the caller. If I then grep the log for thumbinner it is not present.

To be sure I wrote all responses to the log. thumbinner is still not present. What is present is the correct html

<td colspan=\"2\" class=\"infobox-image\"><span class=\"mw-default-size\" typeof=\"mw:Image/Frameless\"><a href=\"./File:Gray591.png\"><img resource=\"./File:Gray591.png\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/3/33/Gray591.png/250px-Gray591.png\" decoding=\"async\" data-file-width=\"491\" data-file-height=\"750\" data-file-type=\"bitmap\" height=\"382\" width=\"250\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/3/33/Gray591.png/375px-Gray591.png 1.5x, //upload.wikimedia.org/wikipedia/commons/3/33/Gray591.png 2x\"/></a></span><div class=\"infobox-caption\">The <b>portal vein</b> and its tributaries. ...

So I conclude that the cacher is not returning the faulty html.
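For anyone wanting to repeat this kind of check on other pages, a minimal sketch of classifying an HTML fragment by its image wrapper might look like the following. It is not the cacher's code: the class `ClassCollector` and the function `image_wrapper` are hypothetical names, and the two sample fragments are shortened versions of the Portal_vein extracts quoted earlier in this thread.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name seen on any start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def image_wrapper(html: str) -> str:
    """Report which wrapper the fragment uses around its infobox image."""
    collector = ClassCollector()
    collector.feed(html)
    if "thumbinner" in collector.classes:
        return "thumbinner"
    if "mw-default-size" in collector.classes:
        return "mw-default-size"
    return "unknown"


# Shortened from the two Portal_vein extracts quoted earlier.
good = ('<td colspan="2" class="infobox-image"><span class="mw-default-size">'
        '<img src="../I/Gray591.png.webp"></span></td>')
faulty = ('<td colspan="2" class="infobox-image"><div class="thumb tright">'
          '<div class="thumbinner" style="width:252px">'
          '<img src="../I/Gray591.png.webp"></div></div></td>')

print(image_wrapper(good))
print(image_wrapper(faulty))
```

Parsing the markup instead of grepping for the bare string avoids false hits where "thumbinner" only appears in article text or in a stylesheet.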

WikiDocJames commented 2 years ago

Okay so have we fixed it or does the API need work?

tim-moody commented 2 years ago

no. nothing has changed, except I now believe the cacher is returning the correct response.

@Jaifroid do you have any suggestions?

tim-moody commented 2 years ago

I notice that mwoffliner/src/util/saveArticles.ts has a function that creates similar code (lines 644ff). Can someone say under what circumstances this runs?

Jaifroid commented 2 years ago

@tim-moody Unfortunately I don't know the workings of mwoffliner, perhaps @kelson42 would know more about the details.

What I can say from my investigations (in order to produce a patch in my app), really just corroborates your findings. The error comes from inserted html with thumbinner class that is not present in the Wikipedia ZIMs (medicine) but is present in the mdwiki ZIM. Specifically, the styling problem is caused by this piece of hard-coded style code that appears in the mdwiki archive on pages with certain kinds of infoboxes:

<div class="thumbinner" style="width:252px">

The rogue here is style="width:252px". This shouldn't be hard-coded. It seems rather specific. Maybe it would be easy to find in the mwoffliner code if it is inserted there.

All I know, from observation, is that mwoffliner includes extra stylesheets and some alteration of HTML on scraped pages, in an attempt to approximate the Wikipedia Mobile style. AFAIU, mwoffliner queries Parsoid to get a style-free representation of the page (?) and then adds stylesheets. (This understanding may be wrong.)

Is it possible that mwoffliner is not using parsoid for mdwiki, and is falling back to some older code that transformed the desktop style to mobile style?

Or, it is not recognizing the page it is getting as a mobile style due to the change of a string from wikipedia to mdwiki somewhere, so adds some transformations to it?
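To make the effect of the hard-coded width concrete, here is a rough sketch of the kind of clean-up a reader or scraper could apply. It is an assumption-laden illustration only: it is not the patch used in the app and not mwoffliner code, the function name `strip_thumbinner_width` is hypothetical, and the pattern assumes the `class` attribute comes before `style`, as it does in the extract quoted above.

```python
import re

# Matches a thumbinner div whose inline style is only a pixel width.
# Assumes class="thumbinner" appears before the style attribute.
THUMBINNER_WIDTH = re.compile(
    r'(<div\b[^>]*\bclass="thumbinner"[^>]*?)\s+style="width:\s*\d+px;?"'
)


def strip_thumbinner_width(html: str) -> str:
    """Drop the inline pixel width from thumbinner divs, leaving the rest intact."""
    return THUMBINNER_WIDTH.sub(r"\1", html)


before = ('<div class="thumbinner" style="width:252px"><img src="../I/Gray591.png.webp">'
          '<div class="thumbcaption" style="text-align: left"></div></div>')
after = strip_thumbinner_width(before)
print(after)
```

A stylesheet override would achieve the same result without touching the HTML, but an inline style takes precedence over ordinary rules, so the rule would need `!important`.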

Jaifroid commented 2 years ago

Just quickly to say that the first "WikiMed by Kiwix" app using the MDwiki ZIM is now available:

https://github.com/kiwix/kiwix-js-windows/releases/tag/v1.9.8-WikiMed Permalink: http://kiwix.github.io/kiwix-js-windows/wikimed-electron.html

The Windows Store version is pending certification. The Electron versions for Windows and Linux can be downloaded directly from the above GitHub releases, or on Windows it can be installed via winget package manager, with winget install kiwix.wikimed.electron.

Once all packages are certified, I'll close this issue.

tim-moody commented 2 years ago

great news!

WikiDocJames commented 2 years ago

Have it installed. It is using the old rather than the new intro pages. Is that on purpose?

New Intro page https://mdwiki.org/wiki/App/IntroPage

James

Jaifroid commented 2 years ago

@WikiDocJames It's actually using a custom version of the intro pages which I created (based on the old one, indeed) during the COVID pandemic, hence the pride of place given to an info box of links to COVID-19-related articles.

I'd be happy to remove that override page in the future if you don't like it! It's been running since shortly after pandemic lockdowns started, so maybe it's had its day, as we head out (?) of pandemic.

WikiDocJames commented 2 years ago

We could add a section for COVID19 here... Will look at doing so over the next few days.

https://mdwiki.org/wiki/App/IntroPage

James

Jaifroid commented 2 years ago

OK, sounds good!

Jaifroid commented 2 years ago

All packages now certified and published. Many thanks to all for your help and discussion. I'll keep an eye out for what to do with the landing page for the next release.

WikiDocJames commented 2 years ago

Hey Jaifroid

COVID is dealt with under Infectious diseases... On further thought I think it is reasonable to leave it there.

James

WikiDocJames commented 2 years ago

Additionally I notice the text wrapping issue is not present on the Windows App.

But is present on the Google chrome extension still. Wondering if the fix in the windows app could be implemented in the chrome extension?

James

Jaifroid commented 2 years ago

@WikiDocJames OK, thanks. I find the intro pages (generally, not just MDwiki ones) to be quite bland, though they do the job and I guess most people won't spend long on a landing page. I think they're designed for mobile, whereas this app is more used on laptops/PCs. So I designed an override page during the pandemic, that I expanded a bit each month to provide an evolving index of COVID-19 articles, since everyone was following COVID news pretty intensely. I'll probably make it optional going forwards.

Jaifroid commented 2 years ago

Additionally I notice the text wrapping issue is not present on the Windows App. But is present on the Google chrome extension still. Wondering if the fix in the windows app could be implemented in the chrome extension?

I provided custom support for this in the KJSW app, precisely so as not to hold up release of the app. The browser extension version aims to be a pure reader, and I believe the lead developer would prefer this issue to be fixed at source, so all the readers can benefit from it. I would tend to agree -- I think @tim-moody is working on a fix at source. If it looks like it might take a while to appear in ZIMs, then I can open an issue to backport my fix into the KJS readers on an exceptional basis, but I think it might be a hard sell!

WikiDocJames commented 2 years ago

Tim and I discussed the issue. Appears to be related to the API that is used by the software. A fix looks like it would cost a couple of thousand per year so will likely leave for now. Thanks for fixing it for the windows app.

James

Jaifroid commented 2 years ago

OK, but I can't see any reason why the fix couldn't be included in the scraper software: i.e. in mwOffliner. This wouldn't cost anything (other than developer time!), and then Kiwix clients would not have to implement their own solutions separately.

WikiDocJames commented 2 years ago

Yes agree that would be ideal

Jaifroid commented 2 years ago

@WikiDocJames On a recent reddit post advertising this app with the new MDwiki ZIM, a reader asks:

Why is the article count so much lower than the previous (Feb) release?

I'm not sure where they got this idea from. I answered:

I've just checked in the console log of the PWA app (with verbose logging on). For last month's WikiMed (based on the Wikipedia ZIM) the title count is 305,007, while for the mdwiki ZIM from this month, the title count is 300,578. This is number of "titles" in the title index, but be aware that a "title" includes variants of article names, like "Human African Trypanosomiasis" and "Human African trypanosomiasis" (last word spelt with initial lower-case t), both of which lead to the same article, hence these numbers are much larger than the 57,000-odd articles the project contains. However, I think these (calculated) title entries show that the MDwiki ZIM has no less information than the Wikipedia ZIM it is based on.

I thought you should just be aware that somewhere it is appearing that the MDwiki has fewer articles than the ZIM based on Wikipedia, but this seems self-evidently wrong. If you have Reddit, feel free to participate in that thread (if you have anything to add), or let me know if I should make any further comment.

tim-moody commented 2 years ago

There are a number of things at play. Article count has included redirects for a while, and this is in the process of being fixed. Also, for some reason WikiMed includes articles not in the article list medicine.tsv, so its count will be higher, but the extra articles are not medical.
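The difference between a title count and an article count can be illustrated with a toy example. The mapping below is invented purely for illustration (it is not read from any ZIM); it just shows why counting title entries, including redirects and spelling variants, gives a larger number than counting the articles they resolve to.

```python
# Hypothetical title index: each title entry points at the article it resolves to.
title_index = {
    "Human African Trypanosomiasis": "Human_African_trypanosomiasis",
    "Human African trypanosomiasis": "Human_African_trypanosomiasis",
    "Portal vein": "Portal_vein",
}

title_count = len(title_index)                  # every entry, variants included
article_count = len(set(title_index.values()))  # distinct target articles only

print(title_count, article_count)  # 3 2
```

This is why the 300,000-odd titles reported in the console log and the 57,000-odd articles quoted for the project are not directly comparable.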