Closed Jaifroid closed 4 months ago
Was waiting to hear from you as I noticed the failure. I have been working on what I hope is a more robust version of the cacher. I ran it and it succeeded. You can run it by changing these parameters:
--articleList="https://mdwiki.wmcloud.org/nonwiki/lists/mdwikimed.tsv" --mwUrl="https://mdwiki.wmcloud.org/"
I think we now have problem pages on EN WP, which time out. The new cacher returns 404 for them rather than 50x. I was planning to test for one more month, but since the old cacher now fails I think we should switch.
Ha! I tend to check round about now (or earlier) each month in preparation for making a new release...
I don't have access to change parameters or initiate runs on zimfarm. Perhaps you do? If not, maybe @benoit74. Many thanks for being on top of it, @tim-moody.
@Jaifroid and @benoit74, I don't have the ability to change parameters or initiate runs either. I'd appreciate it if someone could make the changes and rerun. Alternatively, I have the zims, but I'd rather not step outside of the usual workflow.
Yes, to build the app (which is done on GitHub actions), the ZIMs need to be available in mirror.download.kiwix.org, as they are pulled (more than once) into different workflows. So it's best to wait for a re-run, which hopefully can happen next week.🤞
In both cases, the page "COVID-19_pandemic_in_Switzerland" seems not to be in good shape; at least, the Wikipedia API is returning an HTTP 500 error on http://offline.mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=COVID-19_pandemic_in_Switzerland.
Looks like this is linked to a recent edit of the page, which is unfortunately semi-protected.
I requested an edit: https://en.wikipedia.org/wiki/Talk:COVID-19_pandemic_in_Switzerland#Semi-protected_edit_request_on_13_May_2024
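For anyone wanting to check other suspect pages the same way, here is a minimal Python sketch that builds the parse URL quoted above and reports which pages answer with a 5xx status. The function names (`parse_url`, `http_status`, `failing_pages`) are made up for illustration and are not part of the scraper or the cacher; only the query parameters come from the URL above.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def parse_url(base, page):
    """Build the visualeditor parse URL for a page (same parameters as the URL above)."""
    query = urlencode({
        "action": "visualeditor",
        "mobileformat": "html",
        "format": "json",
        "paction": "parse",
        "page": page,
    })
    return f"{base.rstrip('/')}/w/api.php?{query}"


def http_status(url, timeout=30):
    """Return the HTTP status code for a GET on url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def failing_pages(base, pages, status_of=http_status):
    """Return the pages whose parse URL answers with a 5xx status."""
    return [p for p in pages if status_of(parse_url(base, p)) >= 500]
```

Calling `failing_pages("http://offline.mdwiki.org/", ["COVID-19_pandemic_in_Switzerland"])` would make a real network request; `status_of` can be swapped for a stub when testing offline.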
OK, as I have the correct editing status, I was able to make that edit. I suspect it may take some time to carry over to mdwiki?
Your problem will be that COVID-19_pandemic_in_Switzerland was only the first of the various COVID timeline pages that you hit, and it killed the run. There are at least 4 of them, plus a few other EN WP pages that can also kill it.
https://mdwiki.org/wiki/Timeline_of_the_COVID-19_pandemic_in_South_Africa https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Cuba
My experience is that this started with the WMF shift of data centers last month, though it was worst during the shift itself. At that time I was testing mdwiki.wmcloud.org and it failed repeatedly, as did http://offline.mdwiki.org/.
Both of these are proxies for mdwiki.org, but the new one now proxies the EN WP pages as well and returns 404 for any 50x error page, so that the run does not break.
> I suspect it may take some time to carry over to mdwiki?
EN WP pages are read directly by the proxy, but they are cached, so there could be a lag for a page that does not fail, but a failed page is not cached, so would be read immediately.
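To make the behaviour described above concrete, here is a rough Python sketch of it, not the actual cacher code. It assumes a hypothetical `fetch_upstream(page)` callable returning `(status, body)`, and it leaves out cache expiry, which a real cacher would need.

```python
class CachingProxy:
    """Illustrative sketch only: cache good pages, report upstream 50x as 404."""

    def __init__(self, fetch_upstream):
        # fetch_upstream is a hypothetical callable: page -> (status, body)
        self._fetch = fetch_upstream
        self._cache = {}

    def get(self, page):
        # A cached page is served without contacting upstream, hence the possible lag.
        if page in self._cache:
            return 200, self._cache[page]

        status, body = self._fetch(page)

        # Any 50x from upstream is reported as 404 so the scrape skips the page
        # instead of aborting. Failures are not cached, so the next request
        # reads the page from upstream again.
        if 500 <= status <= 599:
            return 404, ""

        if status == 200:
            self._cache[page] = body
        return status, body
```

With this shape, a page that fails once and is later fixed upstream is picked up on the very next request, while a page that succeeded keeps being served from the cache.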
FYI http://offline.mdwiki.org/ is deprecated and I hope to phase it out in July in favor of mdwiki.wmcloud.org. The latter has had two good runs so far, last month and this.
You may also be interested in https://mdwiki.wmcloud.org/nonwiki/status
Hmm, so despite the edit http://offline.mdwiki.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&page=COVID-19_pandemic_in_Switzerland is still returning 500. I edited out the commented block in the stated revision (deleted the block), but it seems the issue could be something else.
@tim-moody What is the next step? We could blacklist the not-working articles for now I suppose. But looking ahead, is it a straight swap to use mdwiki.wmcloud.org, or would it require development work? I'm sorry -- I only work on the JS app(s), so don't really have experience with the backend...
Another alternative might be (if this is OK with @benoit74) for us to upload manually the ZIMs from your successful run to download.kiwix.org (I have access to that). Though they might not end up being visible to library.kiwix.org... Not sure about protocol on this, @benoit74.
> What is the next step?
I recommend moving to the new cacher, see above for the changes to the recipe.
Thank you very much @tim-moody
I did not understand your first comment correctly, sorry for that. And I did not realize at first who you are either, sorry again ^^
> FYI http://offline.mdwiki.org/ is deprecated and I hope to phase it out in July in favor of mdwiki.wmcloud.org. The latter has had two good runs so far, last month and this.
Then the way forward is quite obvious!
Since @RavanJAltaie does not work on Mondays, I've done the transition of configurations from offline.mdwiki.org to mdwiki.wmcloud.org, and I've requested the recipes again with high priority so they start soon (the zimfarm pipe has been quite busy for the past few days).
and --articleList="https://mdwiki.wmcloud.org/nonwiki/lists/mdwikimed.tsv" ?
Thank you very much, both. Fingers crossed!
> and --articleList="https://mdwiki.wmcloud.org/nonwiki/lists/mdwikimed.tsv" ?
Yep, I just didn't mention that we were still using http rather than https, so the task failed once more ... :(
both are now progressing correctly, hopefully it will soon be finished and OK.
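A small pre-flight check could catch this kind of http/https or leftover-host slip before a recipe is requested. The Python sketch below is only an illustration: `check_recipe_urls` and `EXPECTED_HOST` are made-up names, and the parameter values are the ones quoted earlier in this thread. It is not part of zimfarm.

```python
from urllib.parse import urlsplit

# Host the recipes are expected to point at after the switch (taken from this thread).
EXPECTED_HOST = "mdwiki.wmcloud.org"


def check_recipe_urls(params, expected_host=EXPECTED_HOST):
    """Return a list of human-readable problems found in URL-valued parameters."""
    problems = []
    for name, value in params.items():
        parts = urlsplit(value)
        if parts.scheme != "https":
            problems.append(f"{name}: scheme is {parts.scheme!r}, expected 'https'")
        if parts.hostname != expected_host:
            problems.append(f"{name}: host is {parts.hostname!r}, expected {expected_host!r}")
    return problems


recipe = {
    "articleList": "https://mdwiki.wmcloud.org/nonwiki/lists/mdwikimed.tsv",
    "mwUrl": "https://mdwiki.wmcloud.org/",
}
print(check_recipe_urls(recipe))  # an empty list means no problems were found
```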
The good news is that both files completed (I'm downloading now). The possibly not-so-good news is the mdwiki_en_all_app_maxi_2024-05.zim file (the one intended for use in apps), which should be around 1.5GB, is listed as 2GB. This archive should be smaller than mdwiki_en_all_maxi_2024-05.zim (standard), as it is supposed not to have a full-text index. For some strange reason it appears to be bigger unless the listing size is wrong. EDIT: Listing is wrong, see below.
OK, forget that: the listing on download.kiwix.org is wrong. It says 2GB, and the archive is in fact 1.51GB (after downloading). I've seen this before, so it's a (non-serious) bug with the download library software... All looks good here. I'll close this issue after testing.
@Jaifroid Please open a dedicated issue; so far I don't understand what you mean.
@kelson42 I would, but I don't know where to open a bug for master.download.kiwix.org. See circled entry in screenshot for what I mean. After downloading it, the actual file size is 1.51GB (bottom screenshot).
Come to think of it, it may just be rounding? Perhaps it has tipped from 1.49GB to 1.51GB. Probably not a bug.
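A quick way to see the rounding hypothesis in numbers, as a Python sketch. It assumes the listing rounds to the nearest whole gigabyte and that 1 GB means 10^9 bytes; neither has been confirmed for the download server, and `listed_size_gb` is a made-up name.

```python
def listed_size_gb(size_bytes):
    """Size as a listing would show it IF it rounds to the nearest whole GB (assumption)."""
    return round(size_bytes / 10**9)


print(listed_size_gb(1_490_000_000))  # 1.49 GB is shown as 1
print(listed_size_gb(1_510_000_000))  # 1.51 GB is shown as 2
```

Under those assumptions, an archive growing by only 20 MB is enough to tip the listed value from 1GB to 2GB.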
@Jaifroid you can open an issue on https://github.com/kiwix/k8s if needed
I'm closing this ticket since the problem is solved.
Great, thanks for your help solving this so quickly @benoit74 and @tim-moody. Regarding the file size reporting, I don't think I need to open a bug report, as it's just aggressive rounding, and ultimately it's not a value that normal users will ever see.
ZIM(s) location
Last good scrapes:
Recipe(s) URL
Details
After five months of good scrapes, both mdwiki recipes have failed simultaneously. @tim-moody?