openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Produce a scrape of a smallish Wiktionary with dev-1.14 for testing #2098

Open Jaifroid opened 2 weeks ago

Jaifroid commented 2 weeks ago

In general, it would be a good idea to road-test Wikimedia ZIMs other than Wikipedia with the new API (assuming other project types also use it). There are also a few issues where longstanding problems may have been fixed by dev (or new ones potentially introduced), for example #2073 and the very similar #1033.

A smallish one with full features might be wiktionary_es_all_maxi (the latest version we have is https://library.kiwix.org/viewer#wiktionary_es_all_maxi_2024-06, produced by 1.13). It is just 890MB, so it seems good for testing.

audiodude commented 2 weeks ago

Do we need to do this through zimfarm, or is it okay if I just do the scrape on my local machine and send you the ZIM?

Jaifroid commented 2 weeks ago

It depends whether it needs wider testing on other clients, I suppose. I only really test thoroughly on the KJS Browser Extension and the Kiwix PWA. If I find an issue, I try to check whether it also occurs on other clients (Kiwix Desktop and Kiwix Android) or only on the JS client(s), so I do test a bit more widely. I then report anything significant that needs fixing at scraper level or in other clients.

In this case I'll be particularly keen to see whether #2073 is solved or not, and will report back on that.

So, whatever is easiest for you!

audiodude commented 2 weeks ago

So I'm unable to produce an es.wiktionary.org ZIM right now because of #2003.

I'm getting:

[error] [2024-11-05T19:14:56.416Z] Cannot render [] into an article
[error] [2024-11-05T19:14:56.417Z] Error downloading article awalk
[error] [2024-11-05T19:14:56.417Z] Failed to run mwoffliner after [2611s]: {
    "name": "Error",
    "message": "Cannot render [] into an article"
}
[error] [2024-11-05T19:14:56.417Z] 

**********

Cannot render [] into an article

**********
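
For anyone triaging a long log like the one above, a minimal Python sketch for pulling out the articles that failed to download might look like this. It assumes the line format matches the `[error] [timestamp] Error downloading article <title>` lines quoted here; the `failed_articles` helper and the regular expression are illustrative and not part of mwoffliner, and other versions may log differently.

```python
import re

# Pattern derived only from the log lines quoted in this thread;
# it is an assumption that other runs use the same format.
FAILED_ARTICLE = re.compile(
    r"^\[error\] \[(?P<timestamp>[^\]]+)\] "
    r"Error downloading article (?P<title>.+)$"
)


def failed_articles(log_text):
    """Return (timestamp, title) pairs for each failed-download line."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_ARTICLE.match(line.strip())
        if match:
            results.append((match.group("timestamp"), match.group("title")))
    return results


sample = """\
[error] [2024-11-05T19:14:56.416Z] Cannot render [] into an article
[error] [2024-11-05T19:14:56.417Z] Error downloading article awalk
"""
print(failed_articles(sample))
```

Run against the excerpt above, this reports only the `awalk` line, which is the article to retry or inspect by hand.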

Jaifroid commented 2 weeks ago

Thanks for trying @audiodude!