Open TripleCamera opened 12 months ago
Hi. May I ask how long it usually takes to fulfill a request? Some of our readers suffer from poor Internet connection, and the offline version might be the only solution.
Hi, the recipe has been created (https://farm.openzim.org/recipes/minecraftwiki_zh_all). I'll update the library link here once it is ready.
@RavanJAltaie Thank you! That's fast as lightning!
Unfortunately, if nothing went wrong, something would go wrong. The latest log said that there was a 404 when accessing https://zh.minecraft.wiki/w/api.php?action=query&meta=siteinfo&format=json&siprop=general|namespaces|statistics|variables|category|wikidesc.
This is because the API path is /api.php, not the default /w/api.php. For more information, please check out Special:Version.
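To illustrate the difference, here is a minimal TypeScript sketch of how the two API paths resolve against the wiki's base URL. The helper `joinApiUrl` is hypothetical and not part of MWoffliner, and the query string is shortened for readability.

```typescript
// Hypothetical helper (not from MWoffliner): joins a wiki base URL and an
// API path, tolerating a trailing slash on the base and a leading slash on the path.
function joinApiUrl(mwUrl: string, apiPath: string): string {
  const base = mwUrl.replace(/\/+$/, '')
  const path = apiPath.replace(/^\/+/, '')
  return `${base}/${path}`
}

// Shortened version of the siteinfo query from the log.
const siteInfoQuery = 'action=query&meta=siteinfo&format=json&siprop=general'

// The default path, which returned 404 on this wiki.
const defaultUrl = `${joinApiUrl('https://zh.minecraft.wiki/', '/w/api.php')}?${siteInfoQuery}`

// The path this wiki actually uses.
const actualUrl = `${joinApiUrl('https://zh.minecraft.wiki/', '/api.php')}?${siteInfoQuery}`
```

Only the second URL points at the endpoint that exists on this wiki, which is why the path has to be overridden in the recipe.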
The language should be zh instead of nan.
@xtexChooser @TripleCamera Thanks for your notes. All fixed in the recipe. I re-ran it and will follow up.
@RavanJAltaie The language is correct now. However, the value of mwApiPath is not correct. Please change it to /api.php, thanks.
@RavanJAltaie Good news: I just set up the Docker environment used by openZIM scrapers. I am importing the config used by the scraper. Then I will try to fix the errors on my machine. I will post a list of corrected arguments once I finish.
Update: Here is the script:
#!/bin/bash
# Usage: sudo ./run.sh
# For docker:
# Added: --rm
# Modified: -v
# Removed: --detach, --cpu-shares, --memory-swappiness, --memory
# For mwoffliner:
# Modified: --adminEmail, --customZimDescription
# Removed: --optimisationCacheUrl, --osTmpDir
docker run \
-v /home/co-eda/mwoffliner-docker/output:/output:rw \
--name mwoffliner_minecraftwiki_zh_all \
--rm \
ghcr.io/openzim/mwoffliner:1.13.0 \
mwoffliner \
--adminEmail="TripleCamera@outlook.com" \
--customZimDescription="Docker test" \
--customZimFavicon="https://zh.minecraft.wiki/images/Wiki2x.png" \
--customZimLanguage="zho" \
--customZimTitle="Minecraft Wiki (zh)" \
--format="novid:maxi" \
--mwApiPath="/api.php" \
--mwUrl="https://zh.minecraft.wiki/" \
--outputDirectory="/output" \
--publisher="openZIM" \
--webp
@RavanJAltaie TL;DR Please set --customZimFavicon to https://zh.minecraft.wiki/images/Wiki%402x.png, thanks.
I saw that the value of --mwApiPath had been changed to /api.php. However, at the same time, the %40 character in --customZimFavicon had been removed by someone. Please add it back.
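As a small illustration of why the %40 matters, the TypeScript sketch below shows that it is the percent-encoded form of the @ in the file name, so dropping it yields a different file name. The variable names are only for illustration.

```typescript
// The favicon's file name on the wiki contains an '@'.
const fileName = 'Wiki@2x.png'

// encodeURIComponent escapes '@' as '%40'.
const encoded = encodeURIComponent(fileName) // 'Wiki%402x.png'

// Removing the escape sequence changes the name the URL refers to.
const stripped = encoded.replace('%40', '') // 'Wiki2x.png'

const faviconUrl = `https://zh.minecraft.wiki/images/${encoded}`
const brokenUrl = `https://zh.minecraft.wiki/images/${stripped}`
```

Here `faviconUrl` matches the value requested above, while `brokenUrl` matches the value that was in the recipe after the edit.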
The next issue I encountered after fixing this was:
Unable to find appropriate API end-point to retrieve article HTML
I am still investigating this.
I found the cause of "Unable to find appropriate API end-point to retrieve article HTML". Here is a code analysis of MWoffliner v1.13.0 (since all the scrapers are using it).
Before the scrape starts, MWoffliner checks mobile REST API, desktop REST API, and VE REST API capabilities for a specific page (parameter testArticleId) in Downloader.checkCapabilities:
public async checkCapabilities(testArticleId = 'MediaWiki:Sidebar'): Promise<void> {
// By default check all API's responses and set the capabilities
// accordingly. We need to set a default page (always there because
// installed per default) to request the REST API, otherwise it would
// fail the check.
this.mwCapabilities.mobileRestApiAvailable = await this.checkApiAvailabilty(this.mw.getMobileRestApiArticleUrl(testArticleId))
this.mwCapabilities.desktopRestApiAvailable = await this.checkApiAvailabilty(this.mw.getDesktopRestApiArticleUrl(testArticleId))
this.mwCapabilities.veApiAvailable = await this.checkApiAvailabilty(this.mw.getVeApiArticleUrl(testArticleId))
this.mwCapabilities.apiAvailable = await this.checkApiAvailabilty(this.mw.apiUrl.href)
// Coordinate fetching
// [...]
}
The default value MediaWiki:Sidebar is never used because the value of mwMetaData.mainPage is passed:
await downloader.checkCapabilities(mwMetaData.mainPage)
The value of mwMetaData.mainPage comes from the API. The base URL is split on slashes and its last part is taken. (This is a bad idea because different wikis have different URL rewrites.)
public async getMwMetaData(downloader: Downloader): Promise<MWMetaData> {
if (this.metaData) {
return this.metaData
}
const creator = this.getCreatorName() || 'Kiwix'
const [textDir, { langIso2, langIso3, mainPage, siteName }, subTitle] = await Promise.all([
this.getTextDirection(downloader),
this.getSiteInfo(downloader),
this.getSubTitle(downloader),
])
const mwMetaData: MWMetaData = {
// [...]
mainPage,
}
this.metaData = mwMetaData
return mwMetaData
}
public async getSiteInfo(downloader: Downloader) {
logger.log('Getting site info...')
const query = 'action=query&meta=siteinfo&format=json&siprop=general|namespaces|statistics|variables|category|wikidesc'
const body = await downloader.query(query)
const entries = body.query.general
// Checking mediawiki version
const mwVersion = semver.coerce(entries.generator).raw
const mwMinimalVersion = 1.27
if (!entries.generator || !semver.satisfies(mwVersion, `>=${mwMinimalVersion}`)) {
throw new Error(`Mediawiki version ${mwVersion} not supported should be >=${mwMinimalVersion}`)
}
// Base will contain the default encoded article id for the wiki.
const mainPage = decodeURIComponent(entries.base.split('/').pop())
const siteName = entries.sitename
// [...]
return {
mainPage,
siteName,
langIso2,
langIso3,
}
}
This works for many wikis like English Wikipedia, but not for Chinese Minecraft Wiki. The reason is that MCW-zh has a URL rewrite:
// Wikipedia-en
"base": "https://en.wikipedia.org/wiki/Main_Page",
// MCW-zh
"base": "https://zh.minecraft.wiki/",
Currently I don't know how to fix this. Do you have any ideas?
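To make the failure concrete, here is a minimal TypeScript sketch that applies the same expression as getSiteInfo to the two base values shown above. The wrapper function `mainPageFromBase` is hypothetical. Only the split, pop, and decode logic mirrors the quoted code.

```typescript
// Hypothetical wrapper around the expression used in getSiteInfo:
// decodeURIComponent(entries.base.split('/').pop())
function mainPageFromBase(base: string): string {
  return decodeURIComponent(base.split('/').pop() ?? '')
}

// Wikipedia-en: the last path segment is the main page title.
const enMainPage = mainPageFromBase('https://en.wikipedia.org/wiki/Main_Page') // 'Main_Page'

// MCW-zh: the base ends with '/', so the last segment is an empty string.
const zhMainPage = mainPageFromBase('https://zh.minecraft.wiki/') // ''
```

With an empty title, the capability checks are run against URLs that name no page, which is consistent with every article end-point being reported as unavailable.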
> Currently I don't know how to fix this. Do you have any ideas?
I think you should open a ticket at mwoffliner referencing your comment.
I fixed the recipe, which was wrongly configured, earlier today. We have to document how to configure mwoffliner properly! But no (visual editor) API is available. I have tried with version 1.14 (still in dev), which has more API end-point support, but I'm not done with this yet.
> I think you should open a ticket at mwoffliner referencing your comment.
Okay, I just opened openzim/mwoffliner#1995.
Both the code and the config differ a lot between v1.13.0 and git main, so I need to alter the config and test this on git main.
I don't know if this issue can be fixed without modifying code. The worst case would be switching to git main. :frowning_face:
> I fixed the recipe, which was wrongly configured, earlier today. We have to document how to configure mwoffliner properly! But no (visual editor) API is available. I have tried with version 1.14 (still in dev), which has more API end-point support, but I'm not done with this yet.
Thank you! However, the config differs between v1.13.0 and git main, so you need to rewrite the config to make it work.
In v1.13.0 (I will test git main later), MWoffliner accepts three different APIs:
Desktop REST API: Available in both the Wikimedia REST API and the MediaWiki REST API. However, the MediaWiki REST API cannot be used without modifying the code.
The Wikimedia REST API URL is /page/html/{title}.
The MediaWiki REST API URL is /page/{title}/html.
In MWoffliner, it is hardcoded so that the page title can only come last. I tried modifying the code, and it seemed to work (it fails later :frowning_face:, but it looks promising).
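The two URL shapes can be sketched in TypeScript as follows. This is only an illustration of the constraint, not MWoffliner's code: the function names, the `RestStyle` type, and the example.org base URLs are all made up for the example.

```typescript
type RestStyle = 'wikimedia' | 'mediawiki'

// Hypothetical builder covering both URL shapes.
function articleHtmlUrl(restBase: string, title: string, style: RestStyle): string {
  const encodedTitle = encodeURIComponent(title)
  return style === 'wikimedia'
    ? `${restBase}/page/html/${encodedTitle}` // title comes last
    : `${restBase}/page/${encodedTitle}/html` // title is in the middle
}

// A "title can only come last" scheme: a fixed prefix plus the title.
function titleLastUrl(prefix: string, title: string): string {
  return `${prefix}/${encodeURIComponent(title)}`
}

const wikimediaUrl = articleHtmlUrl('https://example.org/api/rest_v1', 'Main_Page', 'wikimedia')
const mediawikiUrl = articleHtmlUrl('https://example.org/rest.php/v1', 'Main_Page', 'mediawiki')

// The prefix scheme can reproduce the Wikimedia shape...
const viaPrefix = titleLastUrl('https://example.org/api/rest_v1/page/html', 'Main_Page')
// ...but no prefix can produce a URL that ends in '/html' after the title.
```

Because the MediaWiki shape puts a fixed suffix after the title, a configuration that only supplies a prefix cannot express it, which matches the observation that the code has to be modified.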
Update: @xtexChooser inspired me to try the Parsoid API, whose URL is /rest.php/{domain}/v3/page/html/{title}. So I set --mwRestApiPath="/rest.php/zh.minecraft.wiki/v3/page/html". However, this would be redirected to /rest.php/{domain}/v3/page/html/{title}/{latest_revision}. Since the response code is 302, not 200, it is regarded as inaccessible.
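A rough TypeScript sketch of the situation described above, under the assumption that the availability check accepts only a 200 response and does not follow redirects. Both functions are hypothetical and only model the behaviour reported in this comment. The page title `Minecraft_Wiki` is a placeholder.

```typescript
// Hypothetical: builds the Parsoid article URL from the configured REST path.
function parsoidHtmlUrl(mwUrl: string, restApiPath: string, title: string): string {
  const base = mwUrl.replace(/\/+$/, '')
  return `${base}${restApiPath}/${encodeURIComponent(title)}`
}

// Assumption: only HTTP 200 counts as "available"; a redirect does not.
function isRegardedAvailable(status: number): boolean {
  return status === 200
}

const testUrl = parsoidHtmlUrl(
  'https://zh.minecraft.wiki/',
  '/rest.php/zh.minecraft.wiki/v3/page/html',
  'Minecraft_Wiki', // placeholder title
)

const okResult = isRegardedAvailable(200) // true
const redirectResult = isRegardedAvailable(302) // false
```

Under this assumption the Parsoid end-point is rejected even though following the redirect to the revision-specific URL would return the article HTML.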
Upstream? All right, I will post my progress in the upstream issue.
I'm back. openzim/mwoffliner#1995 has been fixed, which enables MWoffliner to scrape MCW-zh. However, the recipe still fails due to incorrect arguments.
@RavanJAltaie Hi. Could you please fix the recipe? The steps are:
1. Replace --mwApiPath with --mwActionApiPath="api.php" (NO LEADING SLASH).
2. Set --speed to an appropriate value (I was using 0.5 and couldn't sense significant changes in page load time).
Can someone remove the "Upstream" label and reassign @RavanJAltaie? Thanks.
Hi. Excuse me, @RavanJAltaie. Is it possible to continue with this issue? It's been stalled for two months.
Tip: This issue was created almost a year ago, so it's very far from the top of the list. You may use sort:updated-desc in the search bar to sort by last reply.
@TripleCamera This wiki will require using mediawiki offliner, which is still being revamped. I'm told release is set to be soon, at which point this request is likely to move pretty fast along the queue. Until then, nothing that we can do ¯_(ツ)_/¯
> @TripleCamera This wiki will require using mediawiki offliner, which is still being revamped. I'm told release is set to be soon, at which point this request is likely to move pretty fast along the queue. Until then, nothing that we can do ¯_(ツ)_/¯
OK, that's a sad story. :frowning_face: Thank you for your explanation.