openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

wikipedia_arz_all: Unable to retrieve js/css dependencies for article #1730

Open uriesk opened 1 year ago

uriesk commented 1 year ago

The wikipedia_arz_all scrape fails because the js/css dependencies of one article cannot be fetched: the backend returns the error internal_api_error_Wikimedia\RequestTimeout\RequestTimeoutException. It can be reproduced with:

npm start -- --adminEmail=contact@kiwix.org --mwUrl=https://arz.wikipedia.org --articleList "رقيب الشمس السمحى"

the API URL:

https://arz.wikipedia.org/w/api.php?action=parse&format=json&prop=modules%7Cjsconfigvars%7Cheadhtml&page=%D8%B1%D9%82%D9%8A%D8%A8%20%D8%A7%D9%84%D8%B4%D9%85%D8%B3%20%D8%A7%D9%84%D8%B3%D9%85%D8%AD%D9%89
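
For anyone who wants to hit the same endpoint outside of mwoffliner, here is a minimal sketch of how a URL like the one above can be assembled. `buildParseUrl` is a hypothetical helper, not mwoffliner's actual code, and the `/w/api.php` path is assumed from the URL above.

```typescript
// Hypothetical helper (not part of mwoffliner): builds the action=parse
// request that asks for an article's modules, JS config vars and head HTML.
function buildParseUrl(mwUrl: string, title: string): string {
  // Assumption: the wiki serves its API at /w/api.php, as in the URL above.
  const url = new URL('/w/api.php', mwUrl)
  url.search = new URLSearchParams({
    action: 'parse',
    format: 'json',
    prop: 'modules|jsconfigvars|headhtml',
    page: title,
  }).toString()
  return url.toString()
}
```

Note that `URLSearchParams` encodes spaces as `+` rather than `%20`, so the string will not match the URL above byte for byte, but it decodes to the same query parameters.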

kelson42 commented 1 year ago

@uriesk I wonder if we could replace (for Wikimedia wikis only) the use of this backend with /api/rest_v1/page/mobile-html-offline-resources/... At the very least, it would be great to reduce its use, because this endpoint generates a lot of load on the backend side; see https://phabricator.wikimedia.org/T324866

uriesk commented 1 year ago

Another one at wikipedia_ceb_all

https://farm.openzim.org/pipeline/72331d6329a4e53291be2c36/debug

Unable to retrieve js/css dependencies for article 'Polymixis flavicincta': internal_api_error_DBConnectionError
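
Both failures reported so far carry an error code starting with internal_api_error_, which suggests a problem on the backend side rather than with the article itself. A rough sketch of how such responses could be told apart, assuming the default MediaWiki API error format (`{"error": {"code": ..., "info": ...}}`); the interface and function names are hypothetical and not taken from mwoffliner:

```typescript
// Hypothetical shape of the part of an API response we care about.
interface MwApiResponse {
  error?: { code?: string; info?: string }
}

// Prefix shared by the two error codes seen in this issue.
const BACKEND_ERROR_PREFIX = 'internal_api_error_'

// Returns the error code if the response reports a backend-side failure,
// or null for successful responses and for ordinary API errors.
function backendErrorCode(body: MwApiResponse): string | null {
  const code = body.error?.code
  return code !== undefined && code.startsWith(BACKEND_ERROR_PREFIX) ? code : null
}
```

Whether the scraper should then retry, skip the article, or abort is a separate decision that this sketch does not make.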

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

VadimKovalenkoSNF commented 12 months ago

I can't reproduce the exact same issue with the WikimediaMobile or WikimediaDesktop renderers now. Both the arz and ceb wikis work. The problem I did face is that the Kiwix main index page does not work for Polymixis_flavicincta on ceb.wikipedia.org or for رقيب الشمس السمحى on arz.wikipedia.org. The problem is in the line where we get articleDetail:

const articleDetail = await articleDetailXId.get(articleId)

For the articles above, articleDetail is null there.
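
One way to make this failure easier to spot would be to check the lookup result before using it. This is only a sketch: `requireDetail` is a hypothetical helper, and nothing about the real return type of `articleDetailXId.get` is assumed beyond "it can be null".

```typescript
// Hypothetical guard: turns a silent null into an error that names the article.
function requireDetail<T>(detail: T | null | undefined, articleId: string): T {
  if (detail === null || detail === undefined) {
    throw new Error(`No article detail stored for "${articleId}"`)
  }
  return detail
}
```

It could wrap the existing lookup, for example `requireDetail(await articleDetailXId.get(articleId), articleId)`, so the failing article shows up in the error message instead of causing a null dereference later on.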

kelson42 commented 11 months ago

I have reconfigured https://farm.openzim.org/recipes/wikipedia_arz_all/ to use the dev version of MWoffliner to see whether it works better.

VadimKovalenkoSNF commented 11 months ago

After updating the main branch and doing additional testing, I can no longer reproduce the issue I mentioned above either.