openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

bibnum_fr_all is failing #998

Open benoit74 opened 4 months ago

benoit74 commented 4 months ago

Recipe URL

https://farm.openzim.org/recipes/bibnum_fr_all

Last log lines

----------
Testing warc2zim args
Running: warc2zim --favicon=https://drive.farm.openzim.org/Corrected%20Logos%20for%20recipes/bibnum_fr_all.png --name=bibnum_fr_all --publisher=openZIM --verbose --output /output --url https://journals.openedition.org/bibnum/ --title BibNum --description Textes fondateurs de la science analysés par les scientifiques d'aujourd'hui
Writing progress to /output/task_progress.json

----------
Output to tempdir: /output/.tmp92evrwku - will keep
Running browsertrix-crawler crawl: crawl --failOnFailedSeed --waitUntil load --title BibNum --description Textes fondateurs de la science analysés par les scientifiques d'aujourd'hui --depth -1 --timeout 90 --exclude (\?lang=|\?q=) --lang fra --behaviors autoplay,autofetch,siteSpecific --behaviorTimeout 90 --diskUtilization 90 --url https://journals.openedition.org/bibnum/ --userAgent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit contact+zimfarm@kiwix.org --cwd /output/.tmp92evrwku --statsFilename /output/crawl.json
{"timestamp":"2024-04-16T23:15:54.616Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 0.12.4 (with warcio.js 1.6.2 pywb 2.7.4)","details":{}}
{"timestamp":"2024-04-16T23:15:54.620Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"https://journals.openedition.org/bibnum/","include":["/^https?:\\/\\/journals\\.openedition\\.org\\/bibnum\\//"],"exclude":["/(\\?lang=|\\?q=)/"],"scopeType":"prefix","sitemap":false,"allowHash":false,"maxExtraHops":0,"maxDepth":1000000}]}
{"timestamp":"2024-04-16T23:15:55.572Z","logLevel":"info","context":"worker","message":"Creating 1 workers","details":{}}
{"timestamp":"2024-04-16T23:15:55.572Z","logLevel":"info","context":"worker","message":"Worker starting","details":{"workerid":0}}
{"timestamp":"2024-04-16T23:15:55.775Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://journals.openedition.org/bibnum/"}}
{"timestamp":"2024-04-16T23:15:55.779Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":null,"total":null,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-04-16T23:15:55.575Z\",\"url\":\"https://journals.openedition.org/bibnum/\",\"added\":\"2024-04-16T23:15:54.746Z\",\"depth\":0}"]}}
{"timestamp":"2024-04-16T23:15:56.024Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://journals.openedition.org/bibnum/","workerid":0}}
{"timestamp":"2024-04-16T23:15:57.143Z","logLevel":"error","context":"worker","message":"Page Crashed","details":{"type":"exception","message":"Page crashed!","stack":"Error: Page crashed!\n    at #onTargetCrashed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/Page.js:284:28)\n    at file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/Page.js:153:41\n    at file:///app/node_modules/puppeteer-core/lib/esm/third_party/mitt/index.js:1:248\n    at Array.map (<anonymous>)\n    at Object.emit (file:///app/node_modules/puppeteer-core/lib/esm/third_party/mitt/index.js:1:232)\n    at CDPSessionImpl.emit (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/EventEmitter.js:82:22)\n    at CDPSessionImpl._onMessage (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/Connection.js:425:18)\n    at Connection.onMessage (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/Connection.js:255:25)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/NodeWebSocketTransport.js:46:32)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)","page":"https://journals.openedition.org/bibnum/","workerid":0}}
{"timestamp":"2024-04-16T23:15:57.144Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed","details":{"loadState":0,"page":"https://journals.openedition.org/bibnum/","workerid":0}}
{"timestamp":"2024-04-16T23:15:57.164Z","logLevel":"info","context":"worker","message":"Worker exiting, all tasks complete","details":{"workerid":0}}
{"timestamp":"2024-04-16T23:15:57.203Z","logLevel":"fatal","context":"general","message":"Page Load Timeout, failing crawl. Quitting","details":{"msg":"Navigation failed because browser has disconnected!","page":"https://journals.openedition.org/bibnum/","workerid":0}}

SIGINT/SIGTERM received, stopping zimit
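
For triage, the JSON log lines above can be filtered down to the entries that actually report a problem. The following is a minimal Python sketch, not part of zimit or browsertrix-crawler: the function name `find_problems` and the `SAMPLE` list are illustrative assumptions, and the sample entries are abbreviated copies of the lines above (the stack trace is omitted from the error entry). It assumes only the field names visible in this log (`timestamp`, `logLevel`, `message`, `details`).

```python
import json


def find_problems(log_lines, levels=("warn", "error", "fatal")):
    """Return (timestamp, level, message, detail) for each problem entry.

    Lines that are not JSON objects (for example the plain-text
    "Running: ..." lines) are skipped.
    """
    problems = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text line, not a crawler JSON entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        if not isinstance(entry, dict):
            continue
        if entry.get("logLevel") not in levels:
            continue
        details = entry.get("details")
        detail = ""
        if isinstance(details, dict):
            # The log above uses "message" in one entry and "msg" in another.
            detail = details.get("msg") or details.get("message") or ""
        problems.append(
            (entry.get("timestamp"), entry["logLevel"], entry.get("message"), detail)
        )
    return problems


# Abbreviated copies of entries from the log above (hypothetical test input).
SAMPLE = [
    '{"timestamp":"2024-04-16T23:15:56.024Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://journals.openedition.org/bibnum/","workerid":0}}',
    '{"timestamp":"2024-04-16T23:15:57.143Z","logLevel":"error","context":"worker","message":"Page Crashed","details":{"type":"exception","message":"Page crashed!","page":"https://journals.openedition.org/bibnum/","workerid":0}}',
    '{"timestamp":"2024-04-16T23:15:57.144Z","logLevel":"warn","context":"pageStatus","message":"Page Load Failed","details":{"loadState":0,"page":"https://journals.openedition.org/bibnum/","workerid":0}}',
    '{"timestamp":"2024-04-16T23:15:57.203Z","logLevel":"fatal","context":"general","message":"Page Load Timeout, failing crawl. Quitting","details":{"msg":"Navigation failed because browser has disconnected!","page":"https://journals.openedition.org/bibnum/","workerid":0}}',
    "SIGINT/SIGTERM received, stopping zimit",
]

for timestamp, level, message, detail in find_problems(SAMPLE):
    print(timestamp, level.upper(), message, detail)
```

On the sample, this keeps the "Page Crashed", "Page Load Failed" and "Page Load Timeout" entries and drops the informational and plain-text lines, which makes the sequence crash, failed load, fatal exit easier to see.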

How many times has the recipe failed in a row?

Many

How many ZIMs were produced before the failure?

Zero

Which action did you undertake so far?

None; I have no idea what to do.

What's next?

This has to be fixed by dev team (upstream scraper / zimfarm problem)

More details

No response

benoit74 commented 4 months ago

I have updated the recipe to push to /.hidden/dev and to use Zimit2. Let's see whether that fixes the situation.

kelson42 commented 4 months ago

Works! https://farm.openzim.org/pipeline/82cfb621-6311-4f68-b168-ad75cd53608b

benoit74 commented 3 months ago

Moved back to prod and requested again; it should be OK in a few hours: https://farm.openzim.org/pipeline/704e8fef-799b-4b1c-b6c5-85bfecc724aa