webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0

Slow down + retry on HTTP 429 errors #392

Open benoit74 opened 11 months ago

benoit74 commented 11 months ago

The crawler should handle HTTP 429 (Too Many Requests) responses more gracefully: it should slow down and retry the page instead of skipping it.

Below is an example log in which the website asked the crawler to slow down, but the crawler kept going at the same pace.

Sample website where this happens after a while (roughly 1 hour of crawling): https://radiopaedia.org
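To make the request concrete, here is a minimal sketch of the kind of delay calculation I have in mind. It is written in Python only for illustration and is not the crawler's code. The function names, the `base` and `cap` defaults, and the assumption that the server sends a `Retry-After` header at all are mine; when the header is missing, the sketch falls back to exponential backoff with jitter.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After value (delta-seconds or HTTP-date) into seconds.

    Returns None when the header is absent or cannot be parsed.
    """
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Hypothetical policy: use the server's Retry-After hint when present,
    otherwise exponential backoff capped at `cap`, with jitter so that
    parallel workers do not all retry at the same instant.
    """
    server_hint = retry_after_seconds(retry_after)
    if server_hint is not None:
        return server_hint
    exp = min(cap, base * (2 ** attempt))
    return random.uniform(exp / 2, exp)
```

With several workers sharing one target site, the delay would probably need to apply to all workers and not only to the one that received the 429; that part is not covered by this sketch.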

Log capture:
{"logLevel":"info","timestamp":"2023-09-05T00:14:58.691Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:58.692Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":9,"total":410,"pending":6,"failed":1,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:14:56.143Z\",\"url\":\"https://radiopaedia.org/go-ad-free\",\"added\":\"2023-09-05T00:14:35.344Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.308Z\",\"url\":\"https://radiopaedia.org/about\",\"added\":\"2023-09-05T00:14:35.347Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.691Z\",\"url\":\"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:58.844Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:14:59.358Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:14:59.358Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/go-ad-free returned status code 429","page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:14:59.358Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/go-ad-free returned status code 429","stack":"Error: Page https://radiopaedia.org/go-ad-free returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 3)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"warn","timestamp":"2023-09-05T00:14:59.359Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:59.382Z","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:59.383Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":10,"total":410,"pending":6,"failed":2,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.308Z\",\"url\":\"https://radiopaedia.org/about\",\"added\":\"2023-09-05T00:14:35.347Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.691Z\",\"url\":\"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:59.381Z\",\"url\":\"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:59.561Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.023Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.024Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us returned status code 429","page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.024Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 5)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.027Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.054Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.055Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":11,"total":410,"pending":6,"failed":3,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.308Z\",\"url\":\"https://radiopaedia.org/about\",\"added\":\"2023-09-05T00:14:35.347Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:59.381Z\",\"url\":\"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.083Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.083Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/about returned status code 429","page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.084Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/about returned status code 429","stack":"Error: Page https://radiopaedia.org/about returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 4)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.085Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.103Z","context":"worker","message":"Starting page","details":{"workerid":4,"page":"https://radiopaedia.org/feature_images/previous?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.104Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":12,"total":410,"pending":6,"failed":4,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:59.381Z\",\"url\":\"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.249Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.264Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.272Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.288Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.288Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us returned status code 429","page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.288Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 3)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.289Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.308Z","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.310Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":13,"total":437,"pending":6,"failed":5,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.319Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.355Z","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://radiopaedia.org/edits?lang=us"],"page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.355Z","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://radiopaedia.org/edits?lang=us","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.357Z","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://radiopaedia.org/edits?lang=us","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.357Z","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.358Z","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.379Z","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.380Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":14,"total":494,"pending":6,"failed":5,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.472Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.510Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.069Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.069Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us returned status code 429","page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.069Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 5)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.070Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.072Z","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://radiopaedia.org/?lang=us"],"page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.072Z","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://radiopaedia.org/?lang=us","page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.074Z","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://radiopaedia.org/?lang=us","page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.076Z","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.077Z","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.127Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/podcast"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.128Z","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://radiopaedia.org/articles/playlists-1"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.129Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":16,"total":494,"pending":6,"failed":6,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.129Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":16,"total":494,"pending":6,"failed":6,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.241Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.302Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.504Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.549Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.678Z","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://radiopaedia.org/quizzes/all?lang=us"],"page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.679Z","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://radiopaedia.org/quizzes/all?lang=us","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.692Z","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://radiopaedia.org/quizzes/all?lang=us","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.693Z","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.694Z","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.732Z","context":"worker","message":"Starting page","details":{"workerid":2,"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.733Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":17,"total":546,"pending":6,"failed":6,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.974Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.975Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference returned status code 429","page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.975Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference returned status code 429","stack":"Error: Page https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 0)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.975Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.005Z","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://radiopaedia.org/impact"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.006Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":18,"total":546,"pending":6,"failed":7,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.177Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.270Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.322Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.323Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/feature_images/previous?lang=us returned status code 429","page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.323Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/feature_images/previous?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/feature_images/previous?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 4)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.325Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.374Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.374Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/podcast returned status code 429","page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.374Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/podcast returned status code 429","stack":"Error: Page https://radiopaedia.org/podcast returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 5)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.375Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.639Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.642Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/articles/playlists-1 returned status code 429","page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.642Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/articles/playlists-1 returned status code 429","stack":"Error: Page https://radiopaedia.org/articles/playlists-1 returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 1)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.644Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.662Z","context":"worker","message":"Starting page","details":{"workerid":4,"page":"https://radiopaedia.org/courses/help-creating-cases"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.676Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":546,"pending":5,"failed":10,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.679Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/courses/help-multiple-choice-questions"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.684Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":546,"pending":5,"failed":10,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.686Z","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://radiopaedia.org/peer-review-policy"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.687Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":546,"pending":6,"failed":10,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.684Z\",\"url\":\"https://radiopaedia.org/peer-review-policy\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.760Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.760Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/articles/general-overview-of-radiopaediaorg returned status code 429","page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.760Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/articles/general-overview-of-radiopaediaorg returned status code 429","stack":"Error: Page https://radiopaedia.org/articles/general-overview-of-radiopaediaorg returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 3)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.761Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.797Z","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://radiopaedia.org/continuing-medical-education-cme"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.798Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":22,"total":546,"pending":6,"failed":11,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.796Z\",\"url\":\"https://radiopaedia.org/continuing-medical-education-cme\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.684Z\",\"url\":\"https://radiopaedia.org/peer-review-policy\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.975Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/help-multiple-choice-questions","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:03.023Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:03.026Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/help-creating-cases","workerid":4}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:03.050Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/continuing-medical-education-cme","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.781Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.781Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/peer-review-policy returned status code 429","page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.781Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/peer-review-policy returned status code 429","stack":"Error: Page https://radiopaedia.org/peer-review-policy returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 1)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:04.782Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.799Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.799Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/impact returned status code 429","page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.800Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/impact returned status code 429","stack":"Error: Page https://radiopaedia.org/impact returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 0)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:04.800Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.816Z","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://radiopaedia.org/editors"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.817Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":24,"total":546,"pending":5,"failed":13,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:04.815Z\",\"url\":\"https://radiopaedia.org/editors\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.796Z\",\"url\":\"https://radiopaedia.org/continuing-medical-education-cme\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.844Z","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://radiopaedia.org/radiopaedia-educational-board"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.850Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":24,"total":546,"pending":6,"failed":13,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:04.815Z\",\"url\":\"https://radiopaedia.org/editors\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:04.843Z\",\"url\":\"https://radiopaedia.org/radiopaedia-educational-board\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.796Z\",\"url\":\"https://radiopaedia.org/continuing-medical-education-cme\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.905Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.906Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/courses/editing-radiopaedia-articles returned status code 429","page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.906Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/courses/editing-radiopaedia-articles returned status code 429","stack":"Error: Page https://radiopaedia.org/courses/editing-radiopaedia-articles returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 2)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:04.907Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}

The crawler could be enhanced by:

benoit74 commented 11 months ago

FYI, I finally have a repro of #387, but it would be much better handled as described in this issue:

I'm working on a PR, so you could assign me this issue.
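
For illustration only, here is a minimal sketch of what "slow down + retry" on HTTP 429 could look like. It is not the crawler's actual code and not the contents of any PR: `parseRetryAfterMs`, `backoffDelayMs`, `loadWithRetry`, the `fetchPage` callback and all default values are hypothetical names and numbers chosen for this example. It assumes the server may send a standard `Retry-After` header (either a number of seconds or an HTTP date) and falls back to capped exponential backoff when it does not.

```javascript
// Hypothetical helpers, NOT part of browsertrix-crawler.

// Convert a Retry-After header value (delta-seconds or HTTP-date) to milliseconds.
// Returns null when the header is missing or cannot be parsed.
function parseRetryAfterMs(value, nowMs = Date.now()) {
  if (value === undefined || value === null || value === "") return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

// Delay before the next attempt: use Retry-After when available,
// otherwise exponential backoff (baseMs * 2^attempt). Both are capped at maxMs
// (an assumption of this sketch; a stricter crawler might never shorten Retry-After).
function backoffDelayMs(attempt, retryAfter, { baseMs = 1000, maxMs = 60000 } = {}) {
  const fromHeader = parseRetryAfterMs(retryAfter);
  if (fromHeader !== null) return Math.min(fromHeader, maxMs);
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchPage is a caller-supplied async function resolving to
// { status, retryAfter }. The page is retried only on status 429,
// at most maxRetries times; the last response is returned either way.
async function loadWithRetry(fetchPage, url, { maxRetries = 3, wait = sleep, ...opts } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchPage(url);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await wait(backoffDelayMs(attempt, res.retryAfter, opts));
  }
}
```

In the logs above, all workers keep starting new pages within milliseconds of a 429. A delay like this would therefore probably need to be shared across workers for the same host rather than applied per page, which this sketch does not attempt.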