webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0

Browser disconnected (crashed?) #706

Open rgaudin opened 1 week ago

rgaudin commented 1 week ago

This week, with Browsertrix-Crawler 1.3.3 (with warcio.js 2.3.1), I am getting several cases of the following:

I don't know if those are connected to each other, but this happened on multiple different websites and it happens consistently.

You can try with https://fsfe.org/

{"timestamp":"2024-10-15T08:43:10.788Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.806Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.823Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.844Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.869Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.944Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:12.199Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/freesoftware/index.en.html"],"page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:12.199Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/freesoftware/index.en.html","page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:12.971Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/freesoftware/index.en.html","page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:12.972Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:13.972Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:14.057Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/nl/nl-202410.en.html"}}
{"timestamp":"2024-10-15T08:43:14.066Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":682,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:14.043Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/nl\\/nl-202410.en.html\",\"added\":\"2024-10-15T08:41:22.878Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:14.316Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:14.490Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_1080p.webm"}}
{"timestamp":"2024-10-15T08:43:16.201Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/nl/nl-202410.en.html"],"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:16.201Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/nl/nl-202410.en.html","page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:16.742Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/nl/nl-202410.en.html","page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:16.742Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:17.749Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:17.824Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20240911-01.en.html"}}
{"timestamp":"2024-10-15T08:43:17.828Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":22,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:17.821Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20240911-01.en.html\",\"added\":\"2024-10-15T08:41:22.880Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:17.944Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.407Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20240911-01.en.html"],"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.407Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240911-01.en.html","page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.962Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240911-01.en.html","page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.963Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:20.967Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:20.996Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20240812-01.en.html"}}
{"timestamp":"2024-10-15T08:43:20.998Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":23,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:20.994Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20240812-01.en.html\",\"added\":\"2024-10-15T08:41:22.882Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:21.030Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.410Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20240812-01.en.html"],"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.411Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240812-01.en.html","page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.950Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240812-01.en.html","page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.951Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:23.952Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:23.978Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/index.en.html"}}
{"timestamp":"2024-10-15T08:43:23.980Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":24,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:23.977Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/index.en.html\",\"added\":\"2024-10-15T08:41:22.882Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:24.025Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.253Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/index.en.html"],"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.253Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/index.en.html","page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.883Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/index.en.html","page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.883Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:29.887Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:30.622Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20241002-01.en.html"}}
{"timestamp":"2024-10-15T08:43:30.624Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":25,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:29.915Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20241002-01.en.html\",\"added\":\"2024-10-15T08:41:22.884Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:30.751Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:32.667Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20241002-01.en.html"],"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:32.668Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20241002-01.en.html","page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:33.218Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/2024/news-20241002-01.en.html","page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:33.219Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:34.222Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:34.251Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20240920-01.en.html"}}
{"timestamp":"2024-10-15T08:43:34.253Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":26,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:34.250Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20240920-01.en.html\",\"added\":\"2024-10-15T08:41:22.886Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:34.371Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:35.322Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://download.fsfe.org/videos/peertube/opzZJm8SAYLQYz5gTXBeJ9_720p.mp4","errorText":"net::ERR_FAILED","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.145Z","logLevel":"warn","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":--------@fsfe.org","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.394Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20240920-01.en.html"],"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.395Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240920-01.en.html","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.606Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_720p.mp4"}}
{"timestamp":"2024-10-15T08:44:25.010Z","logLevel":"error","context":"browser","message":"Browser disconnected (crashed?), interrupting crawl","details":{}}
{"timestamp":"2024-10-15T08:44:25.013Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://download.fsfe.org/videos/peertube/8N57qV4Q8saYmTSEH9JNym_720p.mp4","networkId":"386.148","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:69:36)\n    at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:98:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:163:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:519:28)","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:25.014Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://download.fsfe.org/videos/peertube/ffUSqNGovBvWZwFq82knZH_720p.mp4","networkId":"386.150","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:69:36)\n    at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:98:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:163:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:519:28)","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:25.055Z","logLevel":"warn","context":"behavior","message":"Behavior run partially failed","details":{"reason":{"type":"exception","message":"Protocol error (Runtime.evaluate): Target closed","stack":"TargetCloseError: Protocol error (Runtime.evaluate): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:69:36)\n    at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:98:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:163:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:519:28)"},"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:25.055Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:27.876Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:28.862Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2024-10-15T08:44:30.011Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_720p.webm","actualSize":57664754,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:33.410Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_360p.mp4","actualSize":46969088,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:38.121Z","logLevel":"warn","context":"recorder","message":"Async fetch: possible response size mismatch","details":{"size":67108864,"expected":67141925,"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_720p.mp4","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:38.122Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_720p.mp4","actualSize":67108864,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:40.997Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_360p.webm","actualSize":34404976,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:51.531Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_1080p.mp4","actualSize":146675884,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:53.232Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_360p.mp4","actualSize":23893680,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:00.525Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_1080p.webm","actualSize":104749623,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:03.978Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_720p.webm","actualSize":48514712,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:05.171Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_360p.webm","actualSize":16592285,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:05.281Z","logLevel":"warn","context":"recorder","message":"Async fetch: possible response size mismatch","details":{"size":1245184,"expected":84452595,"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_720p.mp4","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:45:05.747Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/gAbtkoFWaNNoCmDuyoJ2KC_1080p.mp4","actualSize":5168103,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:53.013Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8N57qV4Q8saYmTSEH9JNym_1080p.mp4","actualSize":675001779,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:46:07.254Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8N57qV4Q8saYmTSEH9JNym_360p.mp4","actualSize":202044622,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:48:20.747Z","logLevel":"info","context":"writer","message":"Rollover size exceeded, creating new WARC","details":{"size":1483211468,"oldFilename":"rec-5cc801ced721-20241015084120601-0.warc.gz","newFilename":"rec-5cc801ced721-20241015084820746-0.warc.gz","rolloverSize":1000000000,"id":"0"}}
{"timestamp":"2024-10-15T08:48:44.878Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp3q8rzu7v/collections/crawl-20241015084116141/crawls/crawl-20241015084844-5cc801ced721.yaml","details":{}}
{"timestamp":"2024-10-15T08:48:44.884Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":27,"total":690,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2024-10-15T08:48:44.894Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2024-10-15T08:48:44.897Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2024-10-15 08:48:45,055] INFO:
[zimit::2024-10-15 08:48:45,055] INFO:
[zimit::2024-10-15 08:48:45,056] INFO:SIGINT/SIGTERM received, stopping zimit
[zimit::2024-10-15 08:48:45,056] INFO:
[zimit::2024-10-15 08:48:45,056] INFO:

ikreymer commented 6 days ago

Hmm, some of the other messages are just warnings. It seems like it's encountering a bunch of large files, which are not loaded in the browser (as expected), and the WARC is rolled over. That should all be OK, but the browser crash is what's causing the interrupt.
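As a rough illustration of the size threshold visible in the "Large payload written to WARC" warnings, here is a minimal sketch. The function and type names are hypothetical and this is not the crawler's actual code; only the `maxSize` value of 5000000 and the `actualSize` figures come from the logs above, and the exact comparison (strictly greater than) is an assumption.

```typescript
// Hypothetical sketch: decide whether a recorded payload is also handed
// back to the browser, or only written to the WARC.
// 5_000_000 is the "maxSize" value that appears in the log output above.
const MAX_BROWSER_PAYLOAD = 5_000_000;

type PayloadDecision = "return-to-browser" | "warc-only";

function decidePayload(
  actualSize: number,
  maxSize: number = MAX_BROWSER_PAYLOAD,
): PayloadDecision {
  // Assumption: anything strictly larger than maxSize is kept out of memory.
  return actualSize > maxSize ? "warc-only" : "return-to-browser";
}

// Sizes taken from two of the warnings in the log.
console.log(decidePayload(57_664_754));
console.log(decidePayload(5_168_103));
```

Under this assumption, both of the logged sizes fall on the "warc-only" side, which matches the warnings being emitted for them.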

ikreymer commented 5 days ago

If you load that particular page in Chrome, it appears to be infinitely loading the video content due to some bug in the player (presumably it was tested more in FF than in Chrome). Here's what my devtools looks like on https://fsfe.org/news/2024/news-20240920-01.en.html:

[Screenshot: devtools, 2024-10-25 at 6:00:29 PM]

Since this is all going through the crawler (though it's not saving these partial range requests), I'm not too surprised that it causes the browser to crash eventually... Can see if there's a way we can ignore these from even being tried, but it's definitely an issue with this site...
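For readers unfamiliar with partial range requests, the sketch below shows one way to recognise a partial response from its status and `Content-Range` header (the `bytes start-end/total` form defined by HTTP). The helper names are hypothetical and this is not how the crawler itself does it; the example header is made up from the sizes in the "possible response size mismatch" log line.

```typescript
// Hypothetical helpers for recognising a partial (range) response.
interface ContentRange {
  start: number;
  end: number;
  total: number | null; // null when the server reports "*" (unknown length)
}

// Parses the "bytes start-end/total" form. Other forms (for example the
// "bytes */total" used with 416 responses) are deliberately rejected.
function parseContentRange(header: string): ContentRange | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(header.trim());
  if (!m) {
    return null;
  }
  const start = Number(m[1]);
  const end = Number(m[2]);
  const total = m[3] === "*" ? null : Number(m[3]);
  if (end < start) {
    return null;
  }
  return { start, end, total };
}

// True when a 206 response carries only part of the resource.
function isPartialBody(status: number, contentRange: string | null): boolean {
  if (status !== 206 || contentRange === null) {
    return false;
  }
  const range = parseContentRange(contentRange);
  if (range === null) {
    return false;
  }
  return range.total === null || range.start > 0 || range.end + 1 < range.total;
}

// Illustrative header only: the first 1245184 bytes of an 84452595-byte video.
console.log(isPartialBody(206, "bytes 0-1245183/84452595"));
```

A 206 whose range happens to span the whole resource is treated as complete here; whether that distinction matters for a recorder is a design choice left open.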

rgaudin commented 4 days ago

Indeed, I get the same results on Chrome here, and the player does seem buggy. FF doesn't work either, but for different reasons: there's no autoplay there and most videos don't start when clicked.

How's the code handling this? Is this firing a direct download request for each of those attempts we see here?

I bet @benoit74 will have new use cases tomorrow and will maybe be able to share another link exhibiting the issue.

ikreymer commented 4 days ago

> How's the code handling this? Is this firing a direct download request for each of those attempts we see here?

No, it shouldn't be; it should already be ignoring these, but I made some more optimizations / clean-up. Some videos were being skipped for other reasons, but it's possible the repeated requests could result in a browser crash (though I haven't reproduced that). Try this branch: https://github.com/webrecorder/browsertrix-crawler/tree/range-load-optimizations

benoit74 commented 2 days ago

New occurrence last week (we are not responsible for the content our users are trying to ZIM; not sure they are all very aligned with our mission, didn't check tbh):

I will probably test #709 only once it is released, unless you need help testing it before merge. I'm pretty busy with other topics atm, and testing a branch is not that straightforward on my end ^^ Thank you for these enhancements anyway.

ikreymer commented 12 hours ago

Found a major issue: it appears there was a status code check and only 200 responses were being streamed, but all the videos are 206, and that was excluded from streaming 🤦. This likely resulted in the browser crash since it tried to load the whole thing into memory 🤦. Will be in the next fix!
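To make the described status-code problem concrete, here is a minimal sketch. Both function names are hypothetical and neither is taken from the crawler's source; the sketch only contrasts a check that accepts 200 alone with one that also accepts 206 (Partial Content).

```typescript
// Hypothetical sketch, not the crawler's actual code.

// Equivalent of the behaviour described above: only plain 200 responses
// qualify for streaming, so 206 video responses fall through.
function shouldStreamOld(status: number): boolean {
  return status === 200;
}

// Assumed shape of a corrected check: 206 (Partial Content) is treated
// as streamable as well.
const STREAMABLE_STATUSES: ReadonlySet<number> = new Set([200, 206]);

function shouldStream(status: number): boolean {
  return STREAMABLE_STATUSES.has(status);
}

console.log(shouldStreamOld(206), shouldStream(206));
```

With the first check, a large 206 video response would not be streamed and would instead be buffered, which is consistent with the memory pressure suggested above.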