Browser disconnected (crashed?)

rgaudin commented 1 week ago

This week, with Browsertrix-Crawler 1.3.3 (with warcio.js 2.3.1), I am getting several cases of the following:

- a website is being crawled
- there are videos
- behaviors are run
- "Browser disconnected (crashed?), interrupting crawl" is logged
- then "Failed to load response body"
- then "Large payload written to WARC, but not returned to browser (would require rereading into memory)"
- then "Rollover size exceeded, creating new WARC"
- limit not hit is logged: {"crawled":27,"total":690,"pending":0,"failed":0,"limit":{"max":0,"hit":false}
- crawler exits as interrupted: "Exiting, Crawl status: interrupted"

I don't know if those are connected to each other, but this happened on multiple different websites and it happens consistently.

You can try with https://fsfe.org/

{"timestamp":"2024-10-15T08:43:10.788Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.806Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.823Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.844Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.869Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:10.944Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_720p.mp4"}}
{"timestamp":"2024-10-15T08:43:12.199Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/freesoftware/index.en.html"],"page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:12.199Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/freesoftware/index.en.html","page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:12.971Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/freesoftware/index.en.html","page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:12.972Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:13.972Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/freesoftware/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:14.057Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/nl/nl-202410.en.html"}}
{"timestamp":"2024-10-15T08:43:14.066Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":682,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:14.043Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/nl\\/nl-202410.en.html\",\"added\":\"2024-10-15T08:41:22.878Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:14.316Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:14.490Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/xs29yhLxSP1uKLYkSeoKKp_1080p.webm"}}
{"timestamp":"2024-10-15T08:43:16.201Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/nl/nl-202410.en.html"],"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:16.201Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/nl/nl-202410.en.html","page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:16.742Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/nl/nl-202410.en.html","page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:16.742Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:17.749Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/nl/nl-202410.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:17.824Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20240911-01.en.html"}}
{"timestamp":"2024-10-15T08:43:17.828Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":22,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:17.821Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20240911-01.en.html\",\"added\":\"2024-10-15T08:41:22.880Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:17.944Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.407Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20240911-01.en.html"],"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.407Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240911-01.en.html","page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.962Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240911-01.en.html","page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:19.963Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:20.967Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20240911-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:20.996Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20240812-01.en.html"}}
{"timestamp":"2024-10-15T08:43:20.998Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":23,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:20.994Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20240812-01.en.html\",\"added\":\"2024-10-15T08:41:22.882Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:21.030Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.410Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20240812-01.en.html"],"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.411Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240812-01.en.html","page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.950Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240812-01.en.html","page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:22.951Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:23.952Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20240812-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:23.978Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/index.en.html"}}
{"timestamp":"2024-10-15T08:43:23.980Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":24,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:23.977Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/index.en.html\",\"added\":\"2024-10-15T08:41:22.882Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:24.025Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.253Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/index.en.html"],"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.253Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/index.en.html","page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.883Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/index.en.html","page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:28.883Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:29.887Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/index.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:30.622Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20241002-01.en.html"}}
{"timestamp":"2024-10-15T08:43:30.624Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":25,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:29.915Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20241002-01.en.html\",\"added\":\"2024-10-15T08:41:22.884Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:30.751Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:32.667Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20241002-01.en.html"],"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:32.668Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20241002-01.en.html","page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:33.218Z","logLevel":"info","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://fsfe.org/news/2024/news-20241002-01.en.html","page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:33.219Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:34.222Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20241002-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:34.251Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://fsfe.org/news/2024/news-20240920-01.en.html"}}
{"timestamp":"2024-10-15T08:43:34.253Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":26,"total":686,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2024-10-15T08:43:34.250Z\",\"extraHops\":0,\"url\":\"https:\\/\\/fsfe.org\\/news\\/2024\\/news-20240920-01.en.html\",\"added\":\"2024-10-15T08:41:22.886Z\",\"depth\":1}"]}}
{"timestamp":"2024-10-15T08:43:34.371Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:35.322Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://download.fsfe.org/videos/peertube/opzZJm8SAYLQYz5gTXBeJ9_720p.mp4","errorText":"net::ERR_FAILED","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.145Z","logLevel":"warn","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":--------@fsfe.org","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.394Z","logLevel":"info","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://fsfe.org/news/2024/news-20240920-01.en.html"],"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.395Z","logLevel":"info","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://fsfe.org/news/2024/news-20240920-01.en.html","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:43:50.606Z","logLevel":"warn","context":"recorder","message":"continueResponse failed","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_720p.mp4"}}
{"timestamp":"2024-10-15T08:44:25.010Z","logLevel":"error","context":"browser","message":"Browser disconnected (crashed?), interrupting crawl","details":{}}
{"timestamp":"2024-10-15T08:44:25.013Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://download.fsfe.org/videos/peertube/8N57qV4Q8saYmTSEH9JNym_720p.mp4","networkId":"386.148","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:69:36)\n    at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:98:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:163:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:519:28)","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:25.014Z","logLevel":"warn","context":"recorder","message":"Failed to load response body","details":{"url":"https://download.fsfe.org/videos/peertube/ffUSqNGovBvWZwFq82knZH_720p.mp4","networkId":"386.150","type":"exception","message":"Protocol error (Fetch.getResponseBody): Target closed","stack":"TargetCloseError: Protocol error (Fetch.getResponseBody): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:69:36)\n    at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:98:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:163:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:519:28)","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:25.055Z","logLevel":"warn","context":"behavior","message":"Behavior run partially failed","details":{"reason":{"type":"exception","message":"Protocol error (Runtime.evaluate): Target closed","stack":"TargetCloseError: Protocol error (Runtime.evaluate): Target closed\n    at CallbackRegistry.clear (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:69:36)\n    at CdpCDPSession._onClosed (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:98:25)\n    at #onClose (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:163:21)\n    at WebSocket.<anonymous> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:43:30)\n    at callListener (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:290:14)\n    at WebSocket.onClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:220:9)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emitClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:272:10)\n    at Socket.socketOnClose (/app/node_modules/puppeteer-core/node_modules/ws/lib/websocket.js:1341:15)\n    at Socket.emit (node:events:519:28)"},"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:25.055Z","logLevel":"info","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:27.876Z","logLevel":"info","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:28.862Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2024-10-15T08:44:30.011Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_720p.webm","actualSize":57664754,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:33.410Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_360p.mp4","actualSize":46969088,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:38.121Z","logLevel":"warn","context":"recorder","message":"Async fetch: possible response size mismatch","details":{"size":67108864,"expected":67141925,"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_720p.mp4","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:44:38.122Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_720p.mp4","actualSize":67108864,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:40.997Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_360p.webm","actualSize":34404976,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:51.531Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_1080p.mp4","actualSize":146675884,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:44:53.232Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_360p.mp4","actualSize":23893680,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:00.525Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_1080p.webm","actualSize":104749623,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:03.978Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_720p.webm","actualSize":48514712,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:05.171Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8vznSsHk6Brh9dD3s9HoK5_360p.webm","actualSize":16592285,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:05.281Z","logLevel":"warn","context":"recorder","message":"Async fetch: possible response size mismatch","details":{"size":1245184,"expected":84452595,"url":"https://download.fsfe.org/videos/peertube/1MiNgffbuVPSVipHDDBhJK_720p.mp4","page":"https://fsfe.org/news/2024/news-20240920-01.en.html","workerid":0}}
{"timestamp":"2024-10-15T08:45:05.747Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/gAbtkoFWaNNoCmDuyoJ2KC_1080p.mp4","actualSize":5168103,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:45:53.013Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8N57qV4Q8saYmTSEH9JNym_1080p.mp4","actualSize":675001779,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:46:07.254Z","logLevel":"warn","context":"recorder","message":"Large payload written to WARC, but not returned to browser (would require rereading into memory)","details":{"url":"https://download.fsfe.org/videos/peertube/8N57qV4Q8saYmTSEH9JNym_360p.mp4","actualSize":202044622,"maxSize":5000000}}
{"timestamp":"2024-10-15T08:48:20.747Z","logLevel":"info","context":"writer","message":"Rollover size exceeded, creating new WARC","details":{"size":1483211468,"oldFilename":"rec-5cc801ced721-20241015084120601-0.warc.gz","newFilename":"rec-5cc801ced721-20241015084820746-0.warc.gz","rolloverSize":1000000000,"id":"0"}}
{"timestamp":"2024-10-15T08:48:44.878Z","logLevel":"info","context":"general","message":"Saving crawl state to: /output/.tmp3q8rzu7v/collections/crawl-20241015084116141/crawls/crawl-20241015084844-5cc801ced721.yaml","details":{}}
{"timestamp":"2024-10-15T08:48:44.884Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":27,"total":690,"pending":0,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2024-10-15T08:48:44.894Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2024-10-15T08:48:44.897Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: interrupted","details":{}}
[zimit::2024-10-15 08:48:45,055] INFO:
[zimit::2024-10-15 08:48:45,055] INFO:
[zimit::2024-10-15 08:48:45,056] INFO:SIGINT/SIGTERM received, stopping zimit
[zimit::2024-10-15 08:48:45,056] INFO:
[zimit::2024-10-15 08:48:45,056] INFO:

ikreymer commented 6 days ago

Hmm, some of the other messages are just warnings: it seems to be encountering a bunch of large files, which are not loaded in the browser (as expected), and the WARC is rolled over. That should all be ok, but the browser crash is what's causing the interrupt.
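
To separate the harmless warnings from the actual failure in a log like the one above, a small tally of warn and error entries can help. The following is a minimal sketch, not part of the crawler: `parseLogLine` and `summarizeProblems` are hypothetical helper names, and the entry shape is only inferred from the log excerpt in this thread.

```typescript
// Shape of a log entry, inferred from the excerpt in this thread (assumption).
interface LogEntry {
  timestamp: string;
  logLevel: string;
  context: string;
  message: string;
  details?: unknown;
}

// Parse one line. Returns null for anything that is not a JSON object with
// logLevel and message fields, e.g. the "[zimit::...]" wrapper lines or a
// malformed (redacted) entry.
function parseLogLine(line: string): LogEntry | null {
  try {
    const parsed: unknown = JSON.parse(line);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      "logLevel" in parsed &&
      "message" in parsed
    ) {
      return parsed as LogEntry;
    }
  } catch {
    // Not valid JSON: fall through and skip the line.
  }
  return null;
}

// Count warn and error entries, keyed by "<level>: <message>".
function summarizeProblems(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (!entry) continue;
    if (entry.logLevel !== "warn" && entry.logLevel !== "error") continue;
    const key = `${entry.logLevel}: ${entry.message}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Run over the log above, a tally like this would show many repeated warn entries next to a single error entry from the browser context, which is the one that interrupts the crawl.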

ikreymer commented 5 days ago

If you load that particular page in Chrome, it appears to be infinitely loading the video content due to some bug in the player (presumably it was tested more in FF than Chrome). Here's what my devtools looks like on: https://fsfe.org/news/2024/news-20240920-01.en.html:

Since this is all going through the crawler (though it's not saving these partial range requests), I'm not too surprised that it causes the browser to crash eventually... I can see if there's a way to keep these from even being attempted, but it's definitely an issue with this site...
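
For illustration, one way to recognize a partial range request is to look at the request's `Range` header. The sketch below is an assumption about how such a check could look, not the crawler's actual logic; `isPartialRangeRequest` is a hypothetical name, and anything that is not a simple single byte range is conservatively treated as partial.

```typescript
// Returns true when the request asks for only part of a resource.
// "bytes=0-" is treated as a full request, since it asks for everything
// from the first byte onward.
function isPartialRangeRequest(headers: Record<string, string>): boolean {
  // HTTP header names are case-insensitive, so compare in lower case.
  const name = Object.keys(headers).find((key) => key.toLowerCase() === "range");
  if (name === undefined) return false;

  const match = /^bytes=(\d*)-(\d*)$/.exec(headers[name].trim());
  // Multi-range or non-byte units: treat as partial (conservative assumption).
  if (!match) return true;

  const [, start, end] = match;
  const isFullFromStart = start !== "" && Number(start) === 0 && end === "";
  return !isFullFromStart;
}
```

With a check like this, a player that keeps issuing requests such as `bytes=1048576-2097151` could be identified early, before any response body is handled.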

rgaudin commented 4 days ago

Indeed, I get the same results on Chrome here. The player does seem buggy. FF doesn't work either, but for different reasons: there's no autoplay there, and most videos don't start when clicked.

How's the code handling this? Is this firing a direct download request for each of those attempts we see here?

I bet @benoit74 will have new use cases tomorrow and will maybe be able to share another link exhibiting the issue.

ikreymer commented 4 days ago

> How's the code handling this? Is this firing a direct download request for each of those attempts we see here?

No, it shouldn't be; it should already be ignoring these, but I made some more optimizations / clean-up. Some videos were being skipped for other reasons, but it's possible the repeated requests could result in a browser crash (though I haven't reproduced that). Try this branch: https://github.com/webrecorder/browsertrix-crawler/tree/range-load-optimizations

benoit74 commented 2 days ago

New occurrences last week (we are not responsible for the content our users are trying to ZIM, not sure they are all very aligned with our mission, didn't check tbh):

- a crawl of https://accords-library.com which crashed multiple times (user retried many times ^^) on the https://accords-library.com/contents/pv-cm-collection-dod-promotional-video page
  - the video seems very hard / slow to load; the server seems to have difficulty responding
- a crawl of https://defendinginerrancy.com which crashed on https://defendinginerrancy.com/f-david-farnell
  - this page contains only 3 YouTube videos, nothing very fancy at first sight
- a crawl of https://youtube.fandom.com which crashed on https://youtube.fandom.com/wiki/CaseOh
  - here it looks like we have a player which plays videos indefinitely, so it is quite normal that the crawler eventually crashes; not sure what can be done here
- a crawl of https://www.extremetech.com/ which crashed on https://www.extremetech.com/aerospace/ulas-vulcan-rocket-aces-second-flight-despite-engine-anomaly
  - again a player which loops videos forever
- a crawl of https://www.jeffgeerling.com/ which crashed on https://www.jeffgeerling.com/blog/2020/ansible-101-jeff-geerling-youtube-streaming-series
  - lots of YouTube videos, nothing very unexpected
- a crawl of https://flugelanime.com which crashed on https://flugelanime.com/[Flugel]%20Fate%20Series/01%20[Flugel]%20Fate%20Zero/[Flugel]%20Fate%20Zero%20-%2003%20[BD%201080p%20HEVC%20Opus].mkv?preview
  - at the moment the video is not loading at all; maybe this is a temporary issue, but I don't see how it could have crashed the browser

I will probably test #709 only once it is released, unless you need help testing it before merge. I'm pretty busy with other topics atm, and testing a branch is not that straightforward on my end ^^ Thank you for these enhancements anyway.

ikreymer commented 12 hours ago

Found a major issue: it appears there was a status code check so that only 200 responses were being streamed, but all the videos are 206 (partial content), and that was excluded from streaming 🤦. This likely resulted in the browser crash, since it tried to load the whole thing into memory 🤦. Will be in the next fix!
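
As a rough illustration of the kind of check described here (not the crawler's actual code), the sketch below decides whether a response body should be streamed rather than buffered in memory. `shouldStreamBody` is a hypothetical name, and the 5000000-byte threshold is borrowed from the `maxSize` value in the log above; whether the crawler uses the same threshold for this decision is an assumption.

```typescript
// Threshold borrowed from the "maxSize" field in the log above (assumption).
const MAX_IN_MEMORY_SIZE = 5000000;

// Decide whether a response body should be streamed instead of being read
// fully into memory. contentLength is null when the size is unknown.
function shouldStreamBody(status: number, contentLength: number | null): boolean {
  // Checking only `status === 200` here would exclude video responses,
  // which arrive as 206 Partial Content, and leave them to be buffered.
  const isStreamableStatus = status === 200 || status === 206;
  if (!isStreamableStatus) return false;

  // Unknown size: stream to stay on the safe side.
  if (contentLength === null) return true;

  return contentLength > MAX_IN_MEMORY_SIZE;
}
```

For example, `shouldStreamBody(206, 84452595)` returns `true`, so a large partial-content video response would be streamed rather than held in memory.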

webrecorder / browsertrix-crawler

Browser disconnected (crashed?) #706