What did you expect to happen? What happened instead?
The watch crawl feature uses a lot of bandwidth for me, about 30-40 Mbit/s. When I navigate away from the crawl page, the traffic subsides briefly and then resumes. In the network tab of the developer console, I can also see the initial WebSocket connection to the /watch endpoint being closed, only for a new connection to appear a bit later.
Step-by-step reproduction instructions
1. Open iftop or a similar tool to watch network traffic
2. Start a long-running crawl
3. Go to the "Watch Crawl" page
4. See high bandwidth usage in iftop
5. Navigate to another browsertrix-cloud page
6. See bandwidth usage in iftop drop, then pick up again after a while
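If iftop is not available, the bandwidth can also be observed with a small script. This is only a minimal sketch, assuming a Linux machine where /proc/net/dev is readable. The interface name `eth0` and the function names `parse_net_dev`, `mbit_per_s`, and `watch` are placeholders chosen for illustration, not part of Browsertrix.

```python
import time


def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def mbit_per_s(byte_delta: int, seconds: float) -> float:
    """Convert a byte count over an interval into Mbit/s."""
    return byte_delta * 8 / seconds / 1_000_000


def watch(interface: str = "eth0", interval: float = 1.0) -> None:
    """Print receive/transmit rates for one interface until interrupted."""
    # "eth0" is an assumed interface name; adjust it to the local machine.
    with open("/proc/net/dev") as f:
        prev = parse_net_dev(f.read())[interface]
    while True:
        time.sleep(interval)
        with open("/proc/net/dev") as f:
            cur = parse_net_dev(f.read())[interface]
        rx = mbit_per_s(cur[0] - prev[0], interval)
        tx = mbit_per_s(cur[1] - prev[1], interval)
        print(f"{interface}: rx {rx:.1f} Mbit/s, tx {tx:.1f} Mbit/s")
        prev = cur
```

Calling `watch("eth0")` while following the steps above should show the rate drop after leaving the "Watch Crawl" page and rise again once the new connection appears. Stop it with Ctrl+C.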
Browsertrix Cloud Version
v1.7.1
Additional details
browsertrix-cloud: v1.7.1
browsertrix-crawler: v0.12.1
It's deployed on a Hetzner Ubuntu VM with k3s.