webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://browsertrix.com
GNU Affero General Public License v3.0

[Bug]: Screencast connection not closed after navigating away from `Watch Crawl` page #1367

Open vnznznz opened 10 months ago

vnznznz commented 10 months ago

Browsertrix Cloud Version

v1.7.1

What did you expect to happen? What happened instead?

The watch crawl feature uses a lot of bandwidth for me, about 30-40 Mbit/s. When I navigate away from the crawl page, the traffic subsides for a bit and then resumes. In the network tab of the developer console, I can also see the initial websocket connection to the /watch endpoint being closed, only for a new connection to appear a bit later.

Step-by-step reproduction instructions

  1. Open iftop or a similar tool to watch network traffic
  2. Start a long-running crawl
  3. Go to the "Watch Crawl" page
  4. See high bandwidth usage in iftop
  5. Navigate to another browsertrix-cloud page
  6. See bandwidth usage in iftop pick up again after a while
  7. Close the browsertrix-cloud tab
  8. See reduced bandwidth usage in iftop

Additional details

browsertrix-cloud: v1.7.1
browsertrix-crawler: v0.12.1

It's deployed on a Hetzner Ubuntu VM with k3s.

ikreymer commented 10 months ago

Took a quick look. It seems the websockets should be getting closed in the disconnectedCallback, but maybe it isn't getting called for some reason?
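
For illustration only, here is a minimal sketch of the teardown pattern being discussed. It is not the actual Browsertrix code. It assumes one possible cause that has not been confirmed in this thread: a reconnect handler that opens a new socket after the element has been detached. The names `ScreencastConnection`, `SocketLike`, `SocketFactory`, and `hasPendingReconnect` are hypothetical, and the socket is created through an injected factory so the sketch runs without a browser.

```typescript
// Hypothetical minimal socket shape; a browser WebSocket could be adapted to it.
interface SocketLike {
  onclose: (() => void) | null;
  close(): void;
}

type SocketFactory = (url: string) => SocketLike;

class ScreencastConnection {
  private url: string;
  private createSocket: SocketFactory;
  private retryDelayMs: number;
  private socket: SocketLike | null = null;
  private reconnectTimer: ReturnType<typeof setTimeout> | null = null;
  private detached = false;

  constructor(url: string, createSocket: SocketFactory, retryDelayMs = 1000) {
    this.url = url;
    this.createSocket = createSocket;
    this.retryDelayMs = retryDelayMs;
  }

  // Intended to be called from the element's connectedCallback().
  connect(): void {
    this.detached = false;
    this.open();
  }

  // Intended to be called from the element's disconnectedCallback().
  disconnect(): void {
    this.detached = true;
    // Cancel a retry that was scheduled before the element was detached.
    if (this.reconnectTimer !== null) {
      clearTimeout(this.reconnectTimer);
      this.reconnectTimer = null;
    }
    if (this.socket) {
      // Drop the handler first so that close() cannot schedule a new retry.
      this.socket.onclose = null;
      this.socket.close();
      this.socket = null;
    }
  }

  get hasPendingReconnect(): boolean {
    return this.reconnectTimer !== null;
  }

  private open(): void {
    const socket = this.createSocket(this.url);
    socket.onclose = () => {
      this.socket = null;
      if (this.detached) return; // never retry once detached
      this.reconnectTimer = setTimeout(() => {
        this.reconnectTimer = null;
        if (!this.detached) this.open();
      }, this.retryDelayMs);
    };
    this.socket = socket;
  }
}
```

In a custom element, `connect()` would go in `connectedCallback()` and `disconnect()` in `disconnectedCallback()`. If `disconnectedCallback()` is never invoked, as suspected above, this pattern alone would not help, so it is worth confirming with a log statement that the callback actually runs when navigating away.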