webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://webrecorder.net/browsertrix
GNU Affero General Public License v3.0

Metrics / Tracing: What are they good for? #1032

Open Chickensoupwithrice opened 1 year ago

Chickensoupwithrice commented 1 year ago

One thing I'd like to work on for Browsertrix is some sort of metrics implementation. There have been past attempts at getting metrics working (specifically the Kubernetes metrics-server for CPU / memory utilization), but since nobody is using them, they're of little value. This issue is to catalogue use cases for metrics in Browsertrix.
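For the CPU / memory use case, here is a minimal sketch of summarizing metrics-server data. It assumes the pod metrics have already been fetched as JSON from the `metrics.k8s.io` API (for example with `kubectl get --raw`). The helper names (`parse_cpu`, `parse_memory`, `pod_usage`) are hypothetical. The quantity parser only covers the suffixes metrics-server commonly emits (`n`, `u`, `m` for CPU; binary and decimal suffixes for memory), not the full Kubernetes quantity grammar.

```python
# Sketch (hypothetical helpers): sum per-pod CPU / memory usage from a
# metrics.k8s.io PodMetrics object that has already been fetched as JSON.
from decimal import Decimal

# CPU suffixes: nanocores, microcores, millicores.
CPU_SUFFIXES = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3")}
# Memory suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...).
MEM_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(q: str) -> float:
    """Convert a CPU quantity such as "250m", "3126435n" or "2" to cores."""
    if q and q[-1] in CPU_SUFFIXES:
        return float(Decimal(q[:-1]) * CPU_SUFFIXES[q[-1]])
    return float(Decimal(q))


def parse_memory(q: str) -> int:
    """Convert a memory quantity such as "128Mi", "512k" or "1000" to bytes."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(MEM_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(Decimal(q[: -len(suffix)]) * MEM_SUFFIXES[suffix])
    return int(Decimal(q))


def pod_usage(pod_metrics: dict) -> tuple[float, int]:
    """Return (CPU cores, memory bytes) summed across a pod's containers."""
    cpu = 0.0
    mem = 0
    for container in pod_metrics.get("containers", []):
        usage = container["usage"]
        cpu += parse_cpu(usage["cpu"])
        mem += parse_memory(usage["memory"])
    return cpu, mem
```

For a pod with one container at `250m` / `128Mi` and another at `500000000n` / `64Mi` (made-up values), `pod_usage` would report 0.75 cores and 201326592 bytes (192Mi).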

Thoughts:

It's an interesting challenge because Browsertrix is made up of quite a few smaller components (crawler, replay via ReplayWeb.page, frontend / backend), and it isn't clear which components would benefit from tracing or metrics, or what visibility those would give us into Browsertrix.

Shrinks99 commented 1 year ago

As a designer, I find UI interaction metrics really helpful for gauging which parts of the app users interact with most and (at a more detailed level) which actions are often performed in the same order. I'd be super jazzed to see the first part; it would help me out!
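To make those two ideas concrete (which actions are used most, and which actions tend to happen in the same order), here is a rough sketch. It assumes interaction events have already been collected as per-session lists of action names. The action names and function names are hypothetical. Counting consecutive pairs is one simple way to spot common orderings, not a full funnel or sequence analysis.

```python
# Sketch (hypothetical names): basic UI interaction metrics from
# per-session lists of action names, in the order they happened.
from collections import Counter


def action_frequencies(sessions: list[list[str]]) -> Counter:
    """Count how often each UI action appears across all sessions."""
    return Counter(action for session in sessions for action in session)


def common_transitions(sessions: list[list[str]]) -> Counter:
    """Count consecutive (previous, next) action pairs within each session.

    Pairs never span two sessions, so one session's last action is not
    linked to the next session's first action.
    """
    return Counter(
        (prev, nxt)
        for session in sessions
        for prev, nxt in zip(session, session[1:])
    )
```

With sessions `[["open-workflow", "start-crawl", "view-replay"], ["open-workflow", "start-crawl"]]`, `common_transitions(sessions).most_common(1)` returns `[(("open-workflow", "start-crawl"), 2)]`.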

Shrinks99 commented 1 year ago

RE: crawling metrics, something we discussed a while ago that's worth noting here is a possible integration of Wappalyzer as a way to diagnose whether crawling / replay issues are related to specific web technologies.
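A minimal sketch of the diagnosis step follows. It assumes each crawled page has already been tagged with its detected technologies (for example by running Wappalyzer or a similar detector) and flagged if it had a crawl or replay issue. The record fields (`technologies`, `had_issue`) and the function name are hypothetical. A high rate only points to a correlation worth investigating, especially when few pages use a given technology.

```python
# Sketch (hypothetical record shape): per-technology issue rate, given pages
# already tagged with detected technologies and a crawl/replay issue flag.
from collections import defaultdict


def failure_rate_by_technology(pages: list[dict]) -> dict[str, float]:
    """Return, for each technology, the fraction of pages using it that had an issue."""
    totals = defaultdict(int)    # pages using each technology
    failures = defaultdict(int)  # of those, pages flagged with an issue
    for page in pages:
        # Count each technology at most once per page, even if detected twice.
        for tech in set(page["technologies"]):
            totals[tech] += 1
            if page["had_issue"]:
                failures[tech] += 1
    return {tech: failures[tech] / totals[tech] for tech in totals}
```

For three made-up pages, one using React and Cloudflare with an issue, one using only React without, and one using WordPress without, this returns React 0.5, Cloudflare 1.0 and WordPress 0.0.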