sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Real User (Performance) Monitoring for hover tooltips #1423

Open tsenart opened 5 years ago

tsenart commented 5 years ago

Background

Real User (Performance) Monitoring telemetry extends our ability to understand the experience from the user's perspective by capturing browser performance metrics such as DNS latency, network transfer latency, DOM rendering, CSS repaints, garbage collection pauses, etc. The metrics available to us on the server side are not enough.
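
For concreteness, here's a minimal TypeScript sketch of the kind of client-side numbers a RUM integration could collect using only the standard browser Performance API; the metric names and shape are illustrative, not a proposal for a specific schema.

```typescript
// Illustrative shape for the navigation-level metrics mentioned above.
interface NavigationMetrics {
    dnsMs: number
    connectMs: number
    ttfbMs: number
    domContentLoadedMs: number
    loadMs: number
}

// Collects timings from the standard PerformanceNavigationTiming entry.
// Where these numbers get sent is left to whatever RUM service we pick.
function collectNavigationMetrics(): NavigationMetrics | undefined {
    const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[]
    if (!nav) {
        return undefined
    }
    return {
        dnsMs: nav.domainLookupEnd - nav.domainLookupStart,
        connectMs: nav.connectEnd - nav.connectStart,
        ttfbMs: nav.responseStart - nav.requestStart,
        domContentLoadedMs: nav.domContentLoadedEventEnd - nav.startTime,
        loadMs: nav.loadEventEnd - nav.startTime,
    }
}
```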

We already have Sentry to capture errors that happen in the browser extensions. We should look into integrating a RUM service to help us establish a real user performance baseline.

Once we have visibility into this baseline and the outliers, it'd be beneficial to come up with an internal SLO (Service Level Objective) for critical user operations (e.g. p99 hover load time < 1s).
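
As a rough illustration, an SLO check like that could be evaluated from a window of raw measurements along these lines (nearest-rank percentile; the names and threshold are illustrative only):

```typescript
// Hypothetical sketch of evaluating "p99 hover load time < 1s" over a
// window of measured durations.
function percentile(durationsMs: number[], p: number): number {
    const sorted = [...durationsMs].sort((a, b) => a - b)
    if (sorted.length === 0) {
        return Number.NaN
    }
    const index = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)
    return sorted[index]
}

const HOVER_SLO_P99_MS = 1000

function hoverSLOMet(hoverDurationsMs: number[]): boolean {
    return percentile(hoverDurationsMs, 99) < HOVER_SLO_P99_MS
}
```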

Slow performance, especially in the first interactions, is a big deterrent for new users adopting Sourcegraph. Having an explicit SLO to measure and track is meant to manage user satisfaction with regard to performance.

tsenart commented 5 years ago

/cc @felixfbecker @ijsnow @nicksnyder @keegancsmith

tsenart commented 5 years ago

This looks interesting, on the self-hosted route: https://github.com/peardeck/prometheus-user-metrics

tsenart commented 5 years ago

And on the managed options: https://raygun.com/platform/real-user-monitoring

nicksnyder commented 5 years ago

If we were to do this, I think focusing on one or two metrics would be the best way to start (e.g. hover tooltip latency, search latency).

One challenge is that our primary focus is performance at our customers' installations, and we generally can't automatically report data back from those. It would be ideal if we could capture this data in a general way and allow sites to configure where to send it (or build it into the product).

Collecting this data on sourcegraph.com is useful too since performance there impacts how non-customers perceive Sourcegraph, but it does have unique performance characteristics that don't necessarily translate to enterprise (e.g. search index is disabled).
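
To make the "capture generically, let the site decide where it goes" idea concrete, a sketch along these lines, where the `SiteRUMConfig` shape and the `/-/rum` default endpoint are hypothetical, not existing site-config fields:

```typescript
// Generic RUM event; the site decides whether and where it is sent.
interface RUMEvent {
    name: string // e.g. 'hover', 'search'
    durationMs: number
    labels?: Record<string, string>
}

// Hypothetical site-admin-controlled configuration.
interface SiteRUMConfig {
    enabled: boolean
    endpoint?: string // collector URL; defaults to the local instance
}

function reportRUMEvent(config: SiteRUMConfig, event: RUMEvent): void {
    if (!config.enabled) {
        return
    }
    const body = JSON.stringify(event)
    const url = config.endpoint ?? '/-/rum' // hypothetical default route
    // sendBeacon avoids blocking navigation; fall back to fetch if it refuses the payload.
    if (!navigator.sendBeacon(url, body)) {
        fetch(url, { method: 'POST', body, keepalive: true }).catch(() => undefined)
    }
}
```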

tsenart commented 5 years ago

> One challenge is that our primary focus is performance at our customers' installations, and we generally can't automatically report data back from those. It would be ideal if we could capture this data in a general way and allow sites to configure where to send it (or build it into the product).

If this performance data is sent from the browser extension, wouldn't it apply to both sourcegraph.com and private installations?

nicksnyder commented 5 years ago

I was assuming that we were talking about our web app, but yeah, we could theoretically track hover tooltip times from the browser extension (ideally bucketed by language) across public and private code. @dadlerj would you see any problems with this type of data collection?
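
A sketch of what timing a hover end-to-end in the browser extension and bucketing it by language might look like; `getHoverForPosition` and `recordTiming` are placeholders for the real hover pipeline and whatever reporter we end up with, not existing functions:

```typescript
// Measures a single hover interaction and records its duration labeled by
// language, per the "bucketed by language" suggestion above.
async function timedHover(
    doc: { languageId: string },
    position: { line: number; character: number },
    getHoverForPosition: (p: { line: number; character: number }) => Promise<unknown>,
    recordTiming: (name: string, durationMs: number, labels: Record<string, string>) => void
): Promise<unknown> {
    const start = performance.now()
    try {
        return await getHoverForPosition(position)
    } finally {
        // Record the duration whether the hover succeeded or failed.
        recordTiming('hover', performance.now() - start, { language: doc.languageId })
    }
}
```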

dadlerj commented 5 years ago

We explicitly never track any user activity data from the browser extension, and we make bug reporting (Sentry) opt-in:

[Screenshot: the browser extension's opt-in setting for error reporting]

We would need to use the same user flow for performance tracking (adding another checkbox, or maybe just making that one more general). Even something as small as referrer URLs leaking (which typically include repo names, filenames, etc) would not be okay.
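
A sketch of gating performance reporting behind the same explicit opt-in flow, assuming the webextension-polyfill `browser` API; the storage key and helper names are hypothetical, not the extension's real settings schema:

```typescript
import browser from 'webextension-polyfill'

// Hypothetical storage key for the extra opt-in checkbox mentioned above.
const PERFORMANCE_TELEMETRY_OPT_IN_KEY = 'allowPerformanceTelemetry'

async function isPerformanceTelemetryEnabled(): Promise<boolean> {
    const items = await browser.storage.sync.get(PERFORMANCE_TELEMETRY_OPT_IN_KEY)
    return items[PERFORMANCE_TELEMETRY_OPT_IN_KEY] === true
}

async function maybeReportHoverTiming(
    durationMs: number,
    language: string,
    report: (event: { name: string; durationMs: number; language: string }) => void
): Promise<void> {
    if (!(await isPerformanceTelemetryEnabled())) {
        return // user has not opted in: report nothing
    }
    // Only the duration and language are reported, never URLs, repo names, or file paths.
    report({ name: 'hover', durationMs, language })
}
```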

tsenart commented 5 years ago

> Even something as small as referrer URLs leaking (which typically include repo names, filenames, etc) would not be okay.

If we guarantee none of that information is leaking by design, could we have this be opt-out?

dadlerj commented 5 years ago

Then I'd personally be fine with it! It'd be a product/eng team question at that point @sqs @ijsnow

ijsnow commented 5 years ago

I like the idea!

How will we differentiate metrics derived from the page the browser extension is running in and the browser extension itself?

It seems to me that, in the world of extensions, a lot of the time being measured will be spent executing extension code (3rd party) rather than our own browser extension/extension host code. Have you considered that? Should we come up with an extension that implements all features of the extension API and run benchmarks against that instead? I'm concerned that the information we get from this won't be all that useful, as it will mostly be coming from extensions.

nicksnyder commented 5 years ago

You are right, most of the data in the hover tooltips comes from extensions.

A given hover request might get data from multiple extensions, and it would be great to track hover tooltip load time per extension.

Mechanically, do we wait for all providers to return before showing the hover tooltip, or do we re-render as results are added?

ijsnow commented 5 years ago

I believe we re-render as more are added.
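
Since results stream in and the tooltip re-renders as each provider returns, a per-extension measurement would have to wrap each provider's promise rather than the whole hover. A rough sketch, where `HoverProvider` and the reporter signature are illustrative, not the real extension host types:

```typescript
interface HoverProvider {
    extensionID: string
    provideHover: () => Promise<unknown>
}

// Starts all providers concurrently, records each provider's latency
// independently, and lets the caller re-render as each result arrives.
function timeEachProvider(
    providers: HoverProvider[],
    onResult: (extensionID: string, result: unknown) => void, // triggers a re-render
    recordTiming: (extensionID: string, durationMs: number) => void
): void {
    for (const provider of providers) {
        const start = performance.now()
        provider.provideHover().then(
            result => {
                recordTiming(provider.extensionID, performance.now() - start)
                onResult(provider.extensionID, result)
            },
            () => recordTiming(provider.extensionID, performance.now() - start)
        )
    }
}
```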

stale[bot] commented 5 years ago

Please post an update or close this with an explanation if it is no longer relevant. This will be closed automatically if there is no more activity.

tsenart commented 5 years ago

It's very much still relevant Mr. Stalebot.

Joelkw commented 3 years ago

@felixfbecker do you still feel a need for this? I haven't seen any explicit feedback from customers about speed for extension hovers in my time here so far, so I'm not sure if we've made some infra improvements along the way since this was created two years ago.

felixfbecker commented 3 years ago

@sourcegraph/code-intel what do you think?

macraig commented 3 years ago

@Joelkw haven't seen that piece of feedback either so far. We can always reopen if necessary. I'll tag @efritz just in case he wants to add any historical context I might be missing.

efritz commented 3 years ago

We're not currently tracking any code intel latencies in telemetry, so I don't think there's any necessity for this on our side at the moment.

tsenart commented 3 years ago

Sorry, how exactly can we know what users are experiencing if we don't capture these end-user metrics? We can't rely on people reporting things; many will just quit Sourcegraph in frustration and never say anything about it. I think we absolutely need this data, and not necessarily in pings, but in our monitoring and tracing infrastructure.
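
One way to land these numbers in the existing monitoring stack rather than in pings would be a small collector endpoint that turns browser-reported durations into a Prometheus histogram. A sketch only, using the prom-client and express npm packages; the route, label set, and buckets are assumptions:

```typescript
import express from 'express'
import { Histogram, register } from 'prom-client'

// Hover tooltip load time as measured in the browser, bucketed by language.
const hoverDuration = new Histogram({
    name: 'browser_hover_duration_seconds',
    help: 'Hover tooltip load time as measured in the browser',
    labelNames: ['language'],
    buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
})

const app = express()
app.use(express.json())

// Hypothetical collector route the browser posts its measurements to.
app.post('/-/rum', (req, res) => {
    const { durationMs, language } = req.body as { durationMs: number; language: string }
    hoverDuration.observe({ language }, durationMs / 1000)
    res.sendStatus(204)
})

// Scraped by the existing Prometheus setup.
app.get('/metrics', async (_req, res) => {
    res.set('Content-Type', register.contentType)
    res.send(await register.metrics())
})
```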

Joelkw commented 3 years ago

@tsenart, I would definitely be curious to see the data if we started collecting it, but right now we have to keep this at a priority level of "unless we're getting active feedback that it's bad, or have other reason to believe it's bad (usage dropoff, our own manual tests, etc.), creating the monitoring to ensure it isn't silently bad is low priority relative to things we have active signal are valuable", at least for the web team. I'm not opposed to collecting this data in the future or otherwise prioritizing it if that information changes.

efritz commented 3 years ago

Would it help to triage issues as belonging to the extension/extension host rather than the extension code itself (the classic battle of "this looks like a code intel problem" by virtue of code intel extensions being enabled everywhere) if we had the latency of the entire interaction as well as the latency of the extension's requests/computation as a comparison?

unknwon commented 3 years ago

Just discovered this in Sentry. We may be able to use this feature since we're already using Sentry (but we'd need to upgrade our plan 😞).

[Screenshot: Sentry Performance dashboard]

https://sentry.io/organizations/sourcegraph/performance/
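
For reference, a sketch of what enabling Sentry's performance monitoring could look like, assuming the @sentry/browser and @sentry/tracing packages of that era; the DSN, sample rate, and transaction name are placeholders:

```typescript
import * as Sentry from '@sentry/browser'
import { Integrations } from '@sentry/tracing'

Sentry.init({
    dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0', // placeholder, use our existing project DSN
    integrations: [new Integrations.BrowserTracing()],
    tracesSampleRate: 0.1, // sample 10% of page loads to stay within plan limits
})

// A custom transaction around a single hover interaction.
async function measureHover(getHover: () => Promise<unknown>): Promise<unknown> {
    const transaction = Sentry.startTransaction({ name: 'hover-tooltip' })
    try {
        return await getHover()
    } finally {
        transaction.finish()
    }
}
```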