grafana / pyroscope

Continuous Profiling Platform. Debug performance issues down to a single line of code
https://grafana.com/oss/pyroscope/
GNU Affero General Public License v3.0

Service analytics #3256

Open kolesnikovae opened 5 months ago

kolesnikovae commented 5 months ago

Currently, we expect users to know what they are looking for, and we offer only limited means of exploration. Pyroscope should give users insight into their data: it should be possible to identify "interesting" services or service instances and code hotspots at a glance, without querying anything specific.

Global View

At the highest level, we should present an overview of the user environment (the part of it we are aware of): global statistics on the services sending data to Pyroscope.

We should identify a subset of "interesting" (or important) services that the user may want to take a closer look at. This does not mean we should collect statistics only for those services, but we should surface them to the user first, as suggestions.

This ought to be user-specific, although we could probably address this through multi-tenancy; the target audience is software engineers. Some of the criteria (in no particular order):

Service catalog (registry)

List of all the services, including details such as:

A user should be able to get answers to questions like:

These are frequently asked questions, and we can greatly help users if we provide them with all the necessary information.

Dimension statistics

We should provide a detailed breakdown of dimensions (labels) for each service. For each service label, we identify the top-K values, and for each selected label value we collect statistics such as share of samples, ingestion traffic, and data on disk. Label query matches could be tracked as well.
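To make the per-label breakdown concrete, here is a minimal sketch of the top-K selection by share of samples. It is not Pyroscope code: `LabelValueStat`, `TopKLabelValues`, and the input map of per-value sample counts are hypothetical names chosen for illustration, and ingestion traffic or on-disk size could be ranked the same way.

```go
package main

import (
	"fmt"
	"sort"
)

// LabelValueStat is a hypothetical record for one value of a service label.
type LabelValueStat struct {
	Value   string
	Samples int64
	Share   float64 // fraction of all samples seen for this label
}

// TopKLabelValues returns the k label values with the most samples.
// Share is computed against the total over all values, not only the top k.
func TopKLabelValues(samplesByValue map[string]int64, k int) []LabelValueStat {
	var total int64
	stats := make([]LabelValueStat, 0, len(samplesByValue))
	for v, n := range samplesByValue {
		total += n
		stats = append(stats, LabelValueStat{Value: v, Samples: n})
	}
	// Sort by sample count, descending; break ties by value for stable output.
	sort.Slice(stats, func(i, j int) bool {
		if stats[i].Samples != stats[j].Samples {
			return stats[i].Samples > stats[j].Samples
		}
		return stats[i].Value < stats[j].Value
	})
	if k < 0 {
		k = 0
	}
	if k < len(stats) {
		stats = stats[:k]
	}
	for i := range stats {
		if total > 0 {
			stats[i].Share = float64(stats[i].Samples) / float64(total)
		}
	}
	return stats
}

func main() {
	// Assumed input: sample counts per value of a "pod" label for one service.
	byPod := map[string]int64{"pod-a": 600, "pod-b": 300, "pod-c": 100}
	for _, s := range TopKLabelValues(byPod, 2) {
		fmt.Printf("%s samples=%d share=%.2f\n", s.Value, s.Samples, s.Share)
	}
}
```

Computing the share against the full total keeps the numbers comparable across labels even when only a few values are retained.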

Later on, this information could also help us to add new features like:

See: https://github.com/grafana/pyroscope/issues/2648, https://github.com/grafana/pyroscope/issues/3037, https://github.com/grafana/pyroscope/issues/3226

Code hot spots

We should provide a list of functions (or call sites) that the user might be interested in. In its simplest form, this is a top-K list of functions (both flat and cumulative). More sophisticated analysis is possible, because the statistics are meant to be collected service-wide in the background.
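As an illustration of the simplest form, the sketch below computes flat and cumulative totals per function from stack samples and ranks them. The `Sample` and `FuncStat` types and the `Aggregate`, `TopK`, `ByFlat`, and `ByCum` helpers are hypothetical and do not reflect Pyroscope's internal data model; stacks are assumed to be ordered root first, and a recursive function is counted once per sample in the cumulative total.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical stack sample: Stack is ordered root first, leaf last.
type Sample struct {
	Stack []string
	Value int64
}

// FuncStat holds flat (self) and cumulative (self plus callees) totals.
type FuncStat struct {
	Name string
	Flat int64
	Cum  int64
}

// Aggregate sums flat and cumulative values per function.
func Aggregate(samples []Sample) map[string]*FuncStat {
	stats := map[string]*FuncStat{}
	get := func(name string) *FuncStat {
		s, ok := stats[name]
		if !ok {
			s = &FuncStat{Name: name}
			stats[name] = s
		}
		return s
	}
	for _, s := range samples {
		if len(s.Stack) == 0 {
			continue
		}
		// Flat: only the leaf frame is charged.
		get(s.Stack[len(s.Stack)-1]).Flat += s.Value
		// Cumulative: every distinct function on the stack is charged once.
		seen := map[string]bool{}
		for _, fn := range s.Stack {
			if seen[fn] {
				continue
			}
			seen[fn] = true
			get(fn).Cum += s.Value
		}
	}
	return stats
}

// ByFlat and ByCum select the metric used for ranking.
func ByFlat(s FuncStat) int64 { return s.Flat }
func ByCum(s FuncStat) int64  { return s.Cum }

// TopK returns the k functions with the largest value of the chosen metric.
func TopK(stats map[string]*FuncStat, k int, by func(FuncStat) int64) []FuncStat {
	out := make([]FuncStat, 0, len(stats))
	for _, s := range stats {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if by(out[i]) != by(out[j]) {
			return by(out[i]) > by(out[j])
		}
		return out[i].Name < out[j].Name
	})
	if k < 0 {
		k = 0
	}
	if k < len(out) {
		out = out[:k]
	}
	return out
}

func main() {
	stats := Aggregate([]Sample{
		{Stack: []string{"main", "handle", "parse"}, Value: 70},
		{Stack: []string{"main", "handle", "render"}, Value: 20},
		{Stack: []string{"main", "gc"}, Value: 10},
	})
	fmt.Println("flat:", TopK(stats, 2, ByFlat))
	fmt.Println("cum: ", TopK(stats, 2, ByCum))
}
```

The two rankings answer different questions: the flat list points at functions that burn resources themselves, while the cumulative list points at entry points whose call trees are expensive.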

Recent activity / query history

This is somewhat unrelated, but we may also want to keep track of user actions globally (a kind of audit log):

This part should be handled on the client side (in the Grafana app plugin).

Service relationships

It might be handy to indicate relationships between services:

This part could be handled on the client side (in the Grafana app plugin).