Closed mkurapov closed 2 months ago
When running with telemetry and viewing the local dashboard after firing off some GQL requests, I see metrics like `http_client_duration_milliseconds_bucket`. We could use something like this to build histograms or get latency at different percentiles.
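As a rough illustration of how a `_bucket` metric yields percentiles, here is a minimal sketch that estimates a quantile from cumulative bucket counts by linear interpolation, similar in spirit to what Prometheus's `histogram_quantile` does. The `Bucket` type, the `estimateQuantile` function, and the sample counts are hypothetical and exist only for this example. They are not taken from the dashboard or from any library.

```typescript
// One cumulative histogram bucket: `count` observations were <= `le` milliseconds.
// Hypothetical type for illustration only.
interface Bucket {
  le: number // upper bound in ms; Infinity for the +Inf bucket
  count: number // cumulative count up to and including this bucket
}

// Estimate quantile q (between 0 and 1) by interpolating linearly inside
// the first bucket whose cumulative count reaches the target rank.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le)
  const total = sorted[sorted.length - 1]?.count ?? 0
  if (total === 0 || q < 0 || q > 1) return NaN

  const rank = q * total
  let prevLe = 0
  let prevCount = 0
  for (const b of sorted) {
    if (b.count >= rank) {
      // Cannot interpolate into the +Inf bucket; fall back to the last finite bound.
      if (b.le === Infinity) return prevLe
      const inBucket = b.count - prevCount
      if (inBucket === 0) return b.le
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / inBucket)
    }
    prevLe = b.le
    prevCount = b.count
  }
  return prevLe
}

// Made-up sample data: 100 requests spread over four buckets.
const sampleBuckets: Bucket[] = [
  { le: 10, count: 50 },
  { le: 50, count: 90 },
  { le: 100, count: 99 },
  { le: Infinity, count: 100 }
]

const p50 = estimateQuantile(0.5, sampleBuckets)
const p95 = estimateQuantile(0.95, sampleBuckets)
const p100 = estimateQuantile(1, sampleBuckets)
```

The estimate is only as precise as the bucket boundaries, so the choice of bounds matters more than the interpolation itself.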
However, I don't see this for the GraphQL resolvers, and I'm not sure why; that's what I'm currently investigating. I tried changing the config for the GQL instrumentation, but that didn't change anything. I expected the instrumentation to produce similar metrics, in which case we wouldn't need to add explicit histogram metrics using our telemetry service.
OK, metric generation from spans is working here: https://github.com/interledger/rafiki/tree/bc/2802/investigate-span-metrics-generator
We can use the `traces_spanmetrics_latency_bucket` metric and `span_label` (`mutation CreateQuote`, for example) to visualize the resolver durations as histograms.
Just to follow up on my previous comment: it seems that, for whatever reason, the `HTTPInstrumentation` collects these metrics by default, whereas the GraphQL instrumentation does not.
(Blocked by #2808)
Context
We need to determine how long each of our GraphQL APIs takes so we can estimate how performant each operation is and get a relative idea of how long a payment might take.
There are a few options for doing this: either manually instrumenting histograms per resolver or, since traces were added in #2808, potentially generating metrics automatically from Tempo via https://grafana.com/docs/tempo/latest/metrics-generator/
The metrics generation from Tempo should include generating histograms.
This ticket covers investigating how we can do that and which option will be easier.
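For comparison, the manual option (a histogram per resolver) could look roughly like the sketch below. It is a self-contained illustration under stated assumptions: `DurationHistogram` and `timed` are hypothetical names, the bucket bounds are arbitrary, and nothing here uses the OpenTelemetry API or the existing telemetry service. In practice the recording step would go through whichever metrics API the project already uses.

```typescript
// Hypothetical in-memory histogram (not an OpenTelemetry API).
// Bounds are upper limits in ms and are assumed to be sorted ascending.
class DurationHistogram {
  private readonly boundsMs: number[]
  private readonly counts: number[]

  constructor(boundsMs: number[]) {
    this.boundsMs = boundsMs
    // One slot per bound, plus a final slot acting as the +Inf bucket.
    this.counts = new Array<number>(boundsMs.length + 1).fill(0)
  }

  // Record one observation in the first bucket whose bound is >= ms.
  record(ms: number): void {
    const i = this.boundsMs.findIndex((bound) => ms <= bound)
    const slot = i === -1 ? this.boundsMs.length : i
    this.counts[slot] = (this.counts[slot] ?? 0) + 1
  }

  // Cumulative counts, in the same shape as a Prometheus `_bucket` series.
  cumulative(): number[] {
    let running = 0
    return this.counts.map((c) => (running += c))
  }
}

type Resolver<A, R> = (args: A) => Promise<R>

// Wrap a resolver so that every call records its duration, including failed calls.
function timed<A, R>(
  histogram: DurationHistogram,
  resolver: Resolver<A, R>
): Resolver<A, R> {
  return async (args: A): Promise<R> => {
    const start = Date.now()
    try {
      return await resolver(args)
    } finally {
      histogram.record(Date.now() - start)
    }
  }
}

// Usage sketch with a stand-in resolver (hypothetical, for illustration only).
const createQuoteHistogram = new DurationHistogram([10, 100, 1000])
const createQuote = timed(createQuoteHistogram, async (input: { amount: number }) => {
  return { id: 'quote-1', amount: input.amount }
})
```

The trade-off is that every resolver has to be wrapped explicitly, whereas span-derived metrics come for free once tracing is in place.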