open-telemetry / opentelemetry-js-contrib

OpenTelemetry instrumentation for JavaScript modules
https://opentelemetry.io
Apache License 2.0
693 stars 508 forks source link

`@opentelemetry/instrumentation-graphql` increases event loop latency by ~10x #1739

Open HunterLarco opened 1 year ago

HunterLarco commented 1 year ago

What version of OpenTelemetry are you using?

1.17.0

What version of Node are you using?

17.9.1

What did you do?

Installed @opentelemetry/instrumentation-graphql with default options and depth: 0.

What did you expect to see?

Minimal impact to latency.

What did you see instead?

While debugging a graphql query using ApolloServer which returns 340KB of data (11,494 keys in the response JSON) I noticed that our business logic was executing just under 50ms while the request was taking ~1.2s. Using the tracing data collected by @opentelemetry/instrumentation-graphql I was able to observe that all of our resolvers took ~86ms CPU time but 1.1s wall time. I suspected something was wrong with resolver scheduling so ran clinic doctor -- node server.ts to gain insight and noticed during the span of the query, the event loop experienced ~250ms of latency. Removing @opentelemetry/instrumentation-graphql decreased this down to ~20ms of latency and reduced the overall request timing to 187ms.

TL;DR even with depth: 0, for large payloads @opentelemetry/instrumentation-graphql increased event loop latency by an order of magnitude.

Replication details

While replication of our exact problem would require our server code, a minimal example of problematic and large GraphQL payloads can be found here to test with: https://codesandbox.io/s/apollo-server-performance-issue-i7fk7

pichlermarc commented 1 year ago

This is possibly related to https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1686. Apollo sets every resolver to a custom one, even if it's trivial, which causes extremely verbose traces.

sdacunha commented 5 months ago

I was facing the same issue when using otel via Sentry v8

If it helps, I currently disable the graphql integration (since I use Apollo Server) and setup a plugin and instrument manually via an Apollo plugin. It's possible to detect the trivial resolver case using this: https://community.apollographql.com/t/is-there-a-way-to-know-if-a-graphql-field-will-be-resolved-using-the-default-resolver/2573 and manually instrument

It seems like this may be one of the only feasible ways for Apollo users to instrument with reasonable overhead, and it should probably be kept separate from the graphql integration here