spring-projects / spring-graphql

Spring Integration for GraphQL
https://spring.io/projects/spring-graphql
Apache License 2.0
1.52k stars 297 forks source link

Allow to distinguish between dataloader fetched fields in datafetcher observations and offer dataloader instrumentation #1034

Open mmuth opened 1 month ago

mmuth commented 1 month ago

Feature Request Hi there - I am currently migrating from graphql-kickstart to spring-graphql. The new observations in the GraphQlObservationInstrumentation are really great!

However I suffer from a huge amount of "noise spans" when using dataloaders, as the traces contain a Span for every resolved Child node (with an graphql.field.path in an array-like iteration /myfield/edges[x]). That means for a GQL request with a 5000 page size, 5000 spans are added even though only one single database request happens in the end and this is the interesting part to investigate on the resulting trace.

I'd propose that we can somehow recognize the "dataloaded fields" (e.g by attaching a graphql.datafetcher.completeablefuture=true KevValue as an extra highCardinalityKeyValue for each observation; or something similar), then one would be able to filter them later. To make this complete, an Observation for all called dataloaders would be very helpful.

bclozel commented 3 weeks ago

Thanks for reaching out. I had a look at our instrumentation and wrote a test case for this. We can consider several changes here and some of them are quite involved, so I have scheduled this for our next feature release.

As far as I can see, it is completely transparent to a DataFetcher caller whether it is using a DataLoader internally or not. Also, there are various strategies that can trigger the DataLoader dispatch - by default this is called at each depth level of the query.

We can extend our SelfDescribingDataFetcher to expose whether this is a batching operation. I believe we could automatically expose this for @BatchMapping handlers, but of course not for custom data fetcher registrations. With that extra information, we could then make it available to our DataFetcherObservationContext and turn this information into a low cardinality key-value for the data fetcher observation. We would need to think about a proper name, but graphql.datafetcher.batching=true comes to mind.

We can also consider instrumenting DataLoader operations, but this is more challenging. Instrumenting DataLoader as they are being registered in the DataLoaderRegistry is complex, because we would need to involve the Observation API there and get the actual ObservationRegsitry instance at that point. Right now, instrumentation is neatly separated in its GraphQlObservationInstrumentation class.

We could, when the request execution starts, get the DataLoaderRegistry and for each registered data loader:

  1. get the DataLoader from the registry
  2. unregister it
  3. wrap it for instrumentation purposes
  4. re-register it under the original key

We would then need to create a new context/convention/key-values set for this new graphql.dataloader observation. This would certainly make the observability better but we should also consider instrumentation overhead.

rstoyanchev commented 1 week ago

For the route with SelfDescribingDataFetcher and @BatchMapping methods, we can also support cases where DataLoader is injected into a controller method. That would cover all of @Controller usage, which would be a good place to be.