Add semantic convention for describing graphql spans

svrnm commented 3 years ago

What are you trying to achieve?

GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. [...] GraphQL servers are available for multiple languages, including Haskell, JavaScript, Perl, Python, Ruby, Java, C++, C#, Scala, Go, Rust, Elixir, Erlang, PHP, R, D and Clojure. (Source: Wikipedia)

There are already graphql instrumentation libraries for Node.JS and Ruby (https://opentelemetry.io/registry/?s=graphql) and as quoted above there could be many more. GraphQL has a specification that could be used to define span attributes. For example the Node.JS instrumentation defines the following:

export enum SpanAttributes {
  COMPONENT = 'graphql',
  SOURCE = 'graphql.source',
  FIELD_NAME = 'graphql.field.name',
  FIELD_PATH = 'graphql.field.path',
  FIELD_TYPE = 'graphql.field.type',
  OPERATION = 'graphql.operation.name',
  VARIABLES = 'graphql.variables.',
  ERROR_VALIDATION_NAME = 'graphql.validation.error',
}

The most important ones would be the operation name (query, mutatation, ...) and if available the query name.

ericmustin commented 2 years ago

Was there any reason this stalled? The stronger we can make the semantic conventions around instrumentation attributes, the better, especially for these details which are already implemented across multiple languages?

spencerwilson commented 2 years ago

Just want to point to these two issues folks raised in JS that involve GraphQL semconv:

I don’t see any acute reason this should be blocked, so I assume it’s stalled simply for lack of someone making a PR.

jord1e commented 2 years ago

I can look into contributing a PR this weekend, some prior art to be looked at is the Apollo schema reporting Protobuf file (https://github.com/apollographql/apollo-server/blob/e468367d52e11f3127597e4fe920eb8294538289/packages/apollo-reporting-protobuf/src/reports.proto)

And of course the standing opentelemetry-js(-contrib) and opentelemetry-ruby conventions

Some input from the GraphQL Working Group may also be useful (do companies use Apollo Query Ids, should they be added as a Span attribute?, should we add the query name as an attribute?)

spencerwilson commented 2 years ago

Something I'm unsure whether should be in scope of GraphQL semantic conventions, but suspect should be: For a GraphQL server, there are many distinct procedures that might deserve getting their own span, depending on one's telemetry requirements. Some of them include:

parse the given string as an ExecutableDocument
run validation on the parsed document
execute an operation in the document (a document may contain multiple OperationDefinitions, but during document execution at most one of the operations defined within is executed)
- [ExecuteField](https://spec.graphql.org/October2021/#ExecuteField()): Encompasses getting the complete return value for a field and any selection set on it
- ResolveFieldValue: When the function associated with a field is run.
- CompleteValue: Either lightweight "finishing" logic (if the current field's type is of kind Scalar or Enum, or a List or Non-Null wrapping thereof), or one or more field executions otherwise (resolving the selection set on the field)

I've not reviewed all existing GraphQL OTel instrumentations to see which of these procedures get their own span currently. The only one I've reviewed is the JavaScript instrumentation (repo), in which the following procedures get spans (src):

parse and parseSchema: These spans are produced by wrapping the parse function exported from graphql-js
validate and validateSchema: Produced by wrapping the validate function exported from graphql-js
execute: Produced by wrapping graphql-js's execute function. As of graphql-js@16.2.0 this function is only to be used for operations of type query and mutation, but eventually it will be the supported way of executing subscription, too (src).
- Corresponds to [ExecuteRequest](https://spec.graphql.org/October2021/#ExecuteRequest()) in the spec, but notably due to the aforementioned implementation detail, the GraphQL instrumentation currently only creates execute spans for queries and mutations, not for subscriptions.
resolve: Produced by mutating the relevant schema object, wrapping any application-provided resolver functions. Note: spans are not created when graphql-js's trivial resolver—property access on the parent value—is used.
- Corresponds to [ResolveFieldValue](https://spec.graphql.org/October2021/#ResolveFieldValue())

My motivation to go into this detail here is that as a user of the graphql-js auto instrumentation, I've at times found myself wishing that I had spans for ExecuteField, i.e., something representing ResolveFieldValue (individual resolvers running) + CompleteValue (encompassing all recursive resolution). (note: The viability of instrumenting some of these procedures might differ depending on the language and how instrumentations are implemented in each language. E.g., I've considered how the JS instrumentation might be changed to achieve what I want, but if what I've considered works at all it might have some costs that outweigh its telemetry-adding benefit.)

The relevance to semconv, minimally, is: How should these different procedures be indicated on the Span data model? My intuition suggests they should be codified at least as some span attribute (if not also a component of span name), but I haven't read all the existing semconv to know if that's best—or if there's an obvious right answer that already exists, for that matter.

Also want to mention the following which might be helpful references, if we're ever seeking definitions for things, see:

a WIP Glossary in graphql/graphql-wg
the GraphQL spec (naturally). Some definitions will be indicated with Spec Markdown's definitions syntaxes, but it looks like that's a relatively new feature in Spec Markdown so there are surely plenty of definitions which aren't yet syntactically indicated as definitions.

tigrannajaryan commented 2 years ago

Done in https://github.com/open-telemetry/opentelemetry-specification/pull/2456

open-telemetry / opentelemetry-specification

Add semantic convention for describing graphql spans #1670