tailcallhq / tailcall

High Performance GraphQL Runtime
https://tailcall.run
Apache License 2.0

OpenTelemetry integration #1240

Closed meskill closed 5 months ago

meskill commented 8 months ago

Description

Provide OpenTelemetry integration for tailcall, with support for different exporters and configurations depending on users' needs.

User perspective

If the user is not interested in OpenTelemetry, tailcall should work as before and require no additional action from the user.

If the user wants to enable OpenTelemetry output from tailcall, they can use a new schema directive, @opentelemetry, that specifies where to export the data and in which format.

Example of config:

schema
  @server(port: 8000, graphiql: true, hostname: "0.0.0.0")
  @upstream(baseURL: "http://jsonplaceholder.typicode.com", httpCache: true)
  @opentelemetry(
    export: {
      otlp: {
        url: "https://api.honeycomb.io:443"
        # gather api key from https://ui.honeycomb.io and set it as env when running tailcall
        headers: [{key: "x-honeycomb-team", value: "{{env.HONEYCOMB_API_KEY}}"}]
      }
    }
  ) {
  query: Query
}

In that case, OpenTelemetry data from tailcall will be exported to the provided service, and the responsibility for aggregating and processing that data lies with that external service.
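To illustrate how a header value such as `{{env.HONEYCOMB_API_KEY}}` in the config above could be filled in before export, here is a minimal std-only sketch. The function name `render_env_template` and the rule of leaving unknown placeholders untouched are assumptions made for illustration; the issue does not describe how tailcall actually resolves these templates.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replaces every `{{env.NAME}}` placeholder in `template`
/// with the value of `NAME` taken from `env`.
/// Unknown variables and unterminated placeholders are copied through unchanged.
fn render_env_template(template: &str, env: &HashMap<String, String>) -> String {
    const OPEN: &str = "{{env.";
    const CLOSE: &str = "}}";

    let mut out = String::with_capacity(template.len());
    let mut rest = template;

    while let Some(start) = rest.find(OPEN) {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];

        match after_open.find(CLOSE) {
            Some(end) => {
                let name = &after_open[..end];
                match env.get(name) {
                    Some(value) => out.push_str(value),
                    // Variable not set: keep the placeholder text as written.
                    None => out.push_str(&rest[start..start + OPEN.len() + end + CLOSE.len()]),
                }
                rest = &after_open[end + CLOSE.len()..];
            }
            None => {
                // No closing braces: copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }

    out.push_str(rest);
    out
}
```

In a real process the map would presumably be built from `std::env::vars()`; passing it in explicitly keeps the sketch easy to test.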

Development perspective

OpenTelemetry provides various Rust crates that implement different aspects of integration into the app.

Core

Core should be able to generate any OpenTelemetry data when needed in a simple way, preferably without any feature flags inside the code.

For tracing and logs we can use the tracing crate instead of log. Its benefits are that tracing already manages both traces and logs, has built-in methods to create different wrappers, and its data can be exported as OpenTelemetry data with the tracing-opentelemetry crate.

For metrics we can't use tracing and have to use the opentelemetry crates' functionality explicitly. The core should rely only on the parts of the OpenTelemetry core API for sending data that are not tied to specific exporters.
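The separation described here, where core code records data without knowing which exporter (if any) is attached, can be sketched with a hand-rolled trait. This is not the opentelemetry crate's API: `MetricsSink`, `NoopSink`, `InMemorySink`, `handle_request`, and the metric name are all hypothetical stand-ins used only to show the shape of the idea.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical abstraction: the only thing core code would call,
/// regardless of which exporter the environment configured.
trait MetricsSink {
    fn add_counter(&self, name: &str, value: u64);
}

/// Used when telemetry is not configured: every call is a no-op,
/// so core behaves exactly as before.
struct NoopSink;

impl MetricsSink for NoopSink {
    fn add_counter(&self, _name: &str, _value: u64) {}
}

/// In-memory sink standing in for an exporter-backed implementation.
struct InMemorySink {
    total: AtomicU64,
}

impl MetricsSink for InMemorySink {
    fn add_counter(&self, _name: &str, value: u64) {
        self.total.fetch_add(value, Ordering::Relaxed);
    }
}

/// Core code path: records one request without knowing where the data goes.
fn handle_request(sink: &dyn MetricsSink) -> &'static str {
    // The metric name is an illustrative assumption.
    sink.add_counter("http.server.request.count", 1);
    "ok"
}
```

With this shape, the CLI or another environment decides which sink to pass in, and core needs no feature flags to stay exporter-agnostic.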

CLI/Native app

The specific environment should define exporters based on the passed configuration. This is done mostly by dedicated OpenTelemetry crates.

The first implementation should start with a couple of available integrations and should be easily extensible with additional options in the future.
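One way to keep the exporter list extensible is to model the `export` argument as an enum and match on it in the environment-specific code. The sketch below is an assumption about the shape, not the actual implementation: the `Export` type, the `Stdout` variant, and `describe_exporter` are hypothetical, and only the `otlp` option with `url` and `headers` comes from the example config in this issue.

```rust
/// Hypothetical mirror of the `export` argument of `@opentelemetry`.
#[derive(Debug, Clone, PartialEq)]
enum Export {
    /// Assumed extra option for local debugging; not mentioned in the issue.
    Stdout { pretty: bool },
    /// Matches the `otlp` block from the example config.
    Otlp {
        url: String,
        headers: Vec<(String, String)>,
    },
}

/// Telemetry is optional: `None` means the directive is absent
/// and nothing should be set up.
fn describe_exporter(export: Option<&Export>) -> String {
    match export {
        None => "telemetry disabled".to_string(),
        Some(Export::Stdout { pretty }) => {
            format!("stdout exporter (pretty: {})", pretty)
        }
        Some(Export::Otlp { url, headers }) => {
            format!("otlp exporter -> {} ({} headers)", url, headers.len())
        }
    }
}
```

Adding another integration later would mean adding one variant and one match arm, and the compiler flags every place that has not yet handled it.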

WASM

Performance

The initial integration with 2 spans and 1 metric doesn't show significant changes in performance.

However, using async_graphql::extensions::OpenTelemetry reduces overall RPS in the benchmark by 30%. It outputs a lot of spans, most of which are basically no-op functions for fields with no resolvers. Those could probably be stripped or ignored in some way.

Testing

meskill commented 7 months ago

Enabling async_graphql::extensions::OpenTelemetry generates a lot of redundant spans for every field of the entity. E.g. for a list of posts it looks like this:

image

There are 200 spans, and that number will only increase as more data and fields are requested.

This has been mentioned a couple of times in async-graphql's repositories: https://github.com/async-graphql/async-graphql/issues/1395#issuecomment-1761180698 https://github.com/async-graphql/examples/pull/59#issuecomment-1287791874 so probably another extension should be used, or one should be written manually.
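A manually written extension could, for example, skip spans for fields that have no resolver. The std-only sketch below only models that filtering decision; it does not use async-graphql's extension API, and `FieldInfo`, `span_names`, the `posts.<index>.<field>` naming, and the field set are hypothetical.

```rust
/// Hypothetical description of a field requested on each list item.
struct FieldInfo {
    name: &'static str,
    /// True when resolving the field does real work (e.g. an upstream call).
    has_resolver: bool,
}

/// Returns the names of the spans that would be opened for a list of `items`
/// entries. With `skip_trivial` set, fields without a resolver get no span.
fn span_names(fields: &[FieldInfo], items: usize, skip_trivial: bool) -> Vec<String> {
    let mut spans = Vec::new();
    for index in 0..items {
        for field in fields {
            if skip_trivial && !field.has_resolver {
                continue;
            }
            spans.push(format!("posts.{}.{}", index, field.name));
        }
    }
    spans
}
```

For a selection with two plain fields and one resolver-backed field, the number of spans per item drops from three to one, so the total grows only with the fields that actually do work.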

github-actions[bot] commented 5 months ago

Action required: Issue inactive for 30 days. Status update or closure in 7 days.

github-actions[bot] commented 5 months ago

Issue closed after 7 days of inactivity.