zio / zio-http

A next-generation Scala framework for building scalable, correct, and efficient HTTP clients and servers
https://zio.dev/zio-http
Apache License 2.0
786 stars 396 forks source link

Improve OpenTelemetry integration so traces include path #2835

Closed SimunKaracic closed 1 month ago

SimunKaracic commented 5 months ago

Is your feature request related to a problem? Please describe. When using the OTEL agent with ZIO-http, we get the benefit of existing Netty instrumentation. This means traces are started/stopped and propagated correctly, but there's a problem. The server trace names contain only the HTTP method (which is useless, we need the route in the trace name).

From what I can tell, we're hitting the same issue as described here https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/8613

We need to add instrumentation to the OTEL agent that provides the span name. This involves hooking into specific classes and methods in ZIO-http.

Describe the solution you'd like Someone from the core team decides if there's enough stability in the internal API to hook into it with the OTEL java agent.

And if there is, you can provide me the classes/methods where I can see the matched route and I'll try and add the required instrumentation to OTEL.

Describe alternatives you've considered Someone from the core team writes a plugin and I don't have to do anything? :D

jdegoes commented 4 months ago

/bounty $150

algora-pbc[bot] commented 4 months ago

💎 $150 bounty • ZIO

Steps to solve:

  1. Start working: Comment /attempt #2835 with your implementation plan
  2. Submit work: Create a pull request including /claim #2835 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to zio/zio-http!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🔴 @kyri-petrou Jul 6, 2024, 11:45:30 PM WIP
SimunKaracic commented 3 months ago

Usually these types of contribution from bug bounties end up in the actual repo where the bug bounty is raised, yes? So someone implements it, and the core team takes over maintenance afterwards.

However, if this gets implemented as described above, the code will reside in the opentelemetry repo. Who then maintains it after the bounty is complete? Who will update the instrumentation if ZIO-http internals change?

And is someone willing to point me to a definitive place where I can call render on a RoutePattern that gets handled?

I've been trying to figure it out, and I can get printlns of rendered route patterns, but the code is so generic that I'm not sure where in the request handling lifecycle I'm hooking in.

kyri-petrou commented 3 months ago

@SimunKaracic if I'm understanding your usecase correctly, have you considered creating the spans in a middleware which opens/closes the span? It can be placed last so that it's the run as close to the netty/zio-http boundary

SimunKaracic commented 3 months ago

Sure, I can do that.

But this isn't about making a workaround for me, but about improving the out of the box experience for ZIO-http.

If we're gonna tell people to use the OTEL agent to get tracing in the ZIO ecosystem, we should provide good traces/metrics by default.

For all of their supported frameworks, you don't have to do a thing to get well named traces with useful attributes. You add the agent and you're done. Check out what traces it provides when you run an example spring/armeria/quarkus app.

Kamon is the same, if it supports your framework of choice, you get an amazing experience with no fiddling on your side.

I hope that makes sense.

987Nabil commented 3 months ago

@SimunKaracic Am I right, that the instrumentation would use reflection to get the needed information? I mean they would need to extract infos that can't be reached by user code, right? If that is true and it is possible to instead expose open telemetrie via some kind of lib or http endpoint or whatever, I think I'd prefer that. What I mean is, adding tracing information in a generic format in zio-http, that can be accessed and exported by an open telemetry or instana or whatever module that is hosted in zio-http.

SimunKaracic commented 3 months ago

Not reflection, but a javaagent, bringing significantly less of a runtime perf cost. And yes, they extract info not reachable by user code!

But none of the code for this is in zio-http, nothing is exposed by zio-http, you tell the OTEL agent where to hook in.

Here'a a simplified explanation of what happens:

If you look at their codebase, you can see which libraries are instrumented, and which classes/methods they hook into.

I hope this clarifies the approach chosen by the OTEL folks a bit. If not, ask away, and I'll do my best to answer.

SimunKaracic commented 3 months ago

So this is why I'm asking for confirmation that some internal zio-http code, where the OTEL would hook into, won't change. The javaagent recognizes where to add that code based on class name and the method being called

987Nabil commented 3 months ago

@SimunKaracic thanks for the explanation. I am sure instana is doing more or less the same. Even though this is how they do it, I wonder if this works well with the zio architecture. Also, after some research, I think we could hook zio-logging/metrics into open telemetry.

In general, I don't see us giving any guarantees, that internal APIs will stay the same. If they hook into our classes and it works, great. But if not... 🤷‍♂️

I am much more in favor of having built in metrics, spans, logs and whatever. @jdegoes wdyt?

kyri-petrou commented 3 months ago

I think a potential middle-ground would be to have a configurable property somewhere like defaultMiddleware or similar. Then the javaagent can use that to configure the app with a middleware that does all the OTEL stuff.

It's similar to how Kamon instruments the ZIO runtime, via the Runtime.defaultSupervisor FiberRef

kyri-petrou commented 3 months ago

@987Nabil I agree and think it would be great for zio-http to support these things out-of-the-box, but from what I can tell that's a separate piece of work. I think the functionality that @SimunKaracic is asking for here goes a bit beyond OTEL and metrics. I think it's a bit more of a generic "I want to be able to extract info from requests and run some logic without users changing anything in their code".

I have an idea that I think it can work pretty well, I'll give it a go and see what we think

/attempt #2835

Algora profile Completed bounties Tech Active attempts Options
@kyri-petrou 18 ZIO bounties
Scala, Python,
Shell & more
﹟2679
Cancel attempt
kyri-petrou commented 3 months ago

@SimunKaracic what do you think of the linked PR? Would that FiberRef be a good place for OTEL to hook to? Note that it would need to create a middleware to extract any info required from the Request and then close the span / context when intercepting the response. Take a look at the tests that I added for a middleware that can do that

SimunKaracic commented 3 months ago

It's similar to how Kamon instruments the ZIO runtime, via the Runtime.defaultSupervisor FiberRef

That's exactly what OTEL is doing

I think the functionality that @SimunKaracic is asking for here goes a bit beyond OTEL and metrics. I think it's a bit more of a generic "I want to be able to extract info from requests and run some logic without users changing anything in their code".

I think there's a misunderstanding here. I don't want anything of the sort.

If you add it to your ZIO-http app, the OTEL agent already starts/closes the spans correctly, because Netty is instrumented.

It also provides the following out of the box for every incoming request span:

  • http.client_ip
  • http.method
  • http.request_content_length
  • http.response_content_length
  • http.scheme
  • http.status_code
  • http.target
  • net.host.name
  • net.protocol.name
  • net.protocol.version
  • net.sock.host.addr
  • net.sock.host.port
  • net.sock.peer.addr
  • net.sock.peer.port
  • net.transport
  • thread.id
  • thread.name
  • user_agent.original

Only difference in Spring support and zio-http support in OTEL is the span name. If you get a GET request to route /user/1, the span name will be:

  • in Spring: GET /user/{id}
  • in ZIO-http: GET

That's it, the only thing I want. You add the OTEL agent, and the span reflects the matched routePattern, in addition to all the other goodies that OTEL provides. Everything else is there already. It starts/stops spans correctly, it creates child spans for Elasticsearch, Redis, Postgres, S3, etc...

I'm not sure what to think about the linked MR, I'm not a ZIO-http contributor, but if accomplishes this, I'm fine with it.

I'd then request that how to actually get what I described above gets documented, so users of ZIO-http know how to get useful trace names when using the OTEL agent.

SimunKaracic commented 3 months ago

I am much more in favor of having built in metrics, spans, logs and whatever.

If you provide built in spans/metrics in ZIO-http, that means your users have to do one of two things:

  • manually add Spans for any other client calls they need, and decide what metrics to attach to them
  • somehow integrate ZIO-http metrics/tracing with OTEL/Kamon java agent to get metrics/traces for everything else in their app that ZIO-http doesn't provide (S3, all SQL databases, HTTP clients, Cassandra, DynamoDB, etc...)

I understand the reasoning behind providing built-in metrics/tracing, but then you potentially miss out on the huge amount of work already done in OTEL to support the wider Java ecosystem.

Either way, I have explained my use case above, and regardless of how we get there, I'll be happy to get nice span names out of ZIO-http.

kyri-petrou commented 3 months ago

@SimunKaracic OK I think I misunderstood what you were asking for. I thought what you needed was something that zio-http would provide that could be instrumented by OTEL. Now I realise that the issue is basically that zio-http does something and Netty isn't being instrumented properly.

I'm really not sure what's not being instrumented correctly. I had a look at the OTEL instrumentation for Netty and couldn't find out where exactly they're extracting the path from the request to create the span. Do you by any chance know where that might be happening in the netty OTEL instrumentation?

EDIT: OK reading through the issue description again and the comments I think I'm more confused than ever. I think what the PR was doing was something along the right path, but not exactly what you need. It was basically allowing you to instrument the defaultMiddleware value and use a custom middleware to extract the matched route. A very similar middleware already exists (see here). However, that would still require zio-http to be instrumented by the otel library.

The part that really confuses me is, are you asking for a solution that requires no changes to any other repos/libraries, or are you asking for us to provide a better way that otel can instrument zio-http to extract the matched route? The comments seem to be conflicting on what the final solution looks like

SimunKaracic commented 3 months ago

My original idea was to provide a separate instrumentation module in the OTEL library, specifically for ZIO-http. No changes in ZIO-http.

All that module would have to do, is hook in at the specific place where an incoming request is matched against a user defined route. At the point, it would set the current span name to RoutePattern.render.The span has already been created by the netty instrumentation, and is only missing a useful span name. From the above issue in the OTEL repo: As a low level framework netty doesn't have a concept of a "route"

But we're in the design phase, and you folks know ZIO-http better than me, so I'd say You make the call on how you want this to end up looking.

are you asking for a solution that requires no changes to any other repos/libraries

I am asking for a solution that requires no extra setup from the users of ZIO-http. I would like the same experience for tracing that Spring/Quarkus/Play/Vertex users get from OTEL. Add the javaagent and you're done. No middleware, no config, nothing.

I'd say this needs @jdegoes or someone else to weigh in, to decide how they want the experience of using ZIO-http to look.

987Nabil commented 2 months ago

@SimunKaracic I would like both. So oob otel with no setup. And zio http having good metrics that ppl can utilize if they want to. For this issue, I guess you are right it should be fixed in otel java. Where to hook in is another question. Try to make a suggestion and I can maybe help to evaluate if it fits

algora-pbc[bot] commented 2 months ago

@kyri-petrou: Reminder that in 7 days the bounty will become up for grabs, so please submit a pull request before then 🙏

algora-pbc[bot] commented 2 months ago

The bounty is up for grabs! Everyone is welcome to /attempt #2835 🙌

987Nabil commented 1 month ago

Closing since I guess we clarified, that this can't be solved in this repo.