opentracing / opentracing-java

OpenTracing API for Java. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io
Apache License 2.0
1.68k stars 344 forks source link

ThreadLocalScopManager and parallelStream #353

Open williamokano opened 5 years ago

williamokano commented 5 years ago

Hey guys, I was just testing some ways to implement tracing using a POC application I created.

One of the services (product service) is called by the order service, but the products services don't expose an endpoint that return all the products at once, so I have to call it several times.

I know it's not good, not the best approach, but it's just a simulation that can occur in the real world.

The API is not well formed, so the items is represented by a Map<String, Long>, which is the item UUID and the item quantity.

My code run something like this (Kotlin):

val items = order.items.keys.parallelStream()
  .map(catalogAPI::getProduct)
  .collect(Collectors.toList())

My problem: as the parallelStream runs on another Thread(Pool), and the default implementation of the ScopeManager is ThreadLocalScopeManager, this doesn't work, as it loses the reference to the original span.

Question is, why it's not implemented using InheritableThreadLocal?

Second question is: there's some easy approach to solve this problem?

Thanks in advance and sorry about the bad english.

yurishkuro commented 5 years ago

You can use custom thread pool (https://www.baeldung.com/java-8-parallel-streams-custom-threadpool) that will transfer context to worker threads.

williamokano commented 5 years ago

Hi, I agree that I can use a custom thread pool, but the code can become much more verbose and relying on that can be error-prone, since one can simply forget to change the thread pool.

I created a "new" ScopeManager and Scope and it seems it solved my problem, although since I don't know the whole architecture of the opentracing I don't know to what kind of errors it could lead. With the following, I can use the parallelStream() without requiring to manually change the thread pool every time. Tried fixing the problem using InheritableThreadLocal.

yurishkuro commented 5 years ago

I don't see how inheritable TL would help when using a thread pool, since the worker threads creation is not correlated with the caller thread.

I agree custom executor leads to uglier code, but that's a defect in ForkJoinPool design, since it does not allow reconfiguring its default "common" pool. To my knowledge, overriding it via byte code manipulation is the only way.

williamokano commented 5 years ago

It actually didn't work.

@yurishkuro well, on my testing it worked just fine, at the least on the tests I did, which of course may have a lot of errors.

Do you have any examples of doing it through byte code manipulation? Not gonna lie that I was reading about ByteBuddy to create an agent and do some "magic stuff", but running the jar without the javaagent com lead to bizarre errors, so I stepped back a little bit.

yurishkuro commented 5 years ago

I haven't seen an example anywhere, opened a ticket https://github.com/opentracing-contrib/java-specialagent/issues/102

williamokano commented 5 years ago

Nice, thanks. Gonna take a look at this specialagent, I didn't know about him. Thanks a lot.

@ edit: about the InheritableThreadLocal, I couldn't force a way to break my tests.

Not gonna lie, the first scenarios I was creating the executorService inside the controller method, so the context was kept, but now I wrote a scenario where the executor service is created by spring inside a configuration, so it shouldn't have any correlation with the thread create by tomcat to handle the request, and yet, I can keep the context with InheritableThreadLocal (which I can't with ThreadLocal).

@yurishkuro if you could check my sample, here's the link: https://github.com/williamokano/spring-distributed-tracing-demo

As you can see, I defined the ExecutorService, with a new ThreadPool, as a bean, so it has nothing to do with the Thread that is handling the request (should be completely uncorrelated as you said).

To test I used just a POC (not with opentracing actually).

I set a header on an InheritableThreadLocal<Context> inside a servletFilter so I can get it back on the controller.

When I print the value on the ExecutorService (created on the application startup), if I'm using InheritableThreadLocal it prints the right header, so it kept tracking somehow. If I use a regular ThreadLocal it doesn't work.

Also, using my implementation of ScopeManager using InheritableThreadLocal, although the parallelStream is running on the ForkJoinPool.commonPool(), it does work in fact.

I'm really confused right now. Perhaps it could be some bizarre OpenJDK implementation?

@ edit: you're right, executor service under stress don't work as I told above.

williamokano commented 5 years ago

Update: you're correct @yurishkuro , without stress to the VM the behavior indeed work, but when under load it behaves as it should be, it loses the reference to the point that the behavior is non-deterministic, leading to undesired/undefined behaviors.