open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0
2k stars 831 forks source link

ClassNotFoundException io.grpc.ManagedChannel using the OtlpGrpcMetricExporter in clojure. #4839

Open gabrielgiussi opened 2 years ago

gabrielgiussi commented 2 years ago

Describe the bug A ClassNotFoundException is thrown trying to build the OtlpGrpcMetricExporter in clojure.

#error{:cause "io.grpc.ManagedChannel",
       :via [{:type clojure.lang.Compiler$CompilerException,
              :message "Syntax error compiling . at (/poc/src/nu/stem/glue/metrics.clj:66:68).",
              :data #:clojure.error{:phase :compile-syntax-check,
                                    :line 66,
                                    :column 68,
                                    :source "/poc/src/nu/stem/glue/metrics.clj",
                                    :symbol .},
              :at [clojure.lang.Compiler analyzeSeq "Compiler.java" 7132]}
             {:type java.lang.NoClassDefFoundError,
              :message "io/grpc/ManagedChannel",
              :at [java.lang.Class getDeclaredMethods0 "Class.java" -2]}
             {:type java.lang.ClassNotFoundException,
              :message "io.grpc.ManagedChannel",
              :at [jdk.internal.loader.BuiltinClassLoader loadClass "BuiltinClassLoader.java" 581]}],
       :trace [[jdk.internal.loader.BuiltinClassLoader loadClass "BuiltinClassLoader.java" 581]
               [jdk.internal.loader.ClassLoaders$AppClassLoader loadClass "ClassLoaders.java" 178]
               [java.lang.ClassLoader loadClass "ClassLoader.java" 522]
               [java.lang.Class getDeclaredMethods0 "Class.java" -2]
               [java.lang.Class privateGetDeclaredMethods "Class.java" 3166]
               [java.lang.Class privateGetPublicMethods "Class.java" 3191]
               [java.lang.Class getMethods "Class.java" 1904]
               [clojure.lang.Reflector getMethods "Reflector.java" 498]
               [clojure.lang.Compiler$HostExpr$Parser parse "Compiler.java" 996]
               [clojure.lang.Compiler analyzeSeq "Compiler.java" 7124]
               [clojure.lang.Compiler analyze "Compiler.java" 6806]
               [clojure.lang.Compiler analyze "Compiler.java" 6762]
               [clojure.lang.Compiler$BodyExpr$Parser parse "Compiler.java" 6137]
               [clojure.lang.Compiler$FnMethod parse "Compiler.java" 5479]
               [clojure.lang.Compiler$FnExpr parse "Compiler.java" 4041]
               [clojure.lang.Compiler analyzeSeq "Compiler.java" 7122]
               [clojure.lang.Compiler analyze "Compiler.java" 6806]
               [clojure.lang.Compiler eval "Compiler.java" 7191]
               [clojure.lang.Compiler eval "Compiler.java" 7149]
               [clojure.core$eval invokeStatic "core.clj" 3215]
               [clojure.core$eval invoke "core.clj" 3211]

I don't have the grcp lib in the classpath but this is working in a java project. I saw this message in the changelog for the version 1.17.0

The OTLP gRPC exporters will now function without the grpc-java dependency, if okhttp is present on the classpath.

So I understand it is ok to not have the grpc dependency and I do have okhttp:4.10.0 in the classpath as a transitive dependency.

Steps to reproduce Loading a namespace into the repl with this snippet already triggers the exception

(.build (OtlpGrpcMetricExporter/builder))

What version and what artifacts are you using? Artifacts: opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp Version: 1.19.0

:deps {io.opentelemetry/opentelemetry-api {:mvn/version "1.19.0"}
       io.opentelemetry/opentelemetry-sdk {:mvn/version "1.19.0"}
       io.opentelemetry/opentelemetry-exporter-otlp {:mvn/version "1.19.0"}}

Environment Compiler: AdoptOpenJDK 11.0.11

Additional context If I include the grpc-stub dependency then it is working fine but I don't know which version should I use or if this is the expected way to solve it.

jack-berg commented 2 years ago

We have some runtime logic that conditionally used io.grpc.ManagedChannel if the user calls OtlpGrpcMetricExporterBuilder#setChannel(ManagedChannel). In java this works fine without including grpc-stub on the classpath as demonstrated in the default test suite of :exporters:otlp:all. This behavior relies on pretty careful references to io.grpc.ManagedChannel such that the class is only loaded when the user references it. That's the theory anyway.

I was able to recreate this error locally. Groovy class loading appears to work differently - the groovy runtime is more aggressive about loading classes than java.

Open to ideas on how we can fix this. I can think of some ugly ideas (like providing your own empty implementation of io.grpc.ManagedChannel) but don't know groovy internals well and don't know the "right" way to fix this.

jkwatson commented 2 years ago

@jack-berg good to know that the groovy classloader breaks things...but this question is about clojure. Perhaps it also does some classloading shenanigans?

jack-berg commented 2 years ago

🤦 ah..

gabrielgiussi commented 2 years ago

Perhaps it also does some classloading shenanigans

Yes, when it parses the expression (.build (OtlpGrpcMetricExporter/builder)) it asks for the methods of the class to find if there is any called build with no arguments so it can call that method. This part of the stacktrace is doing that

[java.lang.Class getMethods "Class.java" 1904]
[clojure.lang.Reflector getMethods "Reflector.java" 498]
[clojure.lang.Compiler$HostExpr$Parser parse "Compiler.java" 979]

Once we call getMethods then it is already java trying to load the class io.grpc.ManagedChannel since one of the method setChannel receives that.

I see the method is deprecated, do you have plans to remove it entirely in the near future? I think we could do some tricks like including our own implementation of the class meanwhile but IMHO we shouldn't be doing any special class loading to avoid this error.

jack-berg commented 2 years ago

It's worth noting, we rely on the specifics of java class loading in several places in the autoconfiguration module as well, so this won't be the only place where closure is impacted.

I'm not sure we can do anything about this. Maybe if we knew more about closure's mechanics for loadings classes we could do something like change OtlpGrpcMetricExporter.builder(ManagedChannel) to accept an Object instead of ManagedChannel. But the internals will still eventually need to cast to ManagedChannel, which might still trigger closure to try to load the class. But even if that worked for closure, it would have to wait until a 2.x.x version since it's not an allowable change under our backwards compatibility guarantees.

Is there anything we could do from a classloader perspective to provide a fallback implementation of io.grpc.ManagedChannel is none is available?

gabrielgiussi commented 2 years ago

we could do something like change OtlpGrpcMetricExporter.builder(ManagedChannel) to accept an Object instead of ManagedChannel. But the internals will still eventually need to cast to ManagedChannel, which might still trigger closure to try to load the class

I think that will work, because I'm never explicitly calling the setChannel so clojure won't try to load the class, and correct me if I'm wrong but I understand that method is not called internally or indirectly, otherwise having that dependency in the classpath shouldn't be optional. In other words can we be sure that if I'm not calling the setChannel method explicitly then that method won't be called at all? If that's the case then making setChannel accept an Object instead of a ManagedChannel will prevent this exception.

jack-berg commented 2 years ago

I'm never explicitly calling the setChannel so clojure won't try to load the class

Is there a reference available that describes the mechanics around when / how clojure decides to load a class? We already don't call setChannel internally ever. The code path you're using only has unused references to ManagedChannel (i.e. imports in the class used in methods that are not called). Not sure that switching to an Object would solve the problem.

Even if it did work, the next sentence from the quote is important:

even if that worked for closure, it would have to wait until a 2.x.x version since it's not an allowable change under our backwards compatibility guarantees.

rafaeldff commented 2 years ago

I think Clojure is just calling .getMethods() on the class object for the OtlpGrpcMetricExporter class, and Java reflection is loading the classes for the types referenced in method signatures. If this is correct, any other tool that uses reflection to introspect this class would trigger the problem. If no member of the class references the lazy-loadeable class (which I think would be the goal of changing the setChannel signature to take an Object), it should be fine, even some of the implementation depends on that class. At least, reflection users wouldn't be affected, I think.

gabrielgiussi commented 2 years ago

@jack-berg

even if that worked for closure, it would have to wait until a 2.x.x version since it's not an allowable change under our backwards compatibility guarantees

In that case won't be better just get rid of the method since is marked as deprecated?

jack-berg commented 2 years ago

I think Clojure is just calling .getMethods() on the class object for the OtlpGrpcMetricExporter class, and Java reflection is loading the classes for the types referenced in method signatures. If this is correct, any other tool that uses reflection to introspect this class would trigger the problem.

If this is indeed the case, I don't think we can solve this issue with changes in opentelemetry-java. We rely on a combination of compileOnly dependencies and java's class loading semantics to drive behavior which is conditional on the classpath. Supporting reflective access opens the door to interacting with the library in all sorts of unpredictable ways that we intentionally don't expose in the public APIs.

In that case won't be better just get rid of the method since is marked as deprecated?

Well there's an outstanding issue requesting that we un-deprecate the setChannel method. If the arguments in that issue are resolved by the time a 2.x.x we could remove it.

gabrielgiussi commented 2 years ago

We rely on a combination of compileOnly dependencies and java's class loading semantics to drive behavior which is conditional on the classpath.

Since we are using opentelemetry-java in an experimental project we decided to add the grpc-api (*) dependency so the ManagedChannel class is in the classpath and the exception is not raised. But from this comment I'm not sure if doing so is doing more than just preventing the exception but also changing some behavior in a way we as users can't anticipate. In other words

  1. Is the otel library doing something different just because the ManagedChannel is in the classpath?
  2. Is this strategy applied with other dependencies as well? I'm asking this to understand if including the dependency in our side as a solution will scale or we may have to add others as well (so far adding only grpc-api was enough).

(*) BTW where can I find which is the right version of this lib for a specific version of opentelemetry-sdk, e.g. 1.19.0

jack-berg commented 2 years ago

Is the otel library doing something different just because the ManagedChannel is in the classpath?

No. It was in the past, but as of now the upstream grpc implementation is only used if setChannel is called.

Is this strategy applied with other dependencies as well? I'm asking this to understand if including the dependency in our side as a solution will scale or we may have to add others as well (so far adding only grpc-api was enough).

Here is the set of all the examples where logic is dependent on the classpath:

Not that besides the setChanel(ManagedChannel) these are all internal implementation details not exposed in public APIs.

(*) BTW where can I find which is the right version of this lib for a specific version of opentelemetry-sdk, e.g. 1.19.0

Version doesn't matter since the code isn't actually being used, but we track all our dependency versions here. Current version of grpc is 1.50.2.

gabrielgiussi commented 2 years ago

Thanks for the detailed response 👍

dovdembin commented 1 year ago

I'm also getting "java.lang.ClassNotFoundException: io.grpc.ManagedChannel", I am trying to create a "jenkins" shared library but even that I am using "OtlpGrpcMetricExporter.builder().setEndpoint" still there is an exception that prevent me to run can we have a fix or a way to run?