open-telemetry / opentelemetry-android

OpenTelemetry Tooling for Android
Apache License 2.0
148 stars 35 forks source link

Classloader fails to load OpenTelemetryRum if built in debug with desugaring on #295

Open bidetofevil opened 6 months ago

bidetofevil commented 6 months ago

Ran into an issue that manifests in an Android build environment, with a trigger that is in the semconv module, but the ultimate root cause is likely in the Android toolchain (my bet is D8/AGP). But I figure I'd start with an issue here first, then figure out how to best workaround this for this project and then fix the issue upstream

Basically, desugaring in debug (or more specifically, without R8 minification) causes issues in the modified classes whereby the non-desugared version of a static interface default method was being accessed, but isn't actually present in the dex file.

This issue manifests when a static AttributeKey in the module that is instantiated with a static interface default method is accessed (e.g. ResourceAttributes.SERVICE_NAME). For some reason, that reference is still trying to access the non-desugared version (whether you're running in an Android runtime version that needs desugaring or not), which doesn't exist in the dex.

If in our own module's code, we call the same method, even in the same static member instantiation context, the call works. So the issue is likely not with the modification of the desugared method, but rather in how the callsite was or wasn't modified. Or there's some dangling reference that applies in one case but not the other. Or some dumb race condition. Who knows.

I pushed a repro in a branch that uses an instrumentation test to cause the problem, but I can do this via a test app as well by turning on desugaring and building in debug.

This issue goes away when we bump the minSdkVersion to 24 as that means desguaring is not necessary for Java 8 APIs, but it will continue to be an issue if our min is 21.

We can discuss how we want to address this - or not. Release builds aren't affected, so this will mainly be a PITA for devs and no one else.

bidetofevil commented 6 months ago

I created a repro case here via an instrumentation test. I was able to do it outside the scope of this project by just referencing the semconv package as well in a basic app.

breedx-splk commented 6 months ago

So it's specifically this configuration:

marandaneto commented 6 months ago

https://developer.android.com/build/shrink-code

To make your app as small and fast as possible, you should optimize and minify your release build with isMinifyEnabled = true.

        // Enables code shrinking, obfuscation, and optimization for only
        // your project's release build type. Make sure to use a build
        // variant with `isDebuggable=false`.
        isMinifyEnabled = true

I think the issue is the combination of isMinifyEnabled=true and isDebuggable=false, its not a supported use case.

marandaneto commented 6 months ago

https://slackhq.github.io/keeper/ might solve your issue as well.

marandaneto commented 6 months ago

https://developer.android.com/build/multidex#keep just to force the missing classes in the first dex file, maybe it is the heuristics algo that finds out what has to be in the first dex file. You can also change the minSdkVersion only for your testing build type and flavor to avoid bumping the minSdkVersion for everyone.

breedx-splk commented 6 months ago

@bidetofevil Can you please provide the gradle commandline to run your repro case test? I'm not smart enough to figure this out on my own.

bidetofevil commented 6 months ago

For posterity, the issue seems to be missing dependencies on the POM file for SemConv. Not sure why it would be different for release, so I want to clarify that before entering an issue for the repo to update their dependency.

breedx-splk commented 6 months ago

For posterity, the issue seems to be missing dependencies on the POM file for SemConv. Not sure why it would be different for release, so I want to clarify that before entering an issue for the repo to update their dependency.

I suspect it's because the semconv has compileOnly("io.opentelemetry:opentelemetry-api"), and it expects users of the semconv to be bringing in their own core api....which I am surprised that you're not getting through the agent or the sdk somehow.

bidetofevil commented 6 months ago

I'm pretty sure I tried adding all the things and the failure still occurs. I'll try it again and update the repro.

I suspect that desugaring is relying on the POM rather than the app's classpath in order to grab the dependencies it needs to go over. I'll confirm this and update the issue.

bidetofevil commented 6 months ago

I was eventually pointed to this AGP issue, which points to a workaround that works for AGP 8.3+. To close the loop, I will update the read me for this project and do a similar PR for the Java semconv project as discussed on Slack.