raphw / byte-buddy

Runtime code generation for the Java virtual machine.
https://bytebuddy.net
Apache License 2.0
6.23k stars 804 forks source link

Can ByteBuddy provide a list of the JDK classes it makes use of? #1666

Closed simonis closed 6 days ago

simonis commented 3 months ago

Hi @raphw,

We ran into ClassCircularityErrors due to ec39754 which introduced the usage of ConcurrentHashMap/ConcurrentHashMap::putIfAbsent() in the AgentBuilder class. ConcurrentHashMap::putIfAbsent() can transitively trigger a call to ConcurrentHashMap::fullAddCount() which in turn triggers the resolution of ThreadLocalRandom.

If a transforming agent is registered and ClassFileTransformer::transform() gets called for ThreadLocalRandom before it was resolved in ConcurrentHashMap and then ClassFileTransformer::transform() transitively calls ConcurrentHashMap::fullAddCount() through ByteBuddy, the resolution of ThreadLocalRandom will fail with a ClassCircularityError. This can happen even if the agent doesn't really want to transform ThreadLocalRandom, but only uses ByteBuddy APIs to exclude ThreadLocalRandom or other classes from transformation.

Notice that due to §5.4.3 "Resolution" of the Java Virtual Machine (JVM) Specification, this resolution failure will become a permanent failure on every future attempt to resolve the class for that specific constant pool entry:

If an attempt by the Java Virtual Machine to resolve a symbolic reference fails because an error is thrown that is an instance of LinkageError (or a subclass), then subsequent attempts to resolve the reference always fail with the same error that was thrown as a result of the initial resolution attempt.

This made it particularly hard to detect and attribute this error to the new ByteBuddy version which now uses ConcurrentHashMap, because:

  1. ConcurrentHashMap::fullAddCount() doesn't get called deterministically by ByteBuddy. This only happens when the ConcurrentHashMap used by ByteBuddy is accessed concurrently due to parallel class loading which is inherently non-deterministic.
  2. The ClassCircularityError during transform() is treated by default as if transform() returns null (i.e. "no transformation performed") but due to §5.4.3 of the JVMS the same ClassCircularityError can get thrown much later in unrelated user code which happens to use a ConcurrentHashMap concurrently.

Other libraries like Data Dogs dd-trace-java or Glowroot ran into the same issue and solved it by preloading ThreadLocalRandom before registering a ClassFileTransformer.

I'm aware of the discussions in #193: Explore options for warming up a class file transformer to avoid circularities, #859: How to exclude already loaded classes from transformation? and ByteBuddy's WarmupStrategy but to my understanding, this still requires the prior knowledge of which classes have to be preloaded.

So finally, my actual question is: do you think it is possible and useful that ByteBuddy provides a list of all the JDK/ASM classes it uses, or maybe even a feature which preloads all this classes before registering a transforming agent, in order to prevent such errors in the future?

Thank you and best regards, Volker

raphw commented 3 months ago

The idea of the warump strategy is that it executes the code of the ClassFileTransformer in each expected way. Ideally in form of a transformed class, an ignored class and a class that fails the transformation.

This would normally result in all relevant classes that are in use on a particular VM to execute transformation. This would then include the VM classes used by Byte Buddy and the user.

The problem with listing the classes would be that transitive use would not be addressed? If I loaded ConcurrentHashMap, ThreadLocalRandom would still be initiated late?

I might however make an internal prerun of the circularity lock which is also built to avoid these problems.

raphw commented 2 months ago

I hopefully avoid this in the circularity lock now which hopefully loads all relevant classes during construction from now on.

alexreicher commented 2 months ago

Hi @raphw ,

The attempt to “warm-up” ClassCircularityLock.Default will not solve the ClassCircularityError risk for Byte Buddy users because ConcurrentHashMap::putIfAbsent only loads ThreadLocalRandom if it detects thread contention.

A quick experiment (on Java 17) shows that ThreadLocalRandom is not loaded when the hash table is called with a single putIfAbsent.

import java.util.concurrent.ConcurrentHashMap;

public class Main {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        map.putIfAbsent("foo", "bar");
    }
}

We see ConcurrentHashMap and its inner classes loaded, but not ThreadLocalRandom:

> java -Xlog:class+load=info -cp out Main | grep ConcurrentHashMap
[0.020s][info][class,load] java.util.concurrent.ConcurrentHashMap source: shared objects file
[0.021s][info][class,load] java.util.concurrent.ConcurrentHashMap$Node source: shared objects file
[0.025s][info][class,load] java.util.concurrent.ConcurrentHashMap$Segment source: shared objects file
[0.025s][info][class,load] java.util.concurrent.ConcurrentHashMap$CounterCell source: shared objects file
[0.025s][info][class,load] java.util.concurrent.ConcurrentHashMap$ReservationNode source: shared objects file
> java -Xlog:class+load=info -cp out Main | grep ThreadLocalRandom
> 

However, when we use multiple threads:

import java.util.concurrent.ConcurrentHashMap;

public class Main {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int k = 0; k < 100; k++) {
            new Thread(() -> {
                for (int i = 0; i < 10000; i++) {
                    map.putIfAbsent(i, "bar");
                }
            }).start();
        }
    }
}

Then we do see ThreadLocalRandom:

> java -Xlog:class+load=info -cp out Main | grep ThreadLocalRandom
[0.049s][info][class,load] java.util.concurrent.ThreadLocalRandom source: shared objects file

Note that merely looping 10000 times with just one thread didn't trigger the load of ThreadLocalRandom.

I believe that the implementation of class circularity prevention (e.g. CircularityLock) should be using a set of classes whose entire dependency closure can be reliably and cheaply preloaded (e.g. without the need to spin up multiple threads to trigger the load).

I explored the following approach:

  1. Pick a minimal set of JDK classes that could be used to implement a class circularity prevention mechanism.
  2. Preload all classes in the transitive dependency closure of those classes as part of initializing the mechanism implemented in step 1. Compute the closure recursively by scanning classes' constant pools for references to other classes.

The problem with this is that as of Java 17, I counted 4446 classes in the closure of java.lang.Object, including classes like com.sun.crypto.provider.SunJCE and java.util.regex.EmojiData. Preloading all of them might be problematic for applications with stringent initialization time requirements; it takes circa 500-1000 ms on my laptop, single-threaded. To reduce this cost, closure computation could be done at the method level, not just the class level.

For example, if class C has methods c1 and c2, CircularityLock uses C::c1 but not C::c2, and c2 depends on class D but not c1, then the naïve class-based dependency scanner would put D into the closure we need to preload, while a method-based scanner would not.

That being said, without a dependency scanner of some sort it's not possible for an instrumenting Java Agent to be truly safe from ClassCircularityError.

raphw commented 2 months ago

Indeed, I did not dig into the code, but obviously not all relevant methods will be resolved.

It is however difficult to anticipate the dependency graph statically as the JVM can always change an implementation even in minor updates, and Byte Buddy cannot analyze all VM versions during a build. It would be to unreliable and I am looking for a dynamic solution.

I attempted a new solution now that uses code that protects the circularity lock with an additional global lock around the locking mechanism method. This might add some overhead, but class loading in itself is already an expensive operation, so I would not believe that it is notable, and it is certainly preferred over crashing the VM.