raphw / byte-buddy

Runtime code generation for the Java virtual machine.
https://bytebuddy.net
Apache License 2.0
6.28k stars 807 forks source link

Child-first class loading cause deadlock on startup #632

Closed honganan closed 5 years ago

honganan commented 5 years ago

Hi, Sorry I am here to trouble you again. We caught thread deadlock when application start. The situation is very much like with your answer of this issue: https://github.com/stagemonitor/stagemonitor/issues/179

We have a child-first class loader to load any class including the application's main class. This loading process starts from application's maven package phrase. And when application init it starts multi thread to do other things. Child-first class loader, and async initialization matches with your description in above issue. Here is my deadlock detection log from jstack. What strange is the 'main' thread is waiting a int array lock, that confused us very.

Found one Java-level deadlock:
=============================
"pool-3-thread-1":
  waiting to lock monitor 0x00007f9ce0006168 (object 0x000000076ccd0000, a com.xxx.aladdin.container.loader.AladdinClassLoader),
  which is held by "main"
"main":
  waiting to lock monitor 0x00007f9ce0003f08 (object 0x000000076cd06008, a [I),
  which is held by "Apollo-RemoteConfigRepository-1"
"Apollo-RemoteConfigRepository-1":
  waiting to lock monitor 0x00007f9ce0006168 (object 0x000000076ccd0000, a com.xxx.aladdin.container.loader.AladdinClassLoader),
  which is held by "main"

Java stack information for the threads listed above:
===================================================
"pool-3-thread-1":
    at com.xxx.aladdin.extend.report.SkyNetReporter.run(SkyNetReporter.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
"main":
    at net.bytebuddy.pool.TypePool$Default.parse(TypePool.java:1034)
    at net.bytebuddy.pool.TypePool$Default.doDescribe(TypePool.java:1019)
    at net.bytebuddy.pool.TypePool$Default$WithLazyResolution.access$401(TypePool.java:1099)
    at net.bytebuddy.pool.TypePool$Default$WithLazyResolution.doResolve(TypePool.java:1174)
    at net.bytebuddy.pool.TypePool$Default$WithLazyResolution$LazyTypeDescription.delegate(TypePool.java:1257)
    at net.bytebuddy.description.type.TypeDescription$AbstractBase$OfSimpleType$WithDelegation.getModifiers(TypeDescription.java:7075)
    at net.bytebuddy.matcher.ModifierMatcher.matches(ModifierMatcher.java:31)
    at net.bytebuddy.matcher.ModifierMatcher.matches(ModifierMatcher.java:12)
    at net.bytebuddy.matcher.ElementMatcher$Junction$Disjunction.matches(ElementMatcher.java:137)
    at net.bytebuddy.agent.builder.AgentBuilder$RawMatcher$ForElementMatchers.matches(AgentBuilder.java:1210)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.resolve(AgentBuilder.java:9337)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.doTransform(AgentBuilder.java:9306)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:9269)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.access$1300(AgentBuilder.java:9047)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:9625)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:9575)
    at java.security.AccessController.doPrivileged(Native Method)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:9194)
    at sun.instrument.TransformerManager.transform(TransformerManager.java:188)
    at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:428)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at com.xxx.aladdin.container.loader.AbstractCommonClassLoader.defineClass(AbstractCommonClassLoader.java:228)
    at com.xxx.aladdin.container.loader.AbstractCommonClassLoader$1.run(AbstractCommonClassLoader.java:257)
    at com.xxx.aladdin.container.loader.AbstractCommonClassLoader$1.run(AbstractCommonClassLoader.java:251)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.xxx.aladdin.container.loader.AbstractCommonClassLoader.findCustomClass(AbstractCommonClassLoader.java:250)
    at com.xxx.aladdin.container.loader.AbstractCommonClassLoader.loadCustomClass(AbstractCommonClassLoader.java:291)
    at com.xxx.aladdin.container.loader.AladdinClassLoader.loadClass(AladdinClassLoader.java:138)
    - locked <0x000000076ccd0000> (a com.xxx.aladdin.container.loader.AladdinClassLoader)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.xxx.aladdin.boot.loader.MainMethodRunner.run(MainMethodRunner.java:46)
    at com.xxx.aladdin.boot.loader.Launcher.launch(Launcher.java:83)
    at com.xxx.aladdin.boot.loader.Launcher.launch(Launcher.java:50)
    at com.xxx.aladdin.boot.loader.JarLauncher.main(JarLauncher.java:52)
"Apollo-RemoteConfigRepository-1":
    at net.bytebuddy.pool.TypePool$Default.parse(TypePool.java:1034)
    at net.bytebuddy.pool.TypePool$Default.doDescribe(TypePool.java:1019)
    at net.bytebuddy.pool.TypePool$Default$WithLazyResolution.access$401(TypePool.java:1099)
    at net.bytebuddy.pool.TypePool$Default$WithLazyResolution.doResolve(TypePool.java:1174)
    at net.bytebuddy.pool.TypePool$Default$WithLazyResolution$LazyTypeDescription.delegate(TypePool.java:1257)
    at net.bytebuddy.description.type.TypeDescription$AbstractBase$OfSimpleType$WithDelegation.getModifiers(TypeDescription.java:7075)
    at net.bytebuddy.matcher.ModifierMatcher.matches(ModifierMatcher.java:31)
    at net.bytebuddy.matcher.ModifierMatcher.matches(ModifierMatcher.java:12)
    at net.bytebuddy.matcher.ElementMatcher$Junction$Disjunction.matches(ElementMatcher.java:137)
    at net.bytebuddy.agent.builder.AgentBuilder$RawMatcher$ForElementMatchers.matches(AgentBuilder.java:1210)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.resolve(AgentBuilder.java:9337)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.doTransform(AgentBuilder.java:9306)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:9269)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.access$1300(AgentBuilder.java:9047)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:9625)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer$LegacyVmDispatcher.run(AgentBuilder.java:9575)
    at java.security.AccessController.doPrivileged(Native Method)
    at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.transform(AgentBuilder.java:9194)
    at sun.instrument.TransformerManager.transform(TransformerManager.java:188)
    at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:428)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1852)
    - locked <0x000000076cd0c428> (a sun.net.www.protocol.http.HttpURLConnection)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    - locked <0x000000076cd0c428> (a sun.net.www.protocol.http.HttpURLConnection)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at com.ctrip.framework.apollo.util.http.HttpUtil.doGetWithSerializeFunction(HttpUtil.java:96)
    at com.ctrip.framework.apollo.util.http.HttpUtil.doGet(HttpUtil.java:49)
    at com.ctrip.framework.apollo.internals.RemoteConfigRepository.loadApolloConfig(RemoteConfigRepository.java:216)
    at com.ctrip.framework.apollo.internals.RemoteConfigRepository.sync(RemoteConfigRepository.java:131)
    - locked <0x000000076cd75378> (a com.ctrip.framework.apollo.internals.RemoteConfigRepository)
    at com.ctrip.framework.apollo.internals.AbstractConfigRepository.trySync(AbstractConfigRepository.java:22)
    at com.ctrip.framework.apollo.internals.RemoteConfigRepository$2.run(RemoteConfigRepository.java:315)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
raphw commented 5 years ago

The problem is as described in the linked ticket. Thread A awaits loading a class from a loader that is currently loaded from thread B and vice versa. If you load Byte Buddy, you add to the classes being loaded such that this lock occurs.

To solve this, you need to exclude more classes from your agent or preload classes to not block the loading process.

honganan commented 5 years ago

I get it. Now I added a class loader ignore matcher when I new the AgentBuilder.Default like blow, In our situation, every class is loaded by our child first class loader. And if we limit only classes that loading by this class loader to be enhanced then this problem can be avoid right? This is how we install and set the agent:

ByteBuddyAgent.install();

PluginFinder pluginFinder = new PluginFinder(getPlugins());
new AgentBuilder.Default()
    .ignore(
        new MyDisjunction(
            new AgentBuilder.RawMatcher.ForElementMatchers(any(), new NullMatcher<>().or(not(new EqualityMatcher<>(loader)))),
            new AgentBuilder.RawMatcher.ForElementMatchers(nameStartsWith("net.bytebuddy.")
                .or(nameStartsWith("org.slf4j."))
                .or(nameStartsWith("ch.qos.logback."))
                .or(nameStartsWith("org.apache.logging."))
                .or(nameStartsWith("com.xxx.aladdin."))
                .or(nameStartsWith("org.groovy."))
                .or(nameContains("javassist"))
                .or(nameContains(".asm."))
                .or(allRontgenSnifferExcludeToolkit())
                .or(nameStartsWith("sun.reflect."))
                .or(ElementMatchers.<TypeDescription>isSynthetic())))
    )
    .type(pluginFinder.buildMatch())
    .transform(new Transformer(pluginFinder))
    .with(new Listener())
    .installOnByteBuddyAgent();
raphw commented 5 years ago

Yes, a lock happens if you load classes from a loader that needs to lock another loader to load a dependant class which is currently loaded by another thread that requires the first loader to load a dependant.

honganan commented 5 years ago

Thanks! Thank you for professional analysis! We are first time to use byte-buddy to manipulate byte code in our production environment. It's really a nice framework. I will study more about it.

raphw commented 5 years ago

Glad I could help, please reopen if there is more to do.

honganan commented 5 years ago

We still found DeadLock in production env after doing the above filter. In extremely case, A parent class loader load some class triggers AgentBuilder$Default$ExecutingTransformer#transform execute, which should load classes like JavaModule and 'java.lang.Module'. These happens before the ignoreMatcher works in AgentBuilder.Default.ExecutingTransformer#resolve. These will still probably cause dead lock in Child-first class loader situation.

Finally our solution is filter before enter the AgentBuilder$Default$ExecutingTransformer#transform to avoid loading any byte-buddy class by creating a proxy like transformer:

Instrumentation instrumentation = ByteBuddyAgent.install();

PluginFinder pluginFinder = new PluginFinder(getPlugins());
ResettableClassFileTransformer classFileTransformer = new AgentBuilder.Default()
    .ignore(
        new MyDisjunction(
            new AgentBuilder.RawMatcher.ForElementMatchers(any(), new NullMatcher<>().or(not(new EqualityMatcher<>(loader)))),
            new AgentBuilder.RawMatcher.ForElementMatchers(nameStartsWith("net.bytebuddy.")
                .or(nameStartsWith("org.slf4j."))
                .or(nameStartsWith("ch.qos.logback."))
                .or(nameStartsWith("org.apache.logging."))
                .or(nameStartsWith("org.groovy."))
                .or(nameContains("javassist"))
                .or(nameContains(".asm."))
                .or(nameStartsWith("sun.reflect."))
                .or(ElementMatchers.<TypeDescription>isSynthetic())))
    )
    .type(pluginFinder.buildMatch())
    .transform(new Transformer(pluginFinder))
    .with(new Listener())
    .installOnByteBuddyAgent();

boolean removeResult = instrumentation.removeTransformer(classFileTransformer);
if (!removeResult) {
    logger.error("apm classfiletransformer proxy failed");
    return;
}
instrumentation.addTransformer(new MyClassFileTransformer(classFileTransformer, loader), false);

and in MyClassFileTransformer we filter the class loader before call the byte-buddy's transformer:

@Override
public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
    ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException {

    try {
        if (targetClassLoader != loader) {
            return classfileBuffer;
        }
        String normalizedClassName = className.replaceAll("/", ".");
        if (normalizedClassName.startsWith("...")) {
            return classfileBuffer;
        }

        return classFileTransformer.transform(loader, className, classBeingRedefined, protectionDomain, classfileBuffer);
    } catch (Throwable t) {
        logger.warn("transform class {} exception: {}", className, t);
        return classfileBuffer;
    }
}

Then the deadlock not happened any more.

raphw commented 5 years ago

Glad you solved it. You can still use Byte Buddy's API by registering a TransformerDecorator that returns a delegating transformer.

honganan commented 5 years ago

Wow! This new fresh feature is exactly what we want. Nice! We are using version 1.7.9-RELEASE and We will upgrade to latest version to use this more graceful feature.

qsLI commented 9 months ago

Any idea about the int array lock?

To the best of my knowledge, the class of int[] type in JVM is built-in, and there should be no corresponding JVMTI event notification. However, from the call stack, it seems that bytebuddy has received the event of int array class loading.

Moreover, the location of this lock cannot be found from the output of jstack, it should be a C++ level lock. If it is a classloader lock, the output should not be [I, so it can only be an initialization lock.

We have also encountered cases of waiting for the int array lock in production, which may be the init lock of the int array.

image

image

honganan commented 9 months ago

@qsLI The problem decribed in this issue occurs only if you break the parent delegation when loading class.

qsLI commented 9 months ago

@qsLI The problem decribed in this issue occurs only if you break the parent delegation when loading class.

The [I lock is init lock

void java_lang_Class::initialize_mirror_fields(Klass* k,
                                               Handle mirror,
                                               Handle protection_domain,
                                               Handle classData,
                                               TRAPS) {
  // Allocate a simple java object for a lock.
  // This needs to be a java object because during class initialization
  // it can be held across a java call.
  typeArrayOop r = oopFactory::new_typeArray(T_INT, 0, CHECK);
  set_init_lock(mirror(), r);

  // Set protection domain also
  set_protection_domain(mirror(), protection_domain());

  // Initialize static fields
  InstanceKlass::cast(k)->do_local_static_fields(&initialize_static_field, mirror, CHECK);

 // Set classData
  set_class_data(mirror(), classData());
}