timja / jenkins-gh-issues-poc-06-18

0 stars 0 forks source link

[JENKINS-22525] "incompatible InnerClasses attribute" error in IBM J9 VM #5047

Closed timja closed 10 years ago

timja commented 10 years ago

When starting Jenkins, it always happen that a Slave disappears. The node is simply vanished.
I have tried looking for specific errors in the log without success, even looking at all log levels. There is an error while unmarshalling an xml, but unsure if it is related. See the stack trace below in any case.

I have to note that only one disappears, not all.

The curious thing comes that if I click "Reload configuration from disk", the disappeared slave is loaded correctly this time (the configuration was in the xml there all along, and I'm not overwriting it).

Information about my instance:
Version: 1.547 (happened with 1.532 as well)
java.vm.info JRE 1.6.0 IBM J9 2.4 Windows Server 2008 R2 amd64-64 jvmwa6460sr15-20131016_170922 (JIT enabled, AOT enabled)>J9VM - 20131016_170922>JIT - r9_20130920_46510ifx2>GC - GA24_Java6_SR15_20131016_1337_B170922 (happened with an older version as well, tried updating. I'm aware that IBM Java is not officially supported).
Using tomcat 6.0.32, on a windows 2008 R2 Standard.

Possible related bug: https://issues.jenkins-ci.org/browse/JENKINS-21867
(Also on Windows 2008)

Error that might be related:
FINE hudson.util.RobustReflectionConverter addErrorInContext
Failed to load
java.lang.IncompatibleClassChangeError: incompatible InnerClasses attribute between "hudson.model.Queue$ItemList" and "hudson.model.Queue"
at java.lang.Class.getDeclaringClass(Class.java:781)
at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.(ParameterizedTypeImpl.java:46)
at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.make(ParameterizedTypeImpl.java:89)
at sun.reflect.generics.factory.CoreReflectionFactory.makeParameterizedType(CoreReflectionFactory.java:98)
at sun.reflect.generics.visitor.Reifier.visitClassTypeSignature(Reifier.java:134)
at sun.reflect.generics.tree.ClassTypeSignature.accept(ClassTypeSignature.java:43)
at sun.reflect.generics.repository.FieldRepository.getGenericType(FieldRepository.java:79)
at java.lang.reflect.Field.getGenericType(Field.java:236)
at com.thoughtworks.xstream.mapper.AnnotationMapper.processTypes(AnnotationMapper.java:207)
at com.thoughtworks.xstream.mapper.AnnotationMapper.processAnnotations(AnnotationMapper.java:165)
at com.thoughtworks.xstream.mapper.AnnotationMapper.defaultImplementationOf(AnnotationMapper.java:122)
at hudson.util.xstream.MapperDelegate.defaultImplementationOf(MapperDelegate.java:59)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:54)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.readItem(AbstractCollectionConverter.java:71)
at hudson.util.RobustCollectionConverter.populateCollection(RobustCollectionConverter.java:83)
at com.thoughtworks.xstream.converters.collections.CollectionConverter.unmarshal(CollectionConverter.java:61)
at hudson.util.RobustCollectionConverter.unmarshal(RobustCollectionConverter.java:74)
at hudson.slaves.NodeList$ConverterImpl.unmarshal(NodeList.java:162)
at hudson.util.XStream2$AssociatedConverterImpl.unmarshal(XStream2.java:338)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:328)
at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:274)
at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:221)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1061)
at hudson.util.XStream2.unmarshal(XStream2.java:109)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1045)
at hudson.XmlFile.unmarshal(XmlFile.java:163)
at jenkins.model.Jenkins$17.run(Jenkins.java:2545)
at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:169)
at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
at jenkins.model.Jenkins$7.runTask(Jenkins.java:899)
at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:906)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:929)
at java.lang.Thread.run(Thread.java:761)


Originally reported by gabriel, imported from: "incompatible InnerClasses attribute" error in IBM J9 VM
  • assignee: kohsuke
  • status: Resolved
  • priority: Critical
  • resolution: Fixed
  • resolved: 2014-07-31T16:48:47+00:00
  • imported: 2022/01/10
timja commented 10 years ago

oleg_nenashev:

Most probably, the referenced log is unrelated.
Could you provide the slave's configuration?

timja commented 10 years ago

bluejaguar:

Was this an in-place update/upgrade that done recently from a previous version of Jenkins (or even hudson)? At least on RHEL with tomcat we have had to remove the old directory under webapps when dropping a new jenkins.war in place. Otherwise we can get these same symptoms (disappearing nodes) and see this same error with a slightly different stack trace immediately after a fresh in-place upgrade.

Also note there is a ~/.jenkins/cache directory where jar file versions may be. Don't suggest you play in there unless step above has been tried and then, if still failing, after you have stopped jenkins to rename/delete cache contents.

With further testing we can get this same error to occur with different versions of Jenkins. For an existing installation we are seeing this same issue when we update to Jenkins 1.562 ("incompatible InnerClasses attribute between "hudson.model.Queue$ItemList" and "hudson.model.Queue"). Whereas dropping in-place a lower Jenkins version such as 1.528 does not show this issue.

timja commented 10 years ago

highth:

I also meet same problems. install both 1.558 version and 1.554.1(LTS). create a slave node then restart web server the slave node lost.
Some time will encounter exception as this ticket show when create the slave node.

I finally install the 1.528 seems works well for me.

I also think that a defect in newer vesion

My isntall evn: RHEL+tomcat8

timja commented 10 years ago

danielbeck:

Is there an entry in Manage Jenkins » Manage Old Data for the logged exception?

timja commented 10 years ago

tang:

We first encounter this situation from about Oct/13

The jvm info of my installation:

JRE 1.6.0 IBM J9 2.4 Windows Server 2008 R2 amd64-64 jvmwa6460sr10fp1-20120202_101568 (JIT enabled, AOT enabled)>J9VM - 20120202_101568>JIT - r9_20111107_21307ifx1>GC - 20120202_AA

timja commented 10 years ago

tang:

I changed the JRE from IBM's j9 to Oracle's one (jdk1.7.0_55 x64), restart, restart, restart, none of my slaves have been taken!

timja commented 10 years ago

gabriel:

Hi. Sorry for the delay.
Oleg- The slave is nothing out of the ordinary, just a simple, Java Web Start slave. It is not happening with the "same" slave always, just one that gets disappeared. If I have 4, and jenkins is restarted, 3 appears. If I restart again, I'm left with 2. My feeling is that it depends on the order in the xml configuration.

John - No, it was a new installation. I recently updated to 1.554.1 LTS, and removed the webapps folder. The issue is still there. I'll try removing the cache directory, though. It is a Windows machine, so I'll see where that folder is.

Daniel- Yes, there is. It says: hudson.model.Hudson IncompatibleClassChangeError: incompatible InnerClasses attribute between "hudson.model.Queue$ItemList" and "hudson.model.Queue.". But I'm not too convinced that it is related, as Oleg said.

While troubleshooting the issue, I found out that the missing slave configuration was in the jenkins' config XML, and discovered that hitting "Reload configuration from disk" makes the slave to appear again. As a workaround, I created a Groovy post-init script (https://wiki.jenkins-ci.org/display/JENKINS/Post-initialization+script), that calls the Reloading function.. So after restarting and just after finishing the initialization process, my jenkins reloads the configuration again, making the slave to appear again. It is a dirty workaround, but has been working for several weeks.

My suspicion is that it is a problem when running Jenkins on IBM Java, maybe even specific with IBM java versions.

timja commented 10 years ago

oleg_nenashev:

There're other issues with classloading on IBM Java. This issue is likely caused by IBM's j9 like it has been mentioned by Pei-Tang and Gabriel.

I'll create a master-issue, which will reference all others

timja commented 10 years ago

martinkutter:

We're running Jenkins on IBM JDK on AIX.

I haven't met this issue with Jenkins-1.532.2-LTS - however, I've seen this issue with 1.565. Maybe this helps narrowing down the issue.

System details:

AIX 7.1
IBM J9 VM (build 2.6, JRE 1.7.0 AIX ppc-32 20140106_181350 (JIT enabled, AOT enabled)
Tomcat 1.0.50

timja commented 10 years ago

martinkutter:

In my case, reloading the configuration from disk also helps.

Before I reload the configuration from disk, Jenkins complains that there is Old data on disk.
It displays the following:

Type: hudson.model.Hudson
Name:
Error: IncompatibleClassChangeError: incompatible InnerClasses attribute between "hudson.model.Queue$ItemList" and "hudson.model.Queue", ConversionException: Could not call hudson.EnvVars.readObject() : Invalid reference 
---- Debugging information ---- 
reference : ../../../../../../../slave/nodeProperties/hudson.slaves.EnvironmentVariablesNodeProperty/envVars/tree-map/default/comparator 
------------------------------- 
message : Could not call hudson.EnvVars.readObject() 
cause-exception : com.thoughtworks.xstream.converters.ConversionException 
cause-message : Invalid reference 
class : hudson.EnvVars 
required-type : hudson.EnvVars 
converter-type : com.thoughtworks.xstream.converters.reflection.SerializableConverter 
path : /hudson/slaves/slave[2]/nodeProperties/hudson.slaves.EnvironmentVariablesNodeProperty/envVars/tree-map/default/comparator 
line number : 243 
-------------------------------, 
InstantiationError: null, MissingFieldException: No field 'hudson.tools.ToolLocationNodeProperty' found in class 'hudson.model.Hudson'
timja commented 10 years ago

danielbeck:

IBM JDK on AIX is probably one of the worst supported environments for Jenkins. Consider moving the master node to e.g. Linux with Oracle JDK 7, which should be fairly reliable.

timja commented 10 years ago

martinkutter:

The ConversionException: Could not call hudson.EnvVars.readObject() : Invalid reference above is probably unrelated: It looks like my config had invalid references. After cleaning them up, this message disappeared.

However, the IncompatibleClassChangeError: incompatible InnerClasses attribute between "hudson.model.Queue$ItemList" and "hudson.model.Queue" could be the reason for this issue.

I reproduced the issue with Jenkins 1.554.3 (LTS), 1.565 and 1.570 with the following combinations.

The issue does not occur on Jenkins-1.532.2 (LTS).

In my case, the "missing" slave is always the first slave in the config.xml, so I would assume that loading a Slave from disk (and creating a hudsom.model.Queue$ItemList) does not work the first time,
but works on subsequent tries.

I've also found that the same error is thrown when creating the first slave node on a fresh jenkins instance (on save / doCreateItem):

 
Stack trace (excerpt):

Caused by: java.lang.IncompatibleClassChangeError: incompatible InnerClasses attribute between "hudson.model.Queue$ItemList" and "hudson.model.Queue"
    at java.lang.Class.getDeclaringClass(Class.java:821)
    at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.(ParameterizedTypeImpl.java:52)
    at sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl.make(ParameterizedTypeImpl.java:95)
    at sun.reflect.generics.factory.CoreReflectionFactory.makeParameterizedType(CoreReflectionFactory.java:105)
    at sun.reflect.generics.visitor.Reifier.visitClassTypeSignature(Reifier.java:140)
    at sun.reflect.generics.tree.ClassTypeSignature.accept(ClassTypeSignature.java:49)
    at sun.reflect.generics.repository.FieldRepository.getGenericType(FieldRepository.java:85)
    at java.lang.reflect.Field.getGenericType(Field.java:237)
    at com.thoughtworks.xstream.mapper.AnnotationMapper.processTypes(AnnotationMapper.java:224)
    at com.thoughtworks.xstream.mapper.AnnotationMapper.processAnnotations(AnnotationMapper.java:182)
    at com.thoughtworks.xstream.mapper.AnnotationMapper.serializedClass(AnnotationMapper.java:129)
    at hudson.util.xstream.MapperDelegate.serializedClass(MapperDelegate.java:39)
    at com.thoughtworks.xstream.mapper.MapperWrapper.serializedClass(MapperWrapper.java:26)
    at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:62)
    at hudson.slaves.NodeList$ConverterImpl.marshal(NodeList.java:151)

The slave is saved nonetheless.

Subsequent slaves can be created without this error.

timja commented 10 years ago

martinkutter:

The error does not appear in Jenkins <= 1.535

Looks like it has been introduced in 1.536.

timja commented 10 years ago

martinkutter:

I opened a support request at IBM. I'll update the issue when appropriate.

timja commented 10 years ago

jglick:

The loss of slave definitions is just the most obvious symptom of the problem with Queue$ItemList, whatever that is.

timja commented 10 years ago

jglick:

I suspect the problem is related to @WithBridgeMethods or some other bytecode manipulation tricks done in Jenkins; the Eclipse Java compiler also is broken by Queue. It is TBD whether our reworked bytecode is actually invalid according to the JVM spec, or whether it is simply unusual and confuses tools which were not tested against this situation.

timja commented 10 years ago

jglick:

Easily reproduced from tests:

JAVA_HOME=/…/ibm-jdk-71 mvn -f test -Dtest=QueueTest\#testFlyweightTasksWithoutMasterExecutors surefire:test

(Though in this case it does not cause the test to fail.)

timja commented 10 years ago

jglick:

Also reproducible from

public class Main {
    public static void main(String[] args) throws Exception {
System.out.println(Class.forName("hudson.model.Queue$ItemList").getEnclosingClass());
    }
}

run against core/target/classes, if you have done mvn -f core -DskipTests clean package, but not if the last goal is merely compile. javap confirms that the bytecode of Queue.class is substantially different after the packaging step, presumably due to the bridge method injector running on scheduleMaintenance, though the InnerClasses attribute is not different. In both cases there is no mention of ItemList in InnerClasses, which would seem to violate the JVM spec since it is clearly used, for example as the type of the buildables field. There is a difference in how fields are named, for example

- 1: getfield      #2  // Field buildables:Lhudson/model/Queue$ItemList;
+ 1: getfield      #2  // Field hudson/model/Queue.buildables:Lhudson/model/Queue$ItemList;
timja commented 10 years ago

jglick:

Attached diff before; now attaching original files for completeness.

timja commented 10 years ago

kohsuke:

After looking further into javap source code, I discovered why the following difference in the output happens:

- 1: getfield      #2  // Field buildables:Lhudson/model/Queue$ItemList;
+ 1: getfield      #2  // Field hudson/model/Queue.buildables:Lhudson/model/Queue$ItemList;

When printing the content of constant pool for these instructions, javap attempts to simplify the reference within the same class by omitting the class name as follows:

int tag = cpInfo.getTag();
switch (tag) {
    case CONSTANT_Methodref:
    case CONSTANT_InterfaceMethodref:
    case CONSTANT_Fieldref:
// simplify references within this class
CPRefInfo ref = (CPRefInfo) cpInfo;
try {
    if (ref.class_index == classFile.this_class)
 cpInfo = classFile.constant_pool.get(ref.name_and_type_index);
} catch (ConstantPool.InvalidIndex e) {
    // ignore, for now
}
}
print(tagName(tag) + " " + stringValue(cpInfo));

This check compares the constant pool index, not the actual content. The fieldref constant pool remains unchanged at #283.#624 before/after transformation, but classFile.this_class that references the class name itself changes from #283 to #400.

The constant pool entry #400 is "Class #891", just like #283. IOW, javac produces Queue.class that contains two identical constant pool entries, which is suprising.

It is still unclear as to whether this difference in the class file triggers the "incompatible InnerClasses attribute" problem.

I think the next step is to modify class file to insert InnerClass attribute for Queue$ItemList, as the lack of this entry does seem to violate JVM spec. If that is the case, the question remains as to why J9VM was happy with the class before transformation.

timja commented 10 years ago

kohsuke:

Just for the sake of completeness I manually patched Queue.class after transformation to have its classFile.this_class index points to #283, not #400, and now J9 VM is happy.

Wow.

timja commented 10 years ago

kohsuke:

Relevant StackOverflow thread and BugParade entry, which indicates the fix to javac made it into JDK8 b92.

Looks like J9 VM is failing to handle this rare situation correctly. I think I'll work around this problem in bridge method injector to reshuffle constant pools and eliminate duplicates.

timja commented 10 years ago

jglick:

More useful bug link: https://bugs.openjdk.java.net/browse/JDK-5053846

timja commented 10 years ago

scm_issue_link:

Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
changelog.html
core/pom.xml
pom.xml
http://jenkins-ci.org/commit/jenkins/4e23793f62aa5019226f2ba96f8063f01affeb92
Log:
[FIXED JENKINS-22525]

Integrated a bridge-method-injector that rebuilds the constant pool from
scratch.

timja commented 10 years ago

kohsuke:

Code change in bridge-method-injector

timja commented 10 years ago

jglick:

Also it seems like https://bugs.openjdk.java.net/browse/JDK-6746955 and https://bugs.openjdk.java.net/browse/JDK-8015927 might be the more relevant issues.

timja commented 10 years ago

kohsuke:

Changed the title to accurately capture the issue.

timja commented 10 years ago

kohsuke:

As for ItemList not in the InnerClasses attribute, I'm discovering with the following test case that if a nested/inner class is marked as private, it does not get recorded in the InnerClasses attribute:

public class App 
{
    private static class Foo {}
    private class Bar {}

    static class Alpha {}
    class Bravo {}

    public static void main( String[] args )
    {
System.out.println(Foo.class.getEnclosingClass());
System.out.println(Bar.class.getEnclosingClass());

System.out.println(Alpha.class.getEnclosingClass());
System.out.println(Bravo.class.getEnclosingClass());
    }
}
Classfile /files/kohsuke/ws/jenkins/investigations/JENKINS-22525/jenkins22525/target/classes/test/App.class
  Last modified Jul 31, 2014; size 1158 bytes
  MD5 checksum 264ed3611a578e1a4c8e2d4e0ea8b378
  Compiled from "App.java"
public class test.App
  SourceFile: "App.java"
  InnerClasses: 
       #11= #8 of #9; //Bravo=class test/App$Bravo of class test/App
       static #13= #7 of #9; //Alpha=class test/App$Alpha of class test/App
  RuntimeInvisibleAnnotations:
    0: #57()
  minor version: 0
  major version: 49
  flags: ACC_PUBLIC, ACC_SUPER

Classfile /files/kohsuke/ws/jenkins/investigations/JENKINS-22525/jenkins22525/target/classes/test/App$Foo.class
  Last modified Jul 31, 2014; size 303 bytes
  MD5 checksum 3f4b082e8d8c5cbf85d4afeb7807b253
  Compiled from "App.java"
class test.App$Foo
  SourceFile: "App.java"
  minor version: 0
  major version: 49
  flags: ACC_SUPER

Classfile /files/kohsuke/ws/jenkins/investigations/JENKINS-22525/jenkins22525/target/classes/test/App$Alpha.class
  Last modified Jul 31, 2014; size 309 bytes
  MD5 checksum df99610f5b28798c5f07f308b749df89
  Compiled from "App.java"
class test.App$Alpha
  SourceFile: "App.java"
  InnerClasses: 
       static #10= #2 of #16; //Alpha=class test/App$Alpha of class test/App
  minor version: 0
  major version: 49
  flags: ACC_SUPER

This seems to violate JVM spec of the InnerClass attribute. J9 VM accepts this without reporting an error. In this case I don't understand how JVM can correctly determine that App$Foo has an enclosing class. From the byte code, there's no way to distinguish this vs compiling the following class:

class App {}
class App$Foo {} // '$' is a legal character in a class name in Java
timja commented 10 years ago

dogfood:

Integrated in jenkins_main_trunk #3569
[FIXED JENKINS-22525] (Revision 4e23793f62aa5019226f2ba96f8063f01affeb92)

Result = SUCCESS
kohsuke : 4e23793f62aa5019226f2ba96f8063f01affeb92
Files :

timja commented 10 years ago

martinkutter:

The fix seems to work around the issue - I've been unable to reproduce the error with a build from sources with the fix applied.

Runtime: Windows 7, IBM J9 VM (build 2.6, JRE 1.7.0 Windows 7 amd64-64 Compressed References 20130617_152572 (JIT enabled, AOT enabled)

timja commented 10 years ago

dantran:

dont see it at 1.575 http://jenkins-ci.org/changelog.

timja commented 10 years ago

jglick:

@dantran because it was committed toward 1.576.

timja commented 10 years ago

danielbeck:

Literally one click away: "Upcoming changes"

timja commented 10 years ago

dantran:

thanks, I am now running 1.576-SNAPSHOT. Good to see this issue is fixed

timja commented 10 years ago

martinkutter:

IBM has tracked the issue with "incompatible inner Class attribute" as APAR IV62298 - see http://www-01.ibm.com/support/docview.wss?uid=swg1IV62298
I've received an iFix for Java 7SR7 from IBM, which fixes the issue.

The fix will be applied in the upcoming IBM JDK 7 SR 8.

timja commented 10 years ago

scm_issue_link:

Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
core/pom.xml
pom.xml
http://jenkins-ci.org/commit/jenkins/e4dc3b4210fb6afd80ddf138cafc6bfcb866df5f
Log:
[FIXED JENKINS-22525]

Integrated a bridge-method-injector that rebuilds the constant pool from
scratch.

(cherry picked from commit 4e23793f62aa5019226f2ba96f8063f01affeb92)

Conflicts:
changelog.html

timja commented 2 years ago

[Originally depends on: JENKINS-23176]