Closed scottmarlow closed 5 years ago
Here is how I am starting the java agent + tests:
java -Xmx4096m -Xms128m -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+UseFastAccessorMethods -Djava.endorsed.dirs=$TCK_HOME/endorsed \ -classpath $TCK_HOME/lib/javatest.jar:$TCK_HOME/lib/jtlegacy.jar:$TCK_HOME/classes:$TCK_HOME/endorsed/jaxb-impl.jar:$TCK_HOME/classes:$TCK_HOME/client/jaxb-jxc.jar:$TCK_HOME/client/jaxb-xjc.jar:$TCK_HOME/endorsed/jboss-jaxb-api_2.3_spec.jar:$TCK_HOME/endorsed/relaxngDatatype.jar:$TCK_HOME/endorsed/istack-commons-runtime.jar:$TCK_HOME/endorsed/istack-commons-tools.jar:$TCK_HOME/endorsed/txw2.jar:$TCK_HOME/client/codemodel.jar:$TCK_HOME/client/xsom.jar:$TCK_HOME/client/rngom.jar \ -Djava.security.policy=$TCK_HOME/lib/tck.policy \ com.sun.javatest.agent.AgentMain \ -passive 1>agent.log 2>agent-err.log &
java -Xmx512m -Xms128m -Djava.endorsed.dirs=$TCK_HOME/endorsed -jar $TCK_HOME/lib/javatest.jar \ -verbose:stop,progress -testSuite $TCK_HOME \ -workdir -create $WORK_DIR/work_directory \ -config ./default_configuration.jti \ -concurrency 2 -timeoutFactor 5 \ -runtests
Any suggestions?
Seems that every call in is on a different thread, which is why we cache several classloaders until we run out of memory:
"CommandExecutor executeThread for command args: [-loadDirs, /home/smarlow/work/jakarta/runjaxbtck/JAXB-TCK-2.3/classes:/home/smarlow/work/jakarta/runjaxbtck/work_directory/classes/bind/binder, javasoft.sqe.tests.api.javax.xml.bind.JAXBValidationCheckerTest, -TestURL, file:/home/smarlow/work/jakarta/runjaxbtck/JAXB-TCK-2.3/tests/api/javax_xml/bind/Binder/Binder.xsd.html#Binder.xml, -xml, Binder.xml, -package, javasoft.sqe.tests.bind.binder, -schema, Binder.xsd]@3595" prio=3 tid=0x148 nid=NA runnable java.lang.Thread.State: RUNNABLE at com.sun.jaxb_tck.lib.DirsClassLoader.newInstance(DirsClassLoader.java:60)
- locked <0x28d> (a java.lang.Class) at com.sun.jaxb_tck.lib.ExecTestSameJVMCmd.run(ExecTestSameJVMCmd.java:62) at com.sun.javatest.agent.Agent$Task$CommandExecutor$2.run(Agent.java:899) at java.lang.Thread.run(Thread.java:748)
For my current test running, it might be a better choice to cache the classloader based on thread name, instead of thread instance, since the thread name never changes (it seems) but the test harness seems to create a new thread for each test.
I wonder if this is due to a test harness change.
Based on my previous comment, since the "DirsClassLoader.threads" cache contents is never reused, we may as well drop the "DirsClassLoader.threads" cache, to avoid the OOM. I'll try a code change for that.
So far, the change is looking good for my local test run (Pass: 2,563 Fail: 0 Error: 0 Not-Run: 22,063)...
Also, will share a suggestion from Brian Stansberry:
If for some reason the caching is actually needed, converting it to a ThreadLocal<WeakReference
> might work. If a strong ref cache is needed
If you haven't already looked, you can use the Jenkins CI as a reference. The most recent run (April. 23) shows 100% success, no failures, no skips: https://ci.eclipse.org/jakartaee-tck/job/jaxb-tck/job/master/24/testReport/ The console output is awfully long, but it may provide some reference data for you to compare against: https://ci.eclipse.org/jakartaee-tck/blue/rest/organizations/jenkins/pipelines/jaxb-tck/branches/master/runs/24/nodes/28/steps/31/log/?start=0
Thanks @edbratt! Thanks @ankitoracle! I verified yesterday that the leak was resolved with this change. If we need to bring caching back in, I'd be happy to help with that or any problems with today's build/run.
I did look through the ^ ci.eclipse.org jaxb-tck job and don't yet see all involved steps (e.g. didn't see where the com.sun.javatest.agent.AgentMain is started). Will close since the pr is merged, thanks again!
@ankitoracle is it okay if I start a https://ci.eclipse.org/jakartaee-tck/job/jaxb-tck/job/master/ run with this change? If that looks good, can I also do a https://ci.eclipse.org/jakartaee-tck/job/jaxb-tck-nightly-build-run run?
Reopening until we see a successful CI job run...
@scottmarlow You can trigger a run.We run the TCK in multi jvm mode without an agent in this jenkins job
The jaxb-tck job run was successful, it took 6 hours 32 minutes (previous runs were: 6 h 26min, 6 h 30min, 6h 23min), so performance seems fine to me. Locally, my Jenkins test run took 6 hours 25 minutes (using the settings mentioned earlier above). Will now close this issue.
I'm seeing OOM failures in agent process. I'll investigate further, wondering if we are leaking com.sun.jaxb_tck.lib.DirsClassLoader instances.
The first OOM that I see is: