Closed benlamonica closed 7 years ago
I took a look at the source and I'm not sure this is the correct interpretation. m_graph
is synchronized in three places in this class and one of them is setStatus
. As far as I can tell, this method is always called outside of a synchronized block so that if thread A calls it while B is inside the synchronized block in setStatus
, this should not result in a deadlock.
Do you have the full thread dump?
Any way you can reproduce this on a small project?
I think you're right, it is just taking a very long time for it to complete the test, and the threads are blocked because another thread is busy inserting into the graph.
I looked at the log again and I see that it's only partially done. I'm not sure why it's so slow though, when I ran this in JUnit, the tests execute very quickly. (Full test runs in 1-2 hours, in testng, it's been running for 11+ hours now). 272000 out of 668592 tests run
So this is not a bug, feel free to close the issue. I'm sure it's something with my test.
Thanks for the quick response.
Thanks for the clarification.
Not sure I want to close this just yet, I'm not convinced yet that this is not a bug in TestNG, especially if you say that the same test runs fast in JUnit. Can you double check this? I'm a bit confused, since if the same tests are running in JUnit, then they are all independent (since JUnit doesn't support dependent tests) which means that the DynamicGraph use in TestNG should be minimal (this graph is only used when you have dependent test methods).
Can you report back on what you find? I want to make sure there's not some hidden slow algorithm in TestNG.
Let me start the jvm with a code profiler and run again.
Here are the particulars of the test:
I'll do a little more research and get back to you.
Here is the execution profiler.
Stack Trace Sample Count Percentage(%)
java.lang.Thread.run() 1,037 98.95
java.util.concurrent.ThreadPoolExecutor$Worker.run() 639 60.973
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 639 60.973
org.testng.internal.thread.graph.GraphThreadPoolExecutor.afterExecute(Runnable, Throwable) 406 38.74
org.testng.internal.thread.graph.GraphThreadPoolExecutor.setStatus(IWorker, DynamicGraph$Status) 406 38.74
org.testng.internal.DynamicGraph.setStatus(Object, DynamicGraph$Status) 398 37.977
java.util.HashSet.add(Object) 298 28.435
java.util.HashMap.put(Object, Object) 298 28.435
java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 298 28.435
org.testng.internal.DynamicGraph.removeNode(Object) 100 9.542
java.util.HashSet.remove(Object) 100 9.542
java.util.ArrayList.remove(Object) 8 0.763
org.testng.internal.TestMethodWorker.run() 227 21.66
org.testng.internal.TestMethodWorker.invokeTestMethods(ITestNGMethod, Object, ITestContext) 227 21.66
org.testng.internal.Invoker.invokeTestMethods(ITestNGMethod, XmlSuite, Map, ConfigurationGroupMethods, Object, ITestContext) 227 21.66
org.testng.internal.Invoker.invokeTestMethod(Object, ITestNGMethod, Object[], int, XmlSuite, Map, ITestClass, ITestNGMethod[], ITestNGMethod[], ConfigurationGroupMethods, Invoker$FailureContext) 226 21.565
org.testng.internal.Invoker.invokeMethod(Object, ITestNGMethod, Object[], int, XmlSuite, Map, ITestClass, ITestNGMethod[], ITestNGMethod[], ConfigurationGroupMethods, Invoker$FailureContext) 226 21.565
org.testng.internal.MethodInvocationHelper.invokeMethod(Method, Object, Object[]) 224 21.374
java.lang.reflect.Method.invoke(Object, Object[]) 224 21.374
sun.reflect.DelegatingMethodAccessorImpl.invoke(Object, Object[]) 224 21.374
sun.reflect.GeneratedMethodAccessor135.invoke(Object, Object[]) 224 21.374
com.baml.gfx.pte.regression.PTERegression.db() 224 21.374
com.baml.gfx.pte.regression.PTERegression.areIdenticalNodes(String, String, DifferenceListener, ElementQualifier) 224 21.374
org.custommonkey.xmlunit.DetailedDiff.<init>(Diff) 94 8.969
org.custommonkey.xmlunit.Diff.<init>(String, String) 84 8.015
org.custommonkey.xmlunit.Diff.similar() 46 4.389
org.testng.internal.Invoker.invokeConfigurations(IClass, ITestNGMethod, ITestNGMethod[], XmlSuite, Map, Object[], Object, ITestResult) 1 0.095
org.testng.internal.MethodHelper.findExpectedExceptions(IAnnotationFinder, Method) 1 0.095
org.testng.internal.Invoker.filterMethods(IClass, ITestNGMethod[], Invoker$Predicate) 1 0.095
21.66% is spent on executing the test, and 61% is spent inserting into the graph set. (the rest is spent on our caching implementation, which suffered a disconnect to our dev environment and is busy rebuilding it's caches).
Do you know why we would be inserting into that graph?
It looks like the hashCode is based on the method, which would be the same for 100,000 entries. So each time it adds to the HashSet, it has to iterate over 100,000 instances in order to find that the object is not in the set? I will experiment with making the hashCode more unique...
Can you turning off parallelism completely? I'm curious what this will change.
I turned off parallelism and now it's running about 1000 tests every 15 seconds. I don't see any of the delays I was seeing before.
Doesn't sound like deadlock so much as severe lock contention.
On Tue, Aug 4, 2015 at 5:04 PM, Ben La Monica notifications@github.com wrote:
I turned off parallelism and now it's running about 1000 tests every 15 seconds. I don't see any of the delays I was seeing before.
— Reply to this email directly or view it on GitHub https://github.com/cbeust/testng/issues/772#issuecomment-127800321.
I turned off parallelism and now it's running about 1000 tests every 15 seconds. I don't see any of the delays I was seeing before.
That was my intuition, glad to see it confirmed. Like @rschmitt is saying, it's most likely a contention problem.
Here is a simplified example that illustrates the problem.
import static org.junit.Assert.assertNotNull;
import java.util.concurrent.atomic.AtomicInteger;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Factory;
import org.testng.annotations.Test;
public class ContentionTest {
private Integer testData;
@Factory(dataProvider="data")
public ContentionTest(Integer data) {
testData = data;
}
@DataProvider(name="data", parallel=true)
public static Object[][] getData() {
Integer[][] data = new Integer[600000][1];
for(int i = 0; i < 600000; i++) {
data[i] = new Integer[] { i };
}
return data;
}
private static AtomicInteger testCount = new AtomicInteger(0);
@AfterMethod
public void countTests() {
int count = testCount.incrementAndGet();
if (count % 1000 == 0) {
System.out.println(count + " tests completed.");
}
}
@Test
public void one() {
assertNotNull("obj", testData);
}
@Test
public void two() {
assertNotNull("obj", testData);
}
@Test
public void three() {
assertNotNull("obj", testData);
}
@Test
public void four() {
assertNotNull("obj", testData);
}
@Test
public void five() {
assertNotNull("obj", testData);
}
@Test
public void six() {
assertNotNull("obj", testData);
}
}
And the suite:
<suite name="contention suite">
<test name="contention tests" parallel="methods" thread-count="6">
<classes>
<class name="ContentionTest"/>
</classes>
</test>
</suite>
Unfortunately I don't have any more time to devote to this issue. I'm going to have to go and find a different solution. Let me know if I can contribute any more info to you. Thanks for the great testing framework!
Thanks for the PR and your efforts, @benlamonica, I'll take it from there!
@cbeust, @juherr the similar fix of #775 has been broken down into #848, #849, #850, #852. Each one of them improves different path of the code path that makes #772 to take such a long time to execute. If you run #772 tests on the code before and after the fix, you will see that the time it takes to execute the tests are significantly reduced.
@testn The problem is we have no proof it or any idea how much we can. Then, it will take time to reproduce and test it on our side. Last time I tried, I had to stop before the end.
The hardest and most important thing with perf issues are not the fix, but the way to reproduce and to measure it easily.
But maybe you can help us?
@juherr, just try it with a lower number. In my case, 6000 is already big enough to see the problem. If you want to start validating issues, you probably need to create a lot of test cases for performance testing in which I don't see it much in TestNG.
@benlamonica , can you help provide the result of the test?
v 6.9.13.3 In testNG no mechanisms for forced termination of the child threads. Because of this, the execution of tests in a loop (stability test) simply hangs in the case of deadlock.
Exception in thread "pool-309-thread-5" java.lang.ArrayIndexOutOfBoundsException: 49
at java.util.ArrayList.fastRemove(ArrayList.java:546)
at java.util.ArrayList.remove(ArrayList.java:529)
at org.testng.internal.thread.graph.GraphThreadPoolExecutor.setStatus(GraphThreadPoolExecutor.java:82)
at org.testng.internal.thread.graph.GraphThreadPoolExecutor.afterExecute(GraphThreadPoolExecutor.java:94)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
@shaburov Do you have a way to reproduce it every time?
@juherr Absolutely floating defect. But since my tests are performed continuously in the monitoring mode, the reproduced defect in the interval 2-24 hours.
But I can guarantee that it is reproduced with the following settings:
testNG.setSuiteThreadPoolSize (< equal to the number launched suites >);
XmlSuite.setThreadCount (<5 and more>);
The error is not difficult to find.
Collections.synchronizedList()
on https://github.com/cbeust/testng/blame/master/src/main/java/org/testng/internal/thread/graph/GraphThreadPoolExecutor.java#L29 is smarter IMO.
Is the fix resolving your issue?
You made the most difficult part of the job, why not proposing a pull request then? :)
@juherr Pull request https://github.com/cbeust/testng/pull/1305
While running a large test (600,000+ test methods) I encountered a deadlock situation.
All of the threads are blocked with this stack trace:
And the thread that is holding the mutex: