prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

TestBlockBuilder.testNewBlockBuilderLikeForLargeBlockBuilder is unreliable and OOMs #15653

Open aweisberg opened 3 years ago

aweisberg commented 3 years ago

It's disabled right now.

pool-100-thread-2
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at io.airlift.slice.Slices.allocate(I)Lio/airlift/slice/Slice; (Slices.java:89)
  at io.airlift.slice.Slices.ensureSize(Lio/airlift/slice/Slice;I)Lio/airlift/slice/Slice; (Slices.java:76)
  at io.airlift.slice.DynamicSliceOutput.writeBytes(Lio/airlift/slice/Slice;II)V (DynamicSliceOutput.java:152)
  at com.facebook.presto.common.block.VariableWidthBlockBuilder.writeBytes(Lio/airlift/slice/Slice;II)Lcom/facebook/presto/common/block/BlockBuilder; (VariableWidthBlockBuilder.java:239)
  at com.facebook.presto.common.type.AbstractVarcharType.writeSlice(Lcom/facebook/presto/common/block/BlockBuilder;Lio/airlift/slice/Slice;II)V (AbstractVarcharType.java:146)
  at com.facebook.presto.common.type.AbstractVarcharType.writeSlice(Lcom/facebook/presto/common/block/BlockBuilder;Lio/airlift/slice/Slice;)V (AbstractVarcharType.java:140)
  at com.facebook.presto.block.TestBlockBuilder.testNewBlockBuilderLikeForLargeBlockBuilder()V (TestBlockBuilder.java:129)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:498)
  at org.testng.internal.MethodInvocationHelper.invokeMethod(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (MethodInvocationHelper.java:104)
  at org.testng.internal.Invoker.invokeMethod(Ljava/lang/Object;Lorg/testng/ITestNGMethod;[Ljava/lang/Object;ILorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/ITestClass;[Lorg/testng/ITestNGMethod;[Lorg/testng/ITestNGMethod;Lorg/testng/internal/ConfigurationGroupMethods;Lorg/testng/internal/Invoker$FailureContext;)Lorg/testng/ITestResult; (Invoker.java:645)
  at org.testng.internal.Invoker.invokeTestMethod(Ljava/lang/Object;Lorg/testng/ITestNGMethod;[Ljava/lang/Object;ILorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/ITestClass;[Lorg/testng/ITestNGMethod;[Lorg/testng/ITestNGMethod;Lorg/testng/internal/ConfigurationGroupMethods;Lorg/testng/internal/Invoker$FailureContext;)Lorg/testng/ITestResult; (Invoker.java:851)
  at org.testng.internal.Invoker.invokeTestMethods(Lorg/testng/ITestNGMethod;Lorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/internal/ConfigurationGroupMethods;Ljava/lang/Object;Lorg/testng/ITestContext;)Ljava/util/List; (Invoker.java:1177)
  at org.testng.internal.TestMethodWorker.invokeTestMethods(Lorg/testng/ITestNGMethod;Ljava/lang/Object;Lorg/testng/ITestContext;)V (TestMethodWorker.java:129)
  at org.testng.internal.TestMethodWorker.run()V (TestMethodWorker.java:112)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624)
  at java.lang.Thread.run()V (Thread.java:748)

aweisberg commented 3 years ago

This is available to take from Bhavani if he doesn't get to it.

v-jizhang commented 3 years ago

@aweisberg, should this be closed? I ran the test class but could not reproduce the failure:

./mvnw test -pl :presto-main -Dtest=com.facebook.presto.block.TestBlockBuilder

Running com.facebook.presto.block.TestBlockBuilder
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.597 s - in com.facebook.presto.block.TestBlockBuilder

Also, in #15737 the test was enabled again because "The only one that doesn't work is Cassandra".

aweisberg commented 3 years ago

It seemed like it was passing reliably, but I don't think it is. Yes, it frequently passes, but not always. I think the entire test suite may have issues.

It just failed here for example https://github.com/prestodb/presto/pull/15774/checks?check_run_id=2035664515

v-jizhang commented 3 years ago

It's no longer an OOM issue. It looks like this is not a test suite issue but a Surefire one. The solution here doesn't seem to work because we are using Surefire 2.22.0.

aweisberg commented 3 years ago

That is a generic error message printed when the forked process terminates unexpectedly. Anything can cause it, but it's usually an OOM. My recollection is that when this reproduces locally (run it in a loop), you can check the log file and it mentions the OOM.
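
A minimal sketch of the "run it in a loop, then check the log" approach, in case it helps anyone trying to reproduce this. The helper names `run_until_failure` and `find_oom_logs` are hypothetical, not part of the Presto build, and the directory to search is an assumption: the thread does not say where the forked JVM writes its log.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Presto's build.

# Run a command repeatedly and stop at the first failing run.
# Returns 1 if some run failed, 0 if every attempt passed.
run_until_failure() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "attempt ${attempt}/${max_attempts}"
    if ! "$@"; then
      echo "command failed on attempt ${attempt}"
      return 1
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in ${max_attempts} attempts"
  return 0
}

# List files under a directory (assumed log location) that mention
# OutOfMemoryError. Returns non-zero when no file matches.
find_oom_logs() {
  grep -rl "OutOfMemoryError" "$1"
}
```

With the command quoted earlier in this thread, usage would look like `run_until_failure 50 ./mvnw test -pl :presto-main -Dtest=com.facebook.presto.block.TestBlockBuilder`, followed by `find_oom_logs <dir>` on whichever directory holds the forked process's output once a run fails. The attempt count of 50 is arbitrary.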