Closed knowledgehacker closed 8 years ago
Did you try upgrading to Java 8? As a side note you should also consider upgrading to a more recent presto version. On Thu, Feb 25, 2016 at 10:14 PM ming lin notifications@github.com wrote:
Some Presto nodes crashed when we run some complicated sqls. I switched CMS gc collector to g1 collector, but it did not solve the problem. And I tuned survivor/eden ratio, it mitigated the problem, but I am not sure it solved the problem.
--- Here is the information a. Environment Presto 0.83 and running on HDP 2.2.4, JDK1.7u75, on Linux 2.6.32-504.el6.x86_64
b. Crash log # A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00007ffe4921602f, pid=12794, tid=140729879525120
# JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build 1.7.0_75-b13) Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode linux-amd64 compressed oops) Problematic frame: V [libjvm.so+0x83d02f] ParRootScanWithoutBarrierClosure::do_oop(oopDesc**)+0x4f
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# If you would like to submit a bug report, please visit: http://bugreport.sun.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x00007ffe4401f000): GCTaskThread [stack: 0x00007ffe3a6a7000,0x00007ffe3a7a8000] [id=12796]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x00000000000e6a59
...
Stack: [0x00007ffe3a6a7000,0x00007ffe3a7a8000], sp=0x00007ffe3a7a6270, free space=1020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x83d02f] ParRootScanWithoutBarrierClosure::dooop(oopDesc* )+0x4f V [libjvm.so+0x51e9bf] frame::oops_compiled_argumentsdo(Symbol, bool, RegisterMap const, OopClosure)+0x27f V [libjvm.so+0x7edd83] nmethod::preserve_callee_argumentoops(frame, RegisterMap const, OopClosure_)+0x1c3 V [libjvm.so+0x51f1e3] frame::oops_dointernal(OopClosure, CodeBlobClosure, RegisterMap , bool)+0x143 V [libjvm.so+0x95507a] JavaThread::oopsdo(OopClosure, CodeBlobClosure _)+0x15a V [libjvm.so+0x956242] Threads::possibly_parallel_oopsdo(OopClosure, CodeBlobClosure _)+0x112 V [libjvm.so+0x8b1e59] SharedHeap::process_strongroots(bool, bool, SharedHeap::ScanningOption, OopClosure, CodeBlobClosure, OopsInGenClosure, bool)+0xa9 V [libjvm.so+0x56811c] GenCollectedHeap::gen_process_strongroots(int, bool, bool, bool, SharedHeap::ScanningOption, OopsInGenClosure, bool, OopsInGenClosure_)+0x7c V [libjvm.so+0x83a5df] ParNewGenTask::work(unsigned int)+0xef V [libjvm.so+0x9b76ef] GangWorker::loop()+0xcf V [libjvm.so+0x820b88] java_start(Thread*)+0x108
JavaThread 0x00007ffe4537a000 (nid = 13202) was being processed Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::resolve_static_call J 18233 C2 com.facebook.presto.$gen.PageProcessor_1967.project_0(Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/block/Block;ILcom/facebook/presto/spi/block/BlockBuilder;)V (908 bytes) @ 0x00007ffe3d76899c [0x00007ffe3d7686e0+0x2bc] j com.facebook.presto.$gen.PageProcessor_1967.process(Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/Page;IILcom/facebook/presto/spi/PageBuilder;)I+222 J 7607 C2 com.facebook.presto.operator.FilterAndProjectOperator.getOutput()Lcom/facebook/presto/spi/Page; (129 bytes) @ 0x00007ffe3bf14b68 [0x00007ffe3bf14640+0x528] J 6659 C2 com.facebook.presto.operator.Driver.processInternal()Lcom/google/common/util/concurrent/ListenableFuture; (272 bytes) @ 0x00007ffe3bb96f84 [0x00007ffe3bb967c0+0x7c4] J 13660 C2 com.facebook.presto.operator.Driver.processFor(Lio/airlift/units/Duration;)Lcom/google/common/util/concurrent/ListenableFuture; (251 bytes) @ 0x00007ffe3cd795d8 [0x00007ffe3cd79160+0x478] J 16225 C2 com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(Lio/airlift/units/Duration;)Lcom/google/common/util/concurrent/ListenableFuture; (67 bytes) @ 0x00007ffe3d2b4b6c [0x00007ffe3d2b4ac0+0xac] J 16241 C2 com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process()Lcom/google/common/util/concurrent/ListenableFuture; (136 bytes) @ 0x00007ffe3b39fb94 [0x00007ffe3b39f980+0x214] J 15556 C2 com.facebook.presto.execution.TaskExecutor$Runner.run()V (583 bytes) @ 0x00007ffe3d22735c [0x00007ffe3d226d20+0x63c] j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub
...
Heap par new generation total 1380160K, used 1298596K [0x00000001f6a00000, 0x0000000254390000, 0x0000000254390000) eden space 1226816K, 100% used [0x00000001f6a00000, 0x0000000241810000, 0x0000000241810000) from space 153344K, 46% used [0x0000000241810000, 0x0000000245e29370, 0x000000024add0000) to space 153344K, 79% used [0x000000024add0000, 0x000000025249ccb0, 0x0000000254390000) concurrent mark-sweep generation total 23632320K, used 13056651K [0x0000000254390000, 0x00000007f6a00000, 0x00000007f6a00000) concurrent-mark-sweep perm gen total 153600K, used 128447K [0x00000007f6a00000, 0x0000000800000000, 0x0000000800000000)
Card table byte_map: [0x00007ffdf0fb4000,0x00007ffdf4000000] byte_map_base: 0x00007ffdeffff000
— Reply to this email directly or view it on GitHub https://github.com/facebook/presto/issues/4650.
Thanks for your comment. We are planning to upgrade Presto, but current version will run for a short time, we need to fix it during this period. I can't tell exactly it is an internal bug in JVM or some native method in Presto causes the crash. I plan to try new Java if we can't solve the problem.
There are similar jvm bug reports that's why I recommended you to upgrade your Java version first. On Fri, Feb 26, 2016 at 7:33 AM ming lin notifications@github.com wrote:
Thanks for your comment. We are planning to upgrade Presto, but current version will run for a short time, we need to fix it during this period. I can't tell exactly it is an internal bug in JVM or some native method in Presto causes the crash. I plan to try new Java if we can't solve the problem.
— Reply to this email directly or view it on GitHub https://github.com/facebook/presto/issues/4650#issuecomment-189324347.
We saw JVM crashes before 8u40.
Thanks @nezihyigitbasi. I will try it. @electrum, is there possible some native methods in Presto causes the issue? It's strange that when the crash happened, it always stop at PageProcessor or CursorProcessor.
No, that crash is inside a GC thread, so it's unlikely to be native code in Presto.
@knowledgehacker, can you paste the crash logs when you switch to G1? BTW, there have been a lot of fixes to GC since Java 7, so I suggest you try with a newer version of Java.
@martint I tried jdk1.8.0 u65, and it still crashes, and we get the following error from server.log, maybe there is something wrong with the memory(a hardware issue). Any thoughts?
2016-02-25T10:16:55.002+0000 WARN http-worker-991 com.facebook.presto.server.ThrowableMapper Request failed for /v1/task/20160225_101652_00046_q93jg.1.4
java.lang.NullPointerException: null
at com.facebook.presto.execution.TaskExecutor$TaskHandle.access$1300(TaskExecutor.java:288) ~[presto-main-0.83-fw0.8.2.jar:0.83-fw0.8.2]
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.
This is probably a bug in Presto. The symptom is somewhat different when Presto crashes. For this time's crash, it is probably caused by a bug in Presto(NullPointerException). I didn't get a crash log since the time I got the one pasted. Any thoughts?
I see when a SqlTaskExecution is constructed:
if (!taskStateMachine.getState().isDone()) {
taskHandle = taskExecutor.addTask(taskId);
taskStateMachine.addStateChangeListener(new RemoveTaskHandleWhenDone(taskExecutor, taskHandle));
}
else {
taskHandle = null;
}
Then when enqueueing a split: public synchronized List<ListenableFuture<?>> enqueueSplits(TaskHandle taskHandle, boolean forceStart, List<? extends SplitRunner> taskSplits) { List<ListenableFuture<?>> finishedFutures = new ArrayList<>(taskSplits.size()); for (SplitRunner taskSplit : taskSplits) { PrioritizedSplitRunner prioritizedSplitRunner = new PrioritizedSplitRunner(taskHandle, taskSplit, ticker); ...
_While in PrioritizedSplitRunner's constructor: _
private PrioritizedSplitRunner(TaskHandle taskHandle, SplitRunner split, Ticker ticker)
{
this.taskHandle = taskHandle;
this.splitId = taskHandle.getNextSplitId();
So it seems that the taskHandle is a null pointer when it is referenced in constructor "PriopritizedSplitRunner".
Any comment on it?
The version you are using 0.83 was released on Nov 12, 2014. We have fixed tons of bugs in the last year and half, so I am not surprised that you seeing problems.
Please, upgrade to the latest release and try again. If you still have problems, please open a new issue.
Some Presto nodes crashed when we run some complicated sqls, and the crash causes the server to reboot. I switched CMS gc collector to g1 collector, but it did not solve the problem. And I tuned survivor/eden ratio, it mitigated the problem, but I am not sure it solved the problem.
--- Here is the information a. Environment Presto 0.83 and running on HDP 2.2.4, JDK1.7.0_75-b13, on Linux 2.6.32-504.el6.x86_64
b. Crash log ...
--------------- T H R E A D ---------------
Current thread (0x00007ffe4401f000): GCTaskThread [stack: 0x00007ffe3a6a7000,0x00007ffe3a7a8000] [id=12796]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x00000000000e6a59
...
Stack: [0x00007ffe3a6a7000,0x00007ffe3a7a8000], sp=0x00007ffe3a7a6270, free space=1020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x83d02f] ParRootScanWithoutBarrierClosure::dooop(oopDesc*)+0x4f V [libjvm.so+0x51e9bf] frame::oops_compiled_argumentsdo(Symbol, bool, RegisterMap const, OopClosure)+0x27f V [libjvm.so+0x7edd83] nmethod::preserve_callee_argumentoops(frame, RegisterMap const, OopClosure_)+0x1c3 V [libjvm.so+0x51f1e3] frame::oops_dointernal(OopClosure, CodeBlobClosure, RegisterMap, bool)+0x143 V [libjvm.so+0x95507a] JavaThread::oopsdo(OopClosure, CodeBlobClosure_)+0x15a V [libjvm.so+0x956242] Threads::possibly_parallel_oopsdo(OopClosure, CodeBlobClosure_)+0x112 V [libjvm.so+0x8b1e59] SharedHeap::process_strongroots(bool, bool, SharedHeap::ScanningOption, OopClosure, CodeBlobClosure, OopsInGenClosure, bool)+0xa9 V [libjvm.so+0x56811c] GenCollectedHeap::gen_process_strongroots(int, bool, bool, bool, SharedHeap::ScanningOption, OopsInGenClosure, bool, OopsInGenClosure_)+0x7c V [libjvm.so+0x83a5df] ParNewGenTask::work(unsigned int)+0xef V [libjvm.so+0x9b76ef] GangWorker::loop()+0xcf V [libjvm.so+0x820b88] java_start(Thread*)+0x108
JavaThread 0x00007ffe4537a000 (nid = 13202) was being processed Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::resolve_static_call J 18233 C2 com.facebook.presto.$gen.PageProcessor_1967.project_0(Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/block/Block;ILcom/facebook/presto/spi/block/BlockBuilder;)V (908 bytes) @ 0x00007ffe3d76899c [0x00007ffe3d7686e0+0x2bc] j com.facebook.presto.$gen.PageProcessor_1967.process(Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/Page;IILcom/facebook/presto/spi/PageBuilder;)I+222 J 7607 C2 com.facebook.presto.operator.FilterAndProjectOperator.getOutput()Lcom/facebook/presto/spi/Page; (129 bytes) @ 0x00007ffe3bf14b68 [0x00007ffe3bf14640+0x528] J 6659 C2 com.facebook.presto.operator.Driver.processInternal()Lcom/google/common/util/concurrent/ListenableFuture; (272 bytes) @ 0x00007ffe3bb96f84 [0x00007ffe3bb967c0+0x7c4] J 13660 C2 com.facebook.presto.operator.Driver.processFor(Lio/airlift/units/Duration;)Lcom/google/common/util/concurrent/ListenableFuture; (251 bytes) @ 0x00007ffe3cd795d8 [0x00007ffe3cd79160+0x478] J 16225 C2 com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(Lio/airlift/units/Duration;)Lcom/google/common/util/concurrent/ListenableFuture; (67 bytes) @ 0x00007ffe3d2b4b6c [0x00007ffe3d2b4ac0+0xac] J 16241 C2 com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process()Lcom/google/common/util/concurrent/ListenableFuture; (136 bytes) @ 0x00007ffe3b39fb94 [0x00007ffe3b39f980+0x214] J 15556 C2 com.facebook.presto.execution.TaskExecutor$Runner.run()V (583 bytes) @ 0x00007ffe3d22735c [0x00007ffe3d226d20+0x63c] j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub
...
Heap par new generation total 1380160K, used 1298596K [0x00000001f6a00000, 0x0000000254390000, 0x0000000254390000) eden space 1226816K, 100% used [0x00000001f6a00000, 0x0000000241810000, 0x0000000241810000) from space 153344K, 46% used [0x0000000241810000, 0x0000000245e29370, 0x000000024add0000) to space 153344K, 79% used [0x000000024add0000, 0x000000025249ccb0, 0x0000000254390000) concurrent mark-sweep generation total 23632320K, used 13056651K [0x0000000254390000, 0x00000007f6a00000, 0x00000007f6a00000) concurrent-mark-sweep perm gen total 153600K, used 128447K [0x00000007f6a00000, 0x0000000800000000, 0x0000000800000000)
Card table byte_map: [0x00007ffdf0fb4000,0x00007ffdf4000000] byte_map_base: 0x00007ffdeffff000