Open z11373 opened 5 years ago
I tracked down the issue: it is actually a segfault in the Python subprocess, and that is what causes the Storm worker to die. Here is what I found in /var/log/messages:
[165495.820435] streamparse_run[9133]: segfault at 0 ip (null) sp 00007fff94220478 error 14 in bbpy2.7[400000+11ff000]
I still need help troubleshooting this, so any pointers are really appreciated. Thanks!
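For anyone reading the kernel line above, here is a minimal sketch that splits it into fields and decodes the `error` value. It assumes the usual x86 page-fault error-code bit layout and that the kernel prints the code in hexadecimal; the function name `decode_segfault` and the flag wording are made up for illustration. Under those assumptions, `error 14` is 0x14, meaning a user-mode instruction fetch, and together with `segfault at 0 ip (null)` that is consistent with a call through a NULL function pointer, most likely inside native code rather than pure Python.

```python
import re

# Assumed x86 page-fault error-code bits: (mask, meaning if set, meaning if clear)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Matches lines such as:
#   name[pid]: segfault at ADDR ip IP sp SP error CODE in ...
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\S+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return the parsed fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the code is printed in hex
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if err & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": m.group("ip"),  # kept as text because it may be "(null)"
        "error": err,
        "flags": flags,
    }


line = ("[165495.820435] streamparse_run[9133]: segfault at 0 ip (null) "
        "sp 00007fff94220478 error 14 in bbpy2.7[400000+11ff000]")
info = decode_segfault(line)
print(info["comm"], info["pid"], hex(info["addr"]), info["flags"])
```

If the decoding holds, the next thing to check would be which C extension modules the spout imports, since a NULL instruction pointer rarely comes from pure Python code.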
Hi, sorry to write something here, but I wonder if anybody has a suggestion on how to troubleshoot and find the culprit of the worker crash problem we have right now. We are using streamparse for our Python code on Storm 1.1.1. Below is the log that I captured before the worker got recycled due to the crash. I am running out of ideas on how to troubleshoot it, so I would really appreciate any ideas or pointers. Thanks!
2019-08-28 15:05:32.947 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Launched subprocess with pid 10054
2019-08-28 15:05:32.951 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Opened spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Activating spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Start checking heartbeat...
2019-08-28 15:05:32.961 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Async loop died!
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
    at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.executor$fn4962$fn4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
2019-08-28 15:05:32.968 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [ERROR]
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
    at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.executor$fn4962$fn4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
2019-08-28 15:05:33.009 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_processBANG.doInvoke(util.clj:341) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn5632$fn5633.invoke(worker.clj:763) [storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.executor$mk_executor_data$fn4848$fn4849.invoke(executor.clj:276) [storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.1.1.jar:1.1.1]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2019-08-28 15:05:33.018 o.a.s.d.worker Thread-16 [INFO] Shutting down worker tmon-4-1567019114 ba5b3695-b390-4c3e-9d92-af0771f17b86 6700