nathanmarz / storm-deploy

One click deploy for Storm clusters on AWS

Error in deployment on Amazon EC2 using image based on ubuntu-trusty-14.04-amd64-server #75

Open anindyam opened 9 years ago

anindyam commented 9 years ago

I tried to deploy a cluster using the defaults, i.e., 1 Nimbus, 1 Zookeeper, and 2 Supervisor nodes.

The instances started successfully (I could see them in the Amazon console), but there was an error while building Storm.

Then I tried to stop the cluster using the command "lein deploy-storm --stop --name ...", and found that it stopped only 3 of the 4 launched instances. It failed to stop one supervisor instance.

The error displayed on the console was:

/bin/mkdir -p '/usr/local/share/java'\n
 /usr/bin/install -c -m 644 zmq.jar '/usr/local/share/java'\n
make[3]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq/src'\n
make[2]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq/src'\n
make[1]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq/src'\n
Making install in perf\n
make[1]: Entering directory `/opt/local/zeromq/rfLHkBB/jzmq/perf'\n
make install-am\n
make[2]: Entering directory `/opt/local/zeromq/rfLHkBB/jzmq/perf'\n
make[3]: Entering directory `/opt/local/zeromq/rfLHkBB/jzmq/perf'\n
make[3]: Nothing to be done for `install-exec-am'.\n
make[3]: Nothing to be done for `install-data-am'.\n
make[3]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq/perf'\n
make[2]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq/perf'\n
make[1]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq/perf'\n
make[1]: Entering directory `/opt/local/zeromq/rfLHkBB/jzmq'\n
make[2]: Entering directory `/opt/local/zeromq/rfLHkBB/jzmq'\n
make[2]: Nothing to be done for `install-exec-am'.\n
make[2]: Nothing to be done for `install-data-am'.\n
make[2]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq'\n
make[1]: Leaving directory `/opt/local/zeromq/rfLHkBB/jzmq'\n
...done\n
Directory /mnt/storm...\n
...done\n
clean up home...\n
...done\n
Build storm...\n
Already up-to-date.\n
bash: bin/build_release.sh: No such file or directory\n
Build storm failed\n
logout\n
", :server "54.67.52.187"}]
ERROR core - errors found [{:message "Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message \"ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster\", :cause #}", :type :pallet/action-execution-error, :cause #<ExceptionInfo slingshot.ExceptionInfo: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster", :cause #}>}]
ERROR core - errors found [{:message "Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message \"ssh-fail: server 54.183.166.164, port 22, user storm, group :supervisor-albeado-storm-cluster\", :cause #}", :type :pallet/action-execution-error, :cause #<ExceptionInfo slingshot.ExceptionInfo: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.166.164, port 22, user storm, group :supervisor-albeado-storm-cluster", :cause #}>}]
ERROR logging - Exception in thread "main"
ERROR core - errors found [{:message "Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message \"ssh-fail: server 54.183.167.245, port 22, user storm, group :nimbus-albeado-storm-cluster\", :cause #}", :type :pallet/action-execution-error, :cause #<ExceptionInfo slingshot.ExceptionInfo: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.167.245, port 22, user storm, group :nimbus-albeado-storm-cluster", :cause #}>}]
ERROR logging - java.lang.RuntimeException: java.util.concurrent.ExecutionException: slingshot.ExceptionInfo: Error prevented completion of phase: Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster", :cause #} (form-init5676622805172385725.clj:1)
ERROR logging - at clojure.lang.Compiler.eval(Compiler.java:5440)
ERROR logging - at clojure.lang.Compiler.eval(Compiler.java:5415)
ERROR logging - at clojure.lang.Compiler.load(Compiler.java:5857)
ERROR logging - at clojure.lang.Compiler.loadFile(Compiler.java:5820)
ERROR logging - at clojure.main$load_script.invoke(main.clj:221)
ERROR logging - at clojure.main$init_opt.invoke(main.clj:226)
ERROR logging - at clojure.main$initialize.invoke(main.clj:254)
ERROR logging - at clojure.main$null_opt.invoke(main.clj:279)
ERROR logging - at clojure.main$main.doInvoke(main.clj:354)
ERROR logging - at clojure.lang.RestFn.invoke(RestFn.java:422)
ERROR logging - at clojure.lang.Var.invoke(Var.java:369)
ERROR logging - at clojure.lang.AFn.applyToHelper(AFn.java:165)
ERROR logging - at clojure.lang.Var.applyTo(Var.java:482)
ERROR logging - at clojure.main.main(main.java:37)
ERROR logging - Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: slingshot.ExceptionInfo: Error prevented completion of phase: Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster", :cause #}
ERROR logging - at clojure.lang.LazySeq.sval(LazySeq.java:47)
ERROR logging - at clojure.lang.LazySeq.seq(LazySeq.java:56)
ERROR logging - at clojure.lang.RT.seq(RT.java:450)
ERROR logging - at clojure.core$seq.invoke(core.clj:122)
ERROR logging - at clojure.core$dorun.invoke(core.clj:2450)
ERROR logging - at clojure.core$doall.invoke(core.clj:2465)
ERROR logging - at pallet.core$parallel_lift.invoke(core.clj:757)
ERROR logging - at pallet.core$lift_phase$fn__5430.invoke(core.clj:779)
ERROR logging - at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:60)
ERROR logging - at clojure.core$r.invoke(core.clj:797)
ERROR logging - at pallet.core$lift_phase.invoke(core.clj:781)
ERROR logging - at pallet.core$lift_nodes$fn__5433.invoke(core.clj:794)
ERROR logging - at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58)
ERROR logging - at clojure.core$r.invoke(core.clj:797)
ERROR logging - at pallet.core$lift_nodes.invoke(core.clj:798)
ERROR logging - at pallet.core$liftSTAR.invoke(core.clj:1310)
ERROR logging - at pallet.core$convergeSTAR.invoke(core.clj:1335)
ERROR logging - at pallet.core$converge.doInvoke(core.clj:1506)
ERROR logging - at clojure.lang.RestFn.invoke(RestFn.java:440)
ERROR logging - at backtype.storm.provision$convergeBANG.invoke(provision.clj:41)
ERROR logging - at backtype.storm.provision$stopBANG.invoke(provision.clj:114)
ERROR logging - at backtype.storm.provision$_main$fn__8422.invoke(provision.clj:143)
ERROR logging - at backtype.storm.provision$_main.doInvoke(provision.clj:130)
ERROR logging - at clojure.lang.RestFn.invoke(RestFn.java:437)
ERROR logging - at clojure.lang.Var.invoke(Var.java:373)
ERROR logging - at user$eval5.invoke(form-init5676622805172385725.clj:1)
ERROR logging - at clojure.lang.Compiler.eval(Compiler.java:5424)
ERROR logging - ... 13 more
ERROR logging - Caused by: java.util.concurrent.ExecutionException: slingshot.ExceptionInfo: Error prevented completion of phase: Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster", :cause #}
ERROR logging - at java.util.concurrent.FutureTask.report(FutureTask.java:122)
ERROR logging - at java.util.concurrent.FutureTask.get(FutureTask.java:188)
ERROR logging - at clojure.core$future_call$reify__5500.deref(core.clj:5399)
ERROR logging - at clojure.core$deref.invoke(core.clj:1765)
ERROR logging - at clojure.core$map$fn__3695.invoke(core.clj:2096)
ERROR logging - at clojure.lang.LazySeq.sval(LazySeq.java:42)
ERROR logging - ... 39 more
ERROR logging - Caused by: slingshot.ExceptionInfo: Error prevented completion of phase: Unexpected exception: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster", :cause #}
ERROR logging - at pallet.core$raise_on_error$fn__5204.invoke(core.clj:526)
ERROR logging - at pallet.core$middleware_handler$fn__5197.invoke(core.clj:495)
ERROR logging - at pallet.core$apply_phase_to_node.invoke(core.clj:656)
ERROR logging - at pallet.core$eval5367$fn__5368$iter__5369__5373$fn__5374$fn__5379.invoke(core.clj:721)
ERROR logging - at clojure.lang.AFn.call(AFn.java:18)
ERROR logging - at java.util.concurrent.FutureTask.run(FutureTask.java:262)
ERROR logging - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
ERROR logging - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
ERROR logging - at java.lang.Thread.run(Thread.java:744)
ERROR logging - Caused by: slingshot.ExceptionInfo: throw+: {:type :pallet/ssh-connection-failure, :message "ssh-fail: server 54.183.168.178, port 22, user storm, group :zookeeper-albeado-storm-cluster", :cause #}
ERROR logging - at pallet.execute$ensure_ssh_connection$fn__4714.invoke(execute.clj:383)
ERROR logging - at pallet.execute$ensure_ssh_connection.invoke(execute.clj:380)
ERROR logging - at pallet.execute$ssh_bash_on_target.invoke(execute.clj:481)
ERROR logging - at pallet.core$executor.invoke(core.clj:389)
ERROR logging - at pallet.action_plan$execute_action.invoke(action_plan.clj:535)
ERROR logging - at pallet.action_plan$execute$fn__2363.invoke(action_plan.clj:554)
ERROR logging - at clojure.core$r.invoke(core.clj:799)
ERROR logging - at pallet.action_plan$execute.invoke(action_plan.clj:551)
ERROR logging - at pallet.action_plan$execute_for_target.invoke(action_plan.clj:649)
ERROR logging - at pallet.core$execute.invoke(core.clj:502)
ERROR logging - at pallet.core$translate_action_plan$fn__5192.invoke(core.clj:488)
ERROR logging - at pallet.execute$ssh_user_credentials$fn__4860.invoke(execute.clj:655)
ERROR logging - at pallet.execute$execute_with_ssh$execute_with_ssh_fn__4846.invoke(execute.clj:603)
ERROR logging - at pallet.core$raise_on_error$fn__5204.invoke(core.clj:519)
ERROR logging - ... 8 more
ERROR logging - Caused by: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out
ERROR logging - at com.jcraft.jsch.Util.createSocket(Util.java:344)
ERROR logging - at com.jcraft.jsch.Session.connect(Session.java:194)
ERROR logging - at com.jcraft.jsch.Session.connect(Session.java:162)
ERROR logging - at clj_ssh.ssh$connect.invoke(ssh.clj:300)
ERROR logging - at pallet.execute$ensure_ssh_connection$fn__4714.invoke(execute.clj:381)
ERROR logging - ... 21 more
ERROR logging - Caused by: java.net.ConnectException: Connection timed out
ERROR logging - at java.net.PlainSocketImpl.socketConnect(Native Method)
ERROR logging - at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
ERROR logging - at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
ERROR logging - at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
ERROR logging - at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
ERROR logging - at java.net.Socket.connect(Socket.java:579)
ERROR logging - at java.net.Socket.connect(Socket.java:528)
ERROR logging - at java.net.Socket.<init>(Socket.java:425)
ERROR logging - at java.net.Socket.<init>(Socket.java:208)
ERROR logging - at com.jcraft.jsch.Util.createSocket(Util.java:338)
ERROR logging - ... 25 more

I don't know what is happening. Could someone please help?

BTW, the JDK I was using was 1.7.0_45, 64-bit.
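
A minimal sketch for narrowing this down, assuming the console output above has been saved into a string: the innermost cause in the trace is `java.net.ConnectException: Connection timed out` on port 22, so it may help to list the hosts reported as `ssh-fail` and test whether a plain TCP connection to them succeeds from the deploying machine. The helper names `failed_endpoints` and `is_reachable` are hypothetical and are not part of storm-deploy or Pallet.

```python
import re
import socket

# Matches the "ssh-fail: server <ip>, port <n>" fragments in the Pallet output.
# \s+ tolerates a line break between "server" and the address.
SSH_FAIL = re.compile(r"ssh-fail: server\s+(\d{1,3}(?:\.\d{1,3}){3}), port (\d+)")


def failed_endpoints(log_text):
    """Return the unique (host, port) pairs reported as ssh-fail, in first-seen order."""
    seen = []
    for host, port in SSH_FAIL.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


def is_reachable(host, port, timeout=5.0):
    """Attempt a plain TCP connect; return False on timeout, refusal, or other socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(log_text, timeout=5.0):
    """Build one status line per failed endpoint (this performs real network connects)."""
    lines = []
    for host, port in failed_endpoints(log_text):
        state = "reachable" if is_reachable(host, port, timeout) else "not reachable"
        lines.append(f"{host}:{port} {state}")
    return lines
```

On the output above, `failed_endpoints` would return the Zookeeper, Supervisor, and Nimbus addresses named in the ssh-fail messages. A `False` from `is_reachable` only shows that a TCP connect did not succeed within the timeout; it does not say why (the instance may already be terminated, or a firewall or security group rule may be blocking the port).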