pystorm / streamparse

Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
http://streamparse.readthedocs.io/
Apache License 2.0
1.5k stars 218 forks source link

streamparse wordcount demo not working with storm 1.0.3 #362

Closed chenjingping closed 7 years ago

chenjingping commented 7 years ago

when i use "sparse run" command with storm version 1.0.3 the error show below:

6279 [refresh-active-timer] INFO  o.a.s.d.worker - All connections are ready for worker f3b554d9-ef09-4aee-85c7-ed96fb5ede3e:1027 with id 5795bc05-3366-492e-87b9-e67e5e7d3acd
6360 [Thread-18-count_bolt-executor[2 2]] INFO  o.a.s.d.executor - Preparing bolt count_bolt:(2)
6368 [Thread-18-count_bolt-executor[2 2]] INFO  o.a.s.u.ShellProcess - Storm multilang serializer: org.apache.storm.multilang.JsonSerializer
6370 [Thread-20-word_spout-executor[3 3]] INFO  o.a.s.d.executor - Opening spout word_spout:(3)
6374 [Thread-20-word_spout-executor[3 3]] INFO  o.a.s.u.ShellProcess - Storm multilang serializer: org.apache.storm.multilang.JsonSerializer
6377 [Thread-22-__acker-executor[1 1]] INFO  o.a.s.d.executor - Preparing bolt __acker:(1)
6379 [Thread-22-__acker-executor[1 1]] INFO  o.a.s.d.executor - Prepared bolt __acker:(1)
6384 [Thread-24-__system-executor[-1 -1]] INFO  o.a.s.d.executor - Preparing bolt __system:(-1)
6387 [Thread-24-__system-executor[-1 -1]] INFO  o.a.s.d.executor - Prepared bolt __system:(-1)
6560 [Thread-18-count_bolt-executor[2 2]] ERROR o.a.s.util - Async loop died!
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
Traceback (most recent call last):
  File "/usr/local/python27/bin/streamparse_run", line 11, in <module>
    sys.exit(main())
  File "/usr/local/python27/lib/python2.7/site-packages/streamparse/run.py", line 34, in main
    mod = importlib.import_module(mod_name)
  File "/usr/local/python27/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named bolts.wordcount

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:91) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.task.ShellBolt.prepare(ShellBolt.java:131) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.daemon.executor$fn__4973$fn__4986.invoke(executor.clj:791) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
6564 [Thread-18-count_bolt-executor[2 2]] ERROR o.a.s.d.executor - 
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
Traceback (most recent call last):
  File "/usr/local/python27/bin/streamparse_run", line 11, in <module>
    sys.exit(main())
  File "/usr/local/python27/lib/python2.7/site-packages/streamparse/run.py", line 34, in main
    mod = importlib.import_module(mod_name)
  File "/usr/local/python27/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named bolts.wordcount

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:91) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.task.ShellBolt.prepare(ShellBolt.java:131) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.daemon.executor$fn__4973$fn__4986.invoke(executor.clj:791) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
ubiteixeira commented 7 years ago

I'm also experiencing the same error

melderan commented 7 years ago

Ran into this problem as well. Inside the project.clj file that is created during the quickstart bootstrap, change the 1.0.2 to 1.0.3 to match your Storm version in the dependencies section. Not sure if you already did that, but give it a shot and let me know if that doesn't work.

ubiteixeira commented 7 years ago

I've already changed the dependencies.

chenjingping commented 7 years ago

I've already changed the project.clj file from "1.0.2" to "1.0.3", project.clj file content show below:

(defproject wordcount "0.0.1-SNAPSHOT"
  :resource-paths ["_resources"]
  :target-path "_build"
  :min-lein-version "2.0.0"
  :jvm-opts ["-client"]
  :dependencies  [[org.apache.storm/storm-core "1.0.3"]
                  [org.apache.storm/flux-core "1.0.3"]]
  :jar-exclusions     [#"log4j\.properties" #"org\.apache\.storm\.(?!flux)" #"trident" #"META-INF" #"meta-inf" #"\.yaml"]
  :uberjar-exclusions [#"log4j\.properties" #"org\.apache\.storm\.(?!flux)" #"trident" #"META-INF" #"meta-inf" #"\.yaml"]
  )
bobgrigoryan commented 7 years ago

as i remember - in unpacked topology storm 1.0.3 doesn't have symlink to resources folder where python files should be found.

might be caused by this change: https://github.com/apache/storm/pull/1898/files

leopeng1995 commented 7 years ago

I also found another interesting phenomenon: after I installed streamparse (in a new VM), I immediately launched Python console, typed import bolts, and it works. However, after I tried this wordcount example for Storm Version 1.0.3, I tried the command again. It shows No module named bolts and No module namd spouts.

jsyoungnet commented 7 years ago

Hi All,

At bobgrigoryan's suggestion I'll repost what I've found regarding this issue here. I'll also point out that the 'bug' seems to be in the java code (project.clj) that bob mentions above. As a python hack I fixed the problem with an ugly change to run.py. I just wanted to get a topology running to learn. IMHO the problem seems to be that 'resources' is inserted twice into the path that contains the tmp files of spouts and bolts:

From the stream parse forum:

I'm interested in learning the streamparse library and brought up an install on my mac laptop via:

mkvirtualenv storm-1.0.3
working storm-1.0.3

brew storm install
pip streamparse install

... etc.

created the wordcount topology

cd word count
sparse run

which would eventually fail with an error message (included below) that essentially said "I can't find the spouts or the bolts to create the toplogy"

so I poked around and after a bit of trial and error it seems that (on my machine anyway) that the directory from which run.py was executing was one level too high and that the spouts and bolts were buried another level deeper in the temporary tree...

In other words, run.py was executing with os.getcwd() equal to

/private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources/

but in order to see the bolts and spouts, the directory needed to be:

/private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources/resources/

I don't know if this error is reproducible and i haven't looked into why 'resources' is in the path twice, however, i fixed it temporarily with a change to run.py:

diff run.py run.old.py

31c31

<     sys.path.extend((os.getcwd(), os.getcwd()+'/resources'))

---

>     sys.path.append(os.getcwd())

which is ugly but at least I can now run topologies locally.

Posting here in case anyone else is running a local topology on a mac (probably a small set of folks)

jy

20197 [Thread-20-count_bolt-executor[1 1]] ERROR o.a.s.d.executor - 
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! Currently read output: /private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources
Serializer Exception:
Traceback (most recent call last):
  File "/Users/jyoung/.virtualenvs/storm-1.0.3/bin/streamparse_run", line 11, in <module>
    sys.exit(main())
  File "/Users/jyoung/.virtualenvs/storm-1.0.3/lib/python2.7/site-packages/streamparse/run.py", line 35, in main
    mod = importlib.import_module(mod_name)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named bolts.wordcount

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:91) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.task.ShellBolt.prepare(ShellBolt.java:131) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.daemon.executor$fn__4973$fn__4986.invoke(executor.clj:791) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
20197 [Thread-18-word_spout-executor[2 2]] ERROR o.a.s.d.executor - 
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! Currently read output: /private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources
Serializer Exception:
Traceback (most recent call last):
  File "/Users/jyoung/.virtualenvs/storm-1.0.3/bin/streamparse_run", line 11, in <module>
    sys.exit(main())
  File "/Users/jyoung/.virtualenvs/storm-1.0.3/lib/python2.7/site-packages/streamparse/run.py", line 35, in main
    mod = importlib.import_module(mod_name)
  File "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named spouts.words

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:91) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.spout.ShellSpout.open(ShellSpout.java:94) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.daemon.executor$fn__4905$fn__4920.invoke(executor.clj:600) ~[storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
20200 [ProcessThread(sid:0 cport:-1):] INFO  o.a.s.s.o.a.z.s.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x15af4101fa90011 type:create cxid:0x40 zxid:0x43 txntype:-1 reqpath:n/a Error Path:/storm/errors/wordcount-1-1490153387 Error:KeeperErrorCode = NodeExists for /storm/errors/wordcount-1-1490153387
20206 [Thread-20-count_bolt-executor[1 1]] ERROR o.a.s.util - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn__5571$fn__5572.invoke(worker.clj:759) [storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.daemon.executor$mk_executor_data$fn__4792$fn__4793.invoke(executor.clj:274) [storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
20206 [Thread-18-word_spout-executor[2 2]] ERROR o.a.s.util - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn__5571$fn__5572.invoke(worker.clj:759) [storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.daemon.executor$mk_executor_data$fn__4792$fn__4793.invoke(executor.clj:274) [storm-core-1.0.3.jar:1.0.3]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.0.3.jar:1.0.3]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

Fatal error: local() encountered an error (return code 1) while executing 'storm jar /usr/local/Cellar/storm/1.0.3/libexec/examples/storm-starter/wordcount/_build/wordcount-0.0.1-SNAPSHOT-standalone.jar org.apache.storm.flux.Flux --local --no-splash --sleep 9223372036854775807 /var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/tmpJalVvX.yaml'

Aborting.
(storm-1.0.3)Jeffreys-MacBook-Pro:wordcount jyoung$ ls /private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources
resources
(storm-1.0.3)Jeffreys-MacBook-Pro:wordcount jyoung$ ls -la /private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources
total 0
drwxr-xr-x  3 jyoung  staff  102 22 Mar 14:29 .
drwxr-xr-x  5 jyoung  staff  170 22 Mar 14:29 ..
drwxr-xr-x  9 jyoung  staff  306 22 Mar 14:29 resources
(storm-1.0.3)Jeffreys-MacBook-Pro:wordcount jyoung$ cd /private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources
(storm-1.0.3)Jeffreys-MacBook-Pro:resources jyoung$ ls
resources
(storm-1.0.3)Jeffreys-MacBook-Pro:resources jyoung$ cd resources/
(storm-1.0.3)Jeffreys-MacBook-Pro:resources jyoung$ ls
bolts           randomsentence.js   splitsentence.py    spouts          storm.js        storm.py        storm.rb
(storm-1.0.3)Jeffreys-MacBook-Pro:resources jyoung$ cd spouts
(storm-1.0.3)Jeffreys-MacBook-Pro:spouts jyoung$ ls
__init__.py __init__.pyc    words.py    words.pyc
(storm-1.0.3)Jeffreys-MacBook-Pro:spouts jyoung$ cd ..
(storm-1.0.3)Jeffreys-MacBook-Pro:resources jyoung$ cd bolts
(storm-1.0.3)Jeffreys-MacBook-Pro:bolts jyoung$ ls
__init__.py __init__.pyc    wordcount.py    wordcount.pyc
(storm-1.0.3)Jeffreys-MacBook-Pro:bolts jyoung$ more wordcount.py
import os
from collections import Counter

from streamparse import Bolt

class WordCountBolt(Bolt):
    outputs = ['word', 'count']

    def initialize(self, conf, ctx):
        self.counter = Counter()
        self.pid = os.getpid()
        self.total = 0

    def _increment(self, word, inc_by):
        self.counter[word] += inc_by
        self.total += inc_by

    def process(self, tup):
        word = tup.values[0]
        self._increment(word, 10 if word == "dog" else 1)
        if self.total % 1000 == 0:
            self.logger.info("counted [{:,}] words [pid={}]".format(self.total,
                                                                    self.pid))
        self.emit([word, self.counter[word]])
(storm-1.0.3)Jeffreys-MacBook-Pro:bolts jyoung$ 
(storm-1.0.3)Jeffreys-MacBook-Pro:bolts jyoung$ pwd
/private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources/resources/bolts
(storm-1.0.3)Jeffreys-MacBook-Pro:bolts jyoung$ pwd
/private/var/folders/_1/p0xw6sw92kn2g_mnh9ggl160000107/T/54658c95-597d-4fd5-a67d-9209e8e0ac31/supervisor/stormdist/wordcount-1-1490153387/resources/resources/bolts
(storm-1.0.3)Jeffreys-MacBook-Pro:bolts jyoung$ ls -la /private/var/folders 
jnhunsberger commented 7 years ago

For those interested, I just ran into the same issue as @jsyoungnet, but on a clean Ubuntu 16.04 LTS server. I was able to fix it by editing run.py like he suggests above. Here are my environment details:

Linux ip-xxx-xxx-xxx-xxx 4.4.0-1013-aws #22-Ubuntu SMP Fri Mar 31 15:41:31 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

streamparse v3.4.0 storm v1.1.0 lein v2.7.1 java v1.8.0_121 zookeeper v3.4.10

to get it to work, I needed to edit run.py located here: /usr/local/lib/python3.5/dist-packages/streamparse/

I edited the line: sys.path.append(os.getcwd()) to be: sys.path.append(os.getcwd() + '/resources')

I needed to do this just to get the streamparse wordcount demo to work. As @jsyoungnet mentions, this is due to the program looking in a directory one level higher than where the files actually are. On my system streamparse was looking for the modules here: /tmp/a3ed7489-0496-4f60-aa25-21a1e4f70131/supervisor/stormdist/wordcount-1-1492876933/resources but needed to be looking for them here: /tmp/a3ed7489-0496-4f60-aa25-21a1e4f70131/supervisor/stormdist/wordcount-1-1492876933/resources/resources

Any chance someone can fix this?

rwatler commented 7 years ago

I can confirm the issue @jnhunsberger and others have reported.

streamparse v3.4.0, storm v1.1.0, and CentOS 7.2