pystorm / streamparse

Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
http://streamparse.readthedocs.io/
Apache License 2.0

[Question] Storm crashed with Error 'Cannot run program ... "streamparse_run" ... in directory ...error=2, No such file or directory ....' #426

Closed liujiaqiid closed 6 years ago

liujiaqiid commented 6 years ago

The detailed error output:

java.lang.RuntimeException: Error when launching multilang subprocess
    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:94)
    at org.apache.storm.spout.ShellSpout.open(ShellSpout.java:114)
    at org.apache.storm.daemon.executor$fn__4975$fn__4990.invoke(executor.clj:609)
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot run program "/opt/data/virtualenvs/hfp_searchindex/bin/streamparse_run" (in directory "/opt/data/storm/supervisor/stormdist/hfp_searchindex-125-1522140260/resources"): error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:87)
    ... 5 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 6 more

The pip requirements file, ./virtualenvs/hfp_searchindex.txt:

streamparse # always required for streamparse projects
bleach==2.1.2 # An easy safelist-based HTML-sanitizing tool.
jieba==0.39 # Chinese Words Segmentation Utilities
pykafka==2.7.0 # A cluster-aware Kafka>=0.8.2 client for Python
peewee==3.1.5 # light-weight db orm
psycopg2-binary==2.7.4

Listing the bin directory shows that streamparse_run exists:

ls -la /opt/data/virtualenvs/hfp_searchindex/bin/    
...    
-rwxrwxr-x 1 devops devops  258 Mar 27 16:36 sparse     
-rwxrwxr-x 1 devops devops  258 Mar 27 16:36 streamparse      
-rwxrwxr-x 1 devops devops  251 Mar 27 16:36 streamparse_run      
-rwxrwxr-x 1 devops devops  246 Mar 27 16:37 tabulate     
-rwxrwxr-x 1 devops devops  246 Mar 27 16:36 wheel      
...
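For anyone hitting the same trace: error=2 (ENOENT) on exec can mean either that the script is absent on the node where Storm actually launched it, or that the interpreter named in the script's `#!` line is absent there. The following is only a minimal sketch for checking both on a worker node; `diagnose_executable` is a hypothetical helper written for this thread, not part of streamparse or Storm.

```python
import os


def diagnose_executable(path):
    """Return a list of problems that could make exec() fail for `path`."""
    problems = []
    if not os.path.isfile(path):
        problems.append("missing file: " + path)
        return problems
    if not os.access(path, os.X_OK):
        problems.append("not executable: " + path)
    # Read the first line to find the interpreter named in the shebang.
    with open(path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if first_line.startswith("#!"):
        parts = first_line[2:].split()
        interpreter = parts[0] if parts else ""
        if not interpreter or not os.path.exists(interpreter):
            problems.append("shebang interpreter not found: " + interpreter)
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_exec.py /path/to/bin/streamparse_run
    for target in sys.argv[1:]:
        for line in diagnose_executable(target) or ["ok: " + target]:
            print(line)
```

Running it on the submitting machine is not enough; it would need to be run on every node where a Storm supervisor can start the spout, since the listing above only proves the file exists on the machine where `ls` was executed.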

The relevant part of config.json:

...
"virtualenv_root": "/opt/data/virtualenvs",
"use_virtualenv": true
...

liujiaqiid commented 6 years ago

@dan-blanchard @tdhopper
By the way, the Storm version is 1.2.0 and the streamparse version is 3.13.1.

liujiaqiid commented 6 years ago

It's weird. My workaround was:

  1. SSH into each worker node and install the Python virtualenv:
    virtualenv /opt/data/virtualenv/pyenvforstorm
    source /opt/data/virtualenv/pyenvforstorm/bin/activate
    pip install -r pyenvforstorm.txt
  2. Change config.json:
    "virtualenv_root": "/opt/data/virtualenvs",
    "install_virtualenv": true,
    "virtualenv_name": "pyenvforstorm",
    "use_virtualenv": true
  3. Submit the topology, and it works.
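
To see which path Storm will try to execute for a given set of settings, the lookup can be sketched roughly as below. This is an illustration only: `expected_streamparse_run` is a hypothetical helper, and the fallback to the topology name when `virtualenv_name` is unset is an assumption based on the path in the stack trace (`.../virtualenvs/hfp_searchindex/bin/streamparse_run`), not something confirmed from the streamparse source in this thread.

```python
import os


def expected_streamparse_run(env_config, topology_name):
    """Build the streamparse_run path implied by an environment's settings."""
    # Assumption: the virtualenv directory is `virtualenv_name` when set,
    # otherwise the topology name (as seen in the stack trace).
    venv_name = env_config.get("virtualenv_name") or topology_name
    return os.path.join(
        env_config["virtualenv_root"], venv_name, "bin", "streamparse_run"
    )


if __name__ == "__main__":
    # Settings copied from step 2 above.
    env = {
        "virtualenv_root": "/opt/data/virtualenvs",
        "install_virtualenv": True,
        "virtualenv_name": "pyenvforstorm",
        "use_virtualenv": True,
    }
    print(expected_streamparse_run(env, "hfp_searchindex"))
```

With the step 2 settings the result sits under `/opt/data/virtualenvs`, so it is worth confirming that the directory created by hand in step 1 uses the same root on every worker.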

liujiaqiid commented 6 years ago

I intend to use the same virtualenv for all my topologies, and this setup works fine for me.