nasa-jpl-memex / memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data
BSD 2-Clause "Simplified" License

Error crawling URLs #707

Open saloneerege opened 8 years ago

saloneerege commented 8 years ago

I get the following error in the crawl log after starting the crawl (the same block appears three times in the log):

~/miniconda3/envs/memex/lib/nutch ~/memex-explorer/source
Injecting seed URLs
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Error running: /home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Failed with exit value 127.
~/memex-explorer/source

brittainhard commented 8 years ago

Can you do me a favor and type printenv in your terminal and paste the output?

saloneerege commented 8 years ago

HOME=/home/salonee
SHLVL=1
LANGUAGE=en_US
GNOME_DESKTOP_SESSION_ID=this-is-deprecated
CONDA_ENV_PATH=/home/salonee/miniconda3/envs/memex
LOGNAME=salonee
COMPIZ_BIN_PATH=/usr/bin/
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/
QT4_IM_MODULE=xim
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-D3UDCG61nA
CONDA_DEFAULT_ENV=memex
LESSOPEN=| /usr/bin/lesspipe %s
INSTANCE=
TEXTDOMAIN=im-config
XDG_RUNTIME_DIR=/run/user/1000
DISPLAY=:0
XDG_CURRENT_DESKTOP=Unity
GTK_IMMODULE=ibus
LESSCLOSE=/usr/bin/lesspipe %s %s
TEXTDOMAINDIR=/usr/share/locale/
COLORTERM=gnome-terminal
XAUTHORITY=/home/salonee/.Xauthority
_=/usr/bin/printenv

brittainhard commented 8 years ago

My gut instinct here is that JAVA_HOME is not set in your environment; it does not appear in the printenv output you pasted. There's some documentation on how to set it here: http://wiki.apache.org/nutch/NutchTutorial. Look at the "Verifying your Nutch Installation" section. I think you can ignore the part about editing /etc/hosts.
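
To make the suggestion above concrete, here is a minimal shell sketch for checking whether JAVA_HOME is set and points at a Java install. The function name check_java_home is made up for illustration, and the path in the export comment is a placeholder, not something taken from this thread. Whether a missing JAVA_HOME is really the cause of the error has not been confirmed here.

```shell
# Hypothetical helper (not part of Nutch or memex-explorer):
# report whether JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME=$JAVA_HOME has no executable bin/java"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Usage, in the same terminal that starts the crawl:
#   check_java_home
# If it reports a problem, set the variable for the current shell.
# The path below is a placeholder; use the location of your own JDK:
#   export JAVA_HOME=/path/to/your/jdk
```

A variable exported this way only lasts for the current shell session, so it has to be set in the same environment that launches the crawl.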

ahmadia commented 8 years ago

I agree with Brittain; this looks symptomatic of JAVA_HOME not being set. I think this is something where Nutch itself could be more robust.
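
As one illustration of what "more robust" could look like, the sketch below wraps a command in a pre-flight check so that a missing JAVA_HOME produces an explicit message instead of a bare exit value. This is not how Nutch or memex-explorer behaves; the function name require_java_then_run and the exit code 2 are assumptions chosen for the example.

```shell
# Hypothetical wrapper (an illustration, not Nutch's actual behavior):
# refuse to start a command when JAVA_HOME is unset or has no bin/java.
require_java_then_run() {
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "error: JAVA_HOME is unset or has no bin/java; not running: $*" >&2
        return 2   # arbitrary code chosen for this sketch
    fi
    # The check passed: run the command unchanged and return its exit status.
    "$@"
}

# Usage, with the crawldb and seeds paths from your own crawl:
#   require_java_then_run "$CONDA_ENV_PATH/lib/nutch/bin/nutch" inject <crawldb> <seeds>
```

Failing early with a named cause makes the problem visible in the crawl log itself, without a round trip through printenv.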