rossf7 / elasticrawl

Launch AWS Elastic MapReduce jobs that process Common Crawl data.
MIT License

Issues running parse #10

Closed ryanrussell closed 8 years ago

ryanrussell commented 9 years ago

Hi, I'm running Ubuntu 14.04 and can't get through the parse step from the GitHub README.

Any ideas as to the source of the issue?

Thanks! Ryan

root@etl-local:/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64# ./elasticrawl parse CC-MAIN-2014-49 --max-segments 2 --max-files 3

Segments
Segment: 1416400372202.67 Files: 150
Segment: 1416400372490.23 Files: 124

Job configuration
Crawl: CC-MAIN-2014-49 Segments: 2 Parsing: 3 files per segment

Cluster configuration
Master: 1 m1.medium (Spot: 0.12)
Core: 2 m1.medium (Spot: 0.12)
Task: --
Launch job? (y/n)
y

ERROR: Elasticrawl::ElasticMapReduceAccessError
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/lib/elasticrawl/job.rb:52:in `rescue in run_job_flow'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/lib/elasticrawl/job.rb:48:in `run_job_flow'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/lib/elasticrawl/parse_job.rb:22:in `run'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:65:in `parse'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:134:in `<top (required)>'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `load'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `<main>'
/home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:146:in `rescue in <top (required)>': undefined method `status' for "InstanceProfile is required for creating cluster":String (NoMethodError)
    from /home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:133:in `<top (required)>'
    from /home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `load'
    from /home/ubuntu/workspace/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `<main>'

tomas-hanzlik commented 9 years ago

Has anyone figured out how to solve this problem?

Thanks!

rossf7 commented 9 years ago

Really sorry that this has fallen off my radar. I'll try and take a look at the weekend.

benmccann commented 9 years ago

I'm getting this error as well. Here's the stack trace in an easier-to-read format:

ERROR: Elasticrawl::ElasticMapReduceAccessError
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/lib/elasticrawl/job.rb:52:in `rescue in run_job_flow'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/lib/elasticrawl/job.rb:48:in `run_job_flow'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/lib/elasticrawl/parse_job.rb:22:in `run'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:65:in `parse'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:134:in `<top (required)>'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `load'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `<main>'
/src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:146:in `rescue in <top (required)>': undefined method `status' for "InstanceProfile is required for creating cluster":String (NoMethodError)
    from /src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/gems/elasticrawl-1.1.3/bin/elasticrawl:133:in `<top (required)>'
    from /src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `load'
    from /src/elasticrawl-1.1.3-linux-x86_64/lib/vendor/ruby/2.1.0/bin/elasticrawl:23:in `<main>'
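
Reading the last frames of the trace, it looks like two things are going on: the EMR call fails with "InstanceProfile is required for creating cluster", and then the top-level rescue handler calls `status` on a plain String, which raises the NoMethodError and buries the real message. A minimal sketch of that pattern, with a more defensive alternative. The `fragile_report` and `safe_report` helpers and the `ApiError` class are hypothetical and are not taken from the elasticrawl source:

```ruby
# Hypothetical stand-in for an API error that carries a status code.
class ApiError < StandardError
  attr_reader :status

  def initialize(message, status)
    super(message)
    @status = status
  end
end

# Fragile: assumes whatever it is handed responds to #status.
def fragile_report(error)
  "ERROR: status #{error.status}"
end

# Defensive: copes with exceptions that have no #status and with plain strings.
def safe_report(error)
  message = error.respond_to?(:message) ? error.message : error.to_s
  status  = error.respond_to?(:status) ? " (status #{error.status})" : ''
  "ERROR: #{message}#{status}"
end

raw = 'InstanceProfile is required for creating cluster'

begin
  fragile_report(raw)
rescue NoMethodError => e
  # Same failure class as in the trace: #status does not exist on String.
  puts "fragile handler raised #{e.class} for ##{e.name}"
end

puts safe_report(raw)
puts safe_report(ApiError.new('Forbidden', 403))
```

With the defensive version the underlying "InstanceProfile is required" message would be printed instead of a second exception.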

benmccann commented 9 years ago

Here are the docs for "InstanceProfile"
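
For context, the EMR RunJobFlow API takes two IAM settings: a service role and a job flow role (the EC2 instance profile the error message refers to). Below is a rough sketch of a request that sets both, written as a plain hash in the shape the AWS SDK for Ruby uses. This is an assumption about what a fix could look like, not how elasticrawl actually builds its request. `build_job_flow_request` and `spot_group` are hypothetical helpers, and the role names are the defaults created by `aws emr create-default-roles`. A real request would also need steps and an AMI or release setting, which are left out here.

```ruby
# Default role names created by `aws emr create-default-roles`.
DEFAULT_SERVICE_ROLE  = 'EMR_DefaultRole'
DEFAULT_JOB_FLOW_ROLE = 'EMR_EC2_DefaultRole'

# Builds a RunJobFlow request hash (hypothetical helper, not part of elasticrawl).
def build_job_flow_request(name:, instance_type:, core_count:, bid_price:,
                           service_role: DEFAULT_SERVICE_ROLE,
                           job_flow_role: DEFAULT_JOB_FLOW_ROLE)
  {
    name: name,
    service_role: service_role,   # IAM role assumed by the EMR service
    job_flow_role: job_flow_role, # EC2 instance profile for the cluster nodes
    instances: {
      instance_groups: [
        spot_group('MASTER', instance_type, 1, bid_price),
        spot_group('CORE', instance_type, core_count, bid_price)
      ]
    }
  }
end

# One spot instance group; the bid price is sent as a string.
def spot_group(role, type, count, bid_price)
  {
    instance_role: role,
    instance_type: type,
    instance_count: count,
    market: 'SPOT',
    bid_price: bid_price.to_s
  }
end

# Values mirror the cluster configuration shown in the original report.
request = build_job_flow_request(
  name: 'Elasticrawl parse job (example)',
  instance_type: 'm1.medium',
  core_count: 2,
  bid_price: 0.12
)

puts request[:job_flow_role]
# With the aws-sdk-emr gem and valid credentials, the hash could be passed on as:
#   Aws::EMR::Client.new(region: 'us-east-1').run_job_flow(request)
```

If the default roles don't exist in the account yet, they would need to be created first, otherwise the request would still be rejected.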

rossf7 commented 9 years ago

@benmccann I've released v1.1.4 of the deploy packages and pushed the change to RubyGems. It's working for me but please could you retest to make sure it fixes the issue?

benmccann commented 9 years ago

Awesome. Thanks so much for the quick release!

rossf7 commented 8 years ago

Cleaning up issues and noticed I'd forgotten to close this.