Open · bwebster opened this issue 8 years ago
Usually that error is indicative of some problem loading your code, like a syntax error, missing constant, etc. It gets reported really confusingly as a failure to find `rubydoop`, but it should be interpreted more as an error that was raised while `rubydoop` loaded (and it loads your code).
You could try running locally and setting `-Djruby.log.exceptions=true -Djruby.log.backtraces=true` to see all Ruby errors. There will be lots and lots of them, but keep an eye out for any that look like they could be from loading your code.
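To illustrate one way a load-time failure in your own code surfaces through `require` (my own sketch, not from this thread; the file and gem names are hypothetical, and it runs under plain MRI, no JRuby needed):

```ruby
require "tmpdir"

message = nil
Dir.mktmpdir do |dir|
  # a hypothetical job script whose own require is what actually fails
  File.write(File.join(dir, "my_job.rb"), 'require "no_such_gem_xyz"')
  $LOAD_PATH.unshift(dir)
  begin
    require "my_job"        # looks like my_job failed to load...
  rescue LoadError => e
    message = e.message     # ...but the message names the inner missing file
  end
end
puts message  # => cannot load such file -- no_such_gem_xyz
```

The point is that the outer `require` is only the messenger: the name in the LoadError is the real clue about what your code failed to load.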
Thanks for the quick feedback. I'll give that a shot today and see what I can find.
The odd thing is that it works fine when running against the same version of hadoop, installed locally via brew.
Sorry, I'm a bit rusty on using hadoop. I've tried all of the following when running locally and I'm not seeing any errors or traces (I was expecting to see a lot based on your comment):
```
hadoop jar build/test.jar -D mapred.child.java.opts="-Druby.log.exceptions=true -Druby.log.backtraces=true" test abc123
hadoop jar build/test.jar -Druby.log.exceptions=true -Druby.log.backtraces=true test abc123
hadoop jar build/test.jar test abc123 -Druby.log.exceptions=true -Druby.log.backtraces=true
```
The last command should be the one, but it should be `-Djruby…`, not `-Druby…`.
I tried running this past my colleagues, but none of them had tried EMR 5, so unfortunately we don't have any experience to offer if it turns out to be an EMR 5 thing.
Good catch on the `-Djruby` fail. I've fixed that.
I've narrowed my code down to something very simple. I have a `lib/test.rb` file, which looks like this:

```ruby
puts "Running test"
```
I then `rake package`, upload my jar to S3, and then add a step using that custom jar and pass the following options:

```
test
-Djruby.log.exceptions=true
-Djruby.log.backtraces=true
```
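As a further diagnostic (my own suggestion, not something from this thread), the job script itself can dump what the embedded runtime sees before anything else happens, which helps tell a path problem apart from a code problem:

```ruby
# hypothetical diagnostics for the very top of a job script like test.rb:
# print the engine/version and scan the load path for rubydoop itself
puts "engine: #{RUBY_ENGINE} #{RUBY_VERSION}"
puts "load path:"
$LOAD_PATH.each { |p| puts "  #{p}" }
found = $LOAD_PATH.any? { |p| File.exist?(File.join(p, "rubydoop.rb")) }
puts "rubydoop.rb on load path: #{found}"
```

If `found` is false inside the cluster but true locally, the jar's embedded load path is the thing to investigate rather than the script's own code.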
Even with that very simple setup, I'm still getting:

```
LoadError: no such file to load -- rubydoop
  require at org/jruby/RubyKernel.java:956
  require at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
Exception in thread "main" org.jruby.embed.InvokeFailedException: (LoadError) no such file to load -- rubydoop
	at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.call(EmbedRubyObjectAdapterImpl.java:320)
	at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.callMethod(EmbedRubyObjectAdapterImpl.java:250)
	at org.jruby.embed.ScriptingContainer.callMethod(ScriptingContainer.java:1412)
	at rubydoop.InstanceContainer.getRuntime(InstanceContainer.java:30)
	at rubydoop.RubydoopJobRunner.run(RubydoopJobRunner.java:25)
	at rubydoop.RubydoopJobRunner.run(RubydoopJobRunner.java:18)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at rubydoop.RubydoopJobRunner.main(RubydoopJobRunner.java:50)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.jruby.exceptions.RaiseException: (LoadError) no such file to load -- rubydoop
	at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:956)
	at RUBY.require(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55)
```
What version of EMR are you successfully using? Are there any other techniques you are using to get better diagnostic info?
Thanks for the help.
We run Rubydoop-based jobs on EMR 3.9.0 and 4.2.0. I don't think that's because it doesn't work on 5, more that we haven't built a new job since 5 was released and haven't had a reason to test.
Which version of Rubydoop are you using, and which version of JRuby?
Here is my setup. I'm going to try running on 4.2.0 to see if that gets things going.
I've been putting my jar in s3, and all my input files are in s3. Couple questions about that:
Gemfile

```ruby
gem "rubydoop", "1.2.1"

group :development do
  gem "rake"
  gem "jruby-jars", "= 9.1.5.0"
end
```

.ruby-version

```
jruby-9.1.5.0
```

Java

```
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
```

JRuby

```
jruby 9.1.5.0 (2.3.1) 2016-09-07 036ce39 Java HotSpot(TM) 64-Bit Server VM 25.45-b02 on 1.8.0_45-b14 +jit [darwin-x86_64]
```

Hadoop

```
Hadoop 2.7.2
```
I've taken a step back and created a new project that follows the word count example in the README for v1.2.1.
I'm going to try and get that working on EMR 4.2.0 and go from there.
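Independent of Hadoop, the shape of that word count (the map phase emits `(word, 1)` pairs, the reduce phase sums per word) can be sketched in plain Ruby. This is my own illustration of the computation, not Rubydoop's API:

```ruby
# plain-Ruby sketch of word count's two phases; no Hadoop or Rubydoop involved
def map_phase(lines)
  lines.flat_map { |line| line.split.map { |word| [word.downcase, 1] } }
end

def reduce_phase(pairs)
  pairs.group_by(&:first).transform_values { |vs| vs.sum(&:last) }
end

counts = reduce_phase(map_phase(["the quick fox", "the lazy dog"]))
puts counts["the"]  # => 2
```

Getting this logic right locally first makes it easier to attribute any remaining failure to packaging or the cluster environment rather than the job code.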
This issue is fairly old, but I'm running into the same thing. I'm modernizing a project that is several years old to use the latest AWS SDKs, EMR release, and JRuby 9.1.16.0 (it previously ran on 1.7.20). I've isolated the change to the JRuby upgrade: 9.1 passes our specs, but does not pass rubydoop's specs (on either master or the v1.2.x branch) and generates this error in specs as well as on AWS.
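For anyone bisecting the same way, one approach is to toggle the `jruby-jars` pin in the Gemfile and re-run the specs under each version. A sketch using the versions mentioned in this thread (not an endorsed fix, just the bisection setup):

```ruby
# Gemfile sketch: toggle the pin to bisect the JRuby upgrade
group :development do
  gem "rake"
  gem "jruby-jars", "= 1.7.20"      # last version the project ran on
  # gem "jruby-jars", "= 9.1.16.0"  # upgrade that reproduces the LoadError
end
```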
@enifsieus sorry to hear that. I think that I would need some help to get Rubydoop up to date for newer Hadoop and JRuby versions. I have mostly moved away from Hadoop and only have some legacy jobs that still use Rubydoop.
I'm prototyping out a solution to use Rubydoop on EMR. I have written some jobs which I can run fine locally. But when I try to execute it on EMR, I get the following error:

In terms of setup, I have run `rake package` and then uploaded the jar to an S3 bucket. When configuring my cluster step, I am able to select the custom jar. I then pass the following arguments: `test abc123`. `test` is the name of the job script (`test.rb`) and `abc123` is the argument I want to pass. The main class I'm trying to execute is `lib/test.rb` and looks like this:

Any ideas?