lintool / warcbase

Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/

Running a Spark application fails on EC2 with warcbase dependency #248

Open dportabella opened 7 years ago

dportabella commented 7 years ago

I have a simple Spark application that runs fine on my laptop with spark-submit. However, I get the following runtime error when I run it with spark-submit on an Amazon EC2 cluster:

$ /root/spark/bin/spark-submit --class application.Example --master spark://ec2-54-227-170-20.compute-1.amazonaws.com:7077 /root/example-assembly-0.1-SNAPSHOT.jar 

java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
    at org.warcbase.mapreduce.WacGenericInputFormat$GenericArchiveRecordReader.initialize(WacGenericInputFormat.java:71)

I made sure that /root/example-assembly-0.1-SNAPSHOT.jar does not contain org/apache/spark nor org/apache/hadoop.

I am using Spark 1.6.1 and Hadoop 2.6.0. I see that warcbase depends on Hadoop 2.6.0-cdh5.7.1.
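This error message typically appears when code compiled against Hadoop 2.x, where org.apache.hadoop.mapreduce.TaskAttemptContext is an interface, runs against Hadoop 1.x jars, where it is a class. One way to narrow this down is to check which variant the cluster's classpath actually provides. The following is a minimal diagnostic sketch, not part of warcbase or Spark; the class name `ClasspathCheck` and the method `describe` are hypothetical names chosen for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of warcbase, Spark, or Hadoop).
public class ClasspathCheck {

    // Reports whether the named type is a class or an interface
    // and which jar or directory it was loaded from.
    static String describe(String className) {
        try {
            // Load without running static initializers.
            Class<?> c = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            String kind = c.isInterface() ? "interface" : "class";
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source.
            String from = (src == null || src.getLocation() == null)
                    ? "(JDK / bootstrap)"
                    : src.getLocation().toString();
            return className + ": " + kind + ", loaded from " + from;
        } catch (ClassNotFoundException e) {
            return className + ": not found on classpath";
        }
    }

    public static void main(String[] args) {
        // Default to the type named in the stack trace above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.TaskAttemptContext";
        System.out.println(describe(name));
    }
}
```

Running this with the same classpath that the cluster nodes use should show whether TaskAttemptContext resolves to a class or an interface there, and which jar supplies it. If it reports "class", the cluster is most likely running Hadoop 1.x jars despite the application being built against 2.6.0.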

What could be the problem?