pwendell / spark-twitter-collection

Spark example of collecting tweets and loading into HDFS/S3

ClassNotFoundException error when running the example code #1

Open mrobot2 opened 10 years ago

mrobot2 commented 10 years ago

I'm studying Spark. When I run the example, I hit an exception. Can you help me find a way to resolve it? I am using Spark 0.9.1. Thank you.

```
14/04/18 00:27:07 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 71 on executor 2: ip-10-41-136-153.ec2.internal (PROCESS_LOCAL)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 4358 bytes in 1 ms
14/04/18 00:27:07 WARN scheduler.TaskSetManager: Lost TID 71 (task 2.0:0)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver [duplicate 1]
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 72 on executor 2: ip-10-41-136-153.ec2.internal (PROCESS_LOCAL)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 4358 bytes in 2 ms
14/04/18 00:27:07 WARN scheduler.TaskSetManager: Lost TID 72 (task 2.0:0)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver [duplicate 2]
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 73 on executor 2: ip-10-41-136-153.ec2.internal (PROCESS_LOCAL)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 4358 bytes in 2 ms
14/04/18 00:27:07 WARN scheduler.TaskSetManager: Lost TID 73 (task 2.0:0)
14/04/18 00:27:07 INFO scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver [duplicate 3]
14/04/18 00:27:07 ERROR scheduler.TaskSetManager: Task 2.0:0 failed 4 times; aborting job
14/04/18 00:27:07 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 2.0 from pool
14/04/18 00:27:07 INFO scheduler.DAGScheduler: Failed to run runJob at NetworkInputTracker.scala:182
error org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver)
org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:run for the full output.
14/04/18 00:27:08 INFO scheduler.NetworkInputTracker: Stream 0 received 0 blocks
14/04/18 00:27:08 INFO scheduler.JobScheduler: Added jobs for time 1397780828000 ms
-------------------------------------------
14/04/18 00:27:08 INFO scheduler.JobScheduler: Starting job streaming job 1397780828000 ms.0 from job set of time 1397780828000 ms

Time: 1397780828000 ms
```
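Every failure in the trace comes from an executor (`Loss was due to ... TwitterReceiver`), which usually means the `spark-streaming-twitter` and `twitter4j` jars are on the driver's classpath but were never shipped to the workers. A common fix for sbt builds of that era was to bundle the dependencies into a single assembly jar and pass that jar to the context. A sketch, assuming an sbt project; the plugin version and file layout are illustrative, not taken from this repo:

```scala
// project/plugins.sbt — hypothetical plugin version; pick one matching your sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt — mark spark-core "provided" (the cluster already has it) so only
// the streaming-twitter and twitter4j classes get bundled into the fat jar
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "0.9.1" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "0.9.1"
)
```

Running `sbt assembly` then produces one jar under `target/` containing your code plus the twitter dependencies, and that single jar is what you hand to the StreamingContext's jar list.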

HermesCheng commented 10 years ago

I got the same problem when I used TwitterUtils to create a JavaDStream. Part of my code is below.

```java
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.twitter.TwitterUtils;
import twitter4j.Status;
import twitter4j.auth.Authorization;
import twitter4j.auth.OAuth2Authorization;
import twitter4j.conf.ConfigurationBuilder;

String[] filterString = { ... };
ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setOAuthConsumerKey("...")
  .setOAuthConsumerSecret("...")
  .setOAuthAccessToken("...")
  .setOAuthAccessTokenSecret("...");
Authorization oauth2 = new OAuth2Authorization(cb.build());
JavaStreamingContext ssc = new JavaStreamingContext("spark://....", "Sample",
    new Duration(30000), System.getenv("SPARK_HOME"),
    JavaStreamingContext.jarOfClass(SimpleApp.class));
JavaDStream<Status> tweets = TwitterUtils.createStream(ssc, oauth2, filterString);
```
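One thing worth checking: `JavaStreamingContext.jarOfClass(SimpleApp.class)` returns only the jar that contains `SimpleApp` itself, so the `spark-streaming-twitter` and `twitter4j` classes never reach the executors, and `TwitterReceiver` is exactly the class the scheduler reports as missing. A minimal sketch of building a complete jar list to hand to the constructor instead; all paths and versions below are hypothetical, substitute the artifacts from your own build:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JarList {

    // Combines the application jar(s) found by jarOfClass with the
    // dependency jars that also have to be shipped to the executors.
    static String[] allJars(String[] appJars, String... extraJars) {
        List<String> jars = new ArrayList<String>(Arrays.asList(appJars));
        jars.addAll(Arrays.asList(extraJars));
        return jars.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical paths: the app jar plus the twitter dependencies.
        String[] jars = allJars(
                new String[] { "target/simple-app.jar" },
                "lib/spark-streaming-twitter_2.10-0.9.1.jar",
                "lib/twitter4j-core-3.0.3.jar",
                "lib/twitter4j-stream-3.0.3.jar");
        System.out.println(jars.length); // number of jars shipped to executors
    }
}
```

The resulting `jars` array would replace `JavaStreamingContext.jarOfClass(SimpleApp.class)` as the final constructor argument, so the workers can deserialize the receiver.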