mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book
http://mapreduce4hackers.com

Chap 14, the output of Builder.scala is not available for classifier.scala #25

Open saksim opened 6 years ago

saksim commented 6 years ago

Dear mahmoudparsian, sorry to bother you. As far as I know, there are two methods for saving output when a Scala Spark program finishes. In "NaiveBayesClassifierBuilder.scala", you save the pt table as part-* files in HDFS, and my issue relates to this step. The RDD method saveAsObjectFile first returns NULL and then writes a SequenceFile as output. As a result, the second Spark program (NaiveBayesClassifier.scala) throws a NullPointerException. On the other hand, if I use saveAsTextFile instead, the second Spark program throws an exception saying "A sequenceFile is required". I am not sure how to deal with this issue in your Scala programs. Could you give me any tips?

Best Wishes, WeiWei HE
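
For readers hitting the same mismatch, here is a minimal sketch of pairing each save method with its matching reader. It is not the book's code. In Spark, saveAsObjectFile writes a SequenceFile of serialized objects that SparkContext.objectFile can read back, while saveAsTextFile writes plain text for SparkContext.textFile. The object name, the sample table contents, the comma-separated line format, and the temporary paths are all hypothetical, and the sketch assumes a local Spark installation on the classpath.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: save a small table in two formats and reload each
// with the reader that matches the writer.
object SaveAndReloadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("save-reload-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for a probability table: ((attribute, class), probability).
      val table = sc.parallelize(Seq((("sunny", "yes"), 0.25), (("rainy", "no"), 0.75)))

      // Output directories must not exist yet, so use fresh sub-paths.
      val base = Files.createTempDirectory("pt-sketch").toString
      val objectPath = s"$base/pt-object"
      val textPath = s"$base/pt-text"

      // Pair 1: saveAsObjectFile (its return type is Unit) with objectFile.
      table.saveAsObjectFile(objectPath)
      val fromObjects = sc.objectFile[((String, String), Double)](objectPath)

      // Pair 2: saveAsTextFile with textFile, using an assumed CSV-like line format.
      table.map { case ((attr, cls), p) => s"$attr,$cls,$p" }.saveAsTextFile(textPath)
      val fromText = sc.textFile(textPath).map { line =>
        val Array(attr, cls, p) = line.split(",")
        ((attr, cls), p.toDouble)
      }

      // Both round trips should reproduce the original records.
      val expected = table.collect().toSet
      assert(fromObjects.collect().toSet == expected)
      assert(fromText.collect().toSet == expected)
    } finally {
      sc.stop()
    }
  }
}
```

The point of the sketch is only that the second program has to read with the counterpart of whatever the first program wrote. Whether that is the cause of the exceptions reported above depends on how NaiveBayesClassifier.scala loads its input.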