rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License
457 stars, 308 forks

ch06/extract_airlines.py throws error at airplanes_per_carrier.count() #4

Closed md6nguyen closed 7 years ago

md6nguyen commented 7 years ago

airplanes_per_carrier.count()
[Stage 3:===============================================> (172 + 4) / 200]
17/02/10 23:15:43 ERROR Executor: Exception in task 15.0 in stage 3.0 (TID 392)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
    process()
  File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 346, in func
    return f(iterator)
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "", line 9, in
TypeError: unorderable types: NoneType() < str()

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
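
For anyone reading the traceback above: the final TypeError is what Python 3 raises when None is compared with a string, which typically happens when sorting a list that mixes the two. The thread does not show line 9 of the failing script or what changed upstream, so the following is only a minimal sketch, assuming the failure comes from sorting values that include None. The function name clean_and_sort and the sample tail numbers are hypothetical, and plain Python is used here instead of Spark so the snippet runs on its own.

```python
def clean_and_sort(tail_numbers):
    """Drop None and empty strings, then sort what remains.

    Hypothetical helper: the repository's actual fix is not shown in this thread.
    """
    return sorted(t for t in tail_numbers if t is not None and t != "")


# Made-up sample data mixing strings, None, and an empty string.
raw = ["N502AA", None, "N101AA", ""]

# Sorting the raw list fails in Python 3 because None and str cannot be ordered.
error_message = None
try:
    sorted(raw)
except TypeError as err:
    error_message = str(err)

# Filtering out the None values first avoids the comparison entirely.
cleaned = clean_and_sort(raw)
print(error_message)
print(cleaned)
```

The exact wording of the error message depends on the Python version, so the sketch only checks that a TypeError was raised. In a Spark job the same filter would go inside whatever function does the sorting, since the exception is raised on the executors when count() forces evaluation.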

rjurney commented 7 years ago

Pull latest and try again.

md6nguyen commented 7 years ago

Worked now.

md6nguyen commented 7 years ago

Thanks and have fun :)

Minh
