rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License
456 stars, 307 forks

AttributeError: 'DataFrame' object has no attribute 'map' #28

Closed kelvinksau closed 7 years ago

kelvinksau commented 7 years ago

The following program does not work under Spark 2.0, because DataFrame.map has been removed in Spark 2. The fix is to use .rdd.map instead.

# Load the parquet file
on_time_dataframe = spark.read.parquet('data/on_time_performance.parquet')
on_time_dataframe.registerTempTable("on_time_performance")

# Dump the unneeded fields
tail_numbers = on_time_dataframe.map(lambda x: x.TailNum)
tail_numbers = tail_numbers.filter(lambda x: x != '')

# distinct() gets us unique tail numbers
unique_tail_numbers = tail_numbers.distinct()

# now we need a count() of unique tail numbers
airplane_count = unique_tail_numbers.count()
print("Total airplanes: {}".format(airplane_count))
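For reference, here is a minimal sketch of how the same count might look with the .rdd.map workaround under Spark 2.x. The function names `count_unique_tail_numbers` and `main` are hypothetical and chosen only for illustration. The parquet path is taken from the program above, and this is not necessarily the fix that was committed to the repository.

```python
def count_unique_tail_numbers(on_time_dataframe):
    """Count distinct, non-empty tail numbers (hypothetical helper)."""
    # DataFrame.map is gone in Spark 2, so map over the underlying RDD of Rows
    tail_numbers = on_time_dataframe.rdd.map(lambda row: row.TailNum)
    # Drop records with an empty tail number, as in the original program
    tail_numbers = tail_numbers.filter(lambda x: x != '')
    # distinct() gets unique tail numbers, count() returns how many there are
    return tail_numbers.distinct().count()


def main():
    # Imported inside main() so the helper above does not itself depend on Spark
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tail_numbers").getOrCreate()
    # Path assumed from the program above
    on_time_dataframe = spark.read.parquet('data/on_time_performance.parquet')
    airplane_count = count_unique_tail_numbers(on_time_dataframe)
    print("Total airplanes: {}".format(airplane_count))
    spark.stop()
```

Calling `main()` from a script run with PySpark available should print the total. The helper only relies on the `.rdd` attribute and the standard RDD methods `map`, `filter`, `distinct`, and `count`.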

rjurney commented 7 years ago

Sorry about this, I fixed this a few days ago. Please re-open if you have any more issues.

kelvinksau commented 7 years ago

Do you have a script to generate faa_tail_number_inquiry.jsonl? I can't see any Python code that handles this.

Thx for the great work


On Mar 12, 2017, at 11:34, Russell Jurney notifications@github.com wrote:

Closed #28.
