Closed kelvinksau closed 7 years ago
Sorry about this, I fixed this a few days ago. Please re-open if you have any more issues.
Do you have a script to generate faa_tail_number_inquiry.jsonl? I can't see any Python code that handles this.
Thanks for the great work.
On Mar 12, 2017, at 11:34, Russell Jurney notifications@github.com wrote:
Closed #28.
The following program does not work under Spark 2.0: DataFrame.map has been removed in Spark 2, so the call needs to be .rdd.map instead.
# Load the parquet file
on_time_dataframe = spark.read.parquet('data/on_time_performance.parquet')
on_time_dataframe.registerTempTable("on_time_performance")

# Dump the unneeded fields
tail_numbers = on_time_dataframe.map(lambda x: x.TailNum)
tail_numbers = tail_numbers.filter(lambda x: x != '')

# distinct() gets us unique tail numbers
unique_tail_numbers = tail_numbers.distinct()

# now we need a count() of unique tail numbers
airplane_count = unique_tail_numbers.count()
print("Total airplanes: {}".format(airplane_count))
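For reference, here is a minimal sketch of how the program above might look with the .rdd.map change applied. The helper name `count_unique_tail_numbers`, the `main()` wrapper, and the app name are hypothetical and only there to keep the example self-contained. The switch from registerTempTable to createOrReplaceTempView is also my own assumption (the former is deprecated as of Spark 2.0) and is not part of the reported fix.

```python
def count_unique_tail_numbers(on_time_dataframe):
    """Count distinct, non-empty TailNum values (hypothetical helper)."""
    # DataFrame.map is gone in Spark 2, so go through the underlying RDD of Rows
    tail_numbers = on_time_dataframe.rdd.map(lambda x: x.TailNum)
    # Drop empty tail numbers, as in the original program
    tail_numbers = tail_numbers.filter(lambda x: x != '')
    # distinct() gets us unique tail numbers, count() tallies them
    return tail_numbers.distinct().count()


def main():
    # Imported here so the helper above does not itself depend on pyspark
    from pyspark.sql import SparkSession

    # Assumption: a standalone script; inside the pyspark shell `spark` already exists
    spark = SparkSession.builder.appName("unique_tail_numbers").getOrCreate()

    # Load the parquet file (same path as in the original program)
    on_time_dataframe = spark.read.parquet('data/on_time_performance.parquet')
    # Assumption: replaces registerTempTable, which is deprecated in Spark 2.0
    on_time_dataframe.createOrReplaceTempView("on_time_performance")

    airplane_count = count_unique_tail_numbers(on_time_dataframe)
    print("Total airplanes: {}".format(airplane_count))
    spark.stop()
```

Calling `main()` runs the whole thing against the parquet file. The only change that matters for the Spark 2 error is the `.rdd` before `.map`; the rest mirrors the original.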