spark-examples / pyspark-examples

PySpark RDD, DataFrame and Dataset examples in Python
https://sparkbyexamples.com

Fix timediff.py #6

Closed: wtysos11 closed this 1 year ago

wtysos11 commented 2 years ago

I found two problems in timediff.py. First, running timediff.py gave me the same problem as Romeo Kienzler: PySpark SQL data types are no longer singletons (that changed in Spark 1.3), so a type has to be passed as an instance, e.g. StringType() rather than StringType. Second, unless I add from pyspark.sql.functions import round, my interpreter uses Python's built-in round function instead of the PySpark one. I have run timediff.py with Python 3.7.13, PySpark 3.3.0 and Spark 3.3.0, and everything works fine.
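Both failures are ordinary Python behaviour. The sketch below reproduces them with hypothetical stand-in classes so it runs without Spark. DataType, LongType and Column here are simplified imitations, not the PySpark classes.

The sketch assumes, without having seen the script, that timediff.py casts with LongType. Passing the class object instead of an instance fails a type check. Python's built-in round() needs the argument's type to define __round__, and a PySpark Column is assumed, like this stand-in, not to define it. In the real script the fixes would be LongType() and from pyspark.sql.functions import round.

```python
# Hypothetical stand-ins that imitate the two PySpark behaviours described above.
# Assumption: the original timediff.py casts with LongType and calls round() on a Column.


class DataType:
    """Stand-in for pyspark.sql.types.DataType."""


class LongType(DataType):
    """Stand-in for pyspark.sql.types.LongType; must be used as LongType()."""


class Column:
    """Stand-in for pyspark.sql.Column; like the real class (assumed), no __round__."""

    def cast(self, data_type):
        # Accept a type-name string or a DataType *instance*, reject anything else
        # (a simplified version of the check Column.cast performs).
        if isinstance(data_type, (str, DataType)):
            return self
        raise TypeError(f"unexpected type: {type(data_type)}")


def outcome(fn):
    """Call fn and report 'ok' or the TypeError message it raised."""
    try:
        fn()
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


c = Column()
print(outcome(lambda: c.cast(LongType)))    # the class itself -> TypeError
print(outcome(lambda: c.cast(LongType())))  # an instance -> ok
print(outcome(lambda: round(c)))            # built-in round -> TypeError (no __round__)
```

Importing round from pyspark.sql.functions rebinds the name round in the module. The call then goes to the Spark function, which builds a column expression instead of asking the Column for __round__.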

Output when I run the fixed timediff.py:

+--------------------+--------------------+-------------+-------------+-----------+-------------+
|     input_timestamp|   current_timestamp|DiffInSeconds|DiffInMinutes|DiffInHours|   DiffInDays|
+--------------------+--------------------+-------------+-------------+-----------+-------------+
|2019-07-01 12:01:...|2022-07-04 15:24:...|     94965792|    1582763.0|    26379.0|1.42448688E10|
|2019-06-24 12:01:...|2022-07-04 15:24:...|     95570592|    1592843.0|    26547.0|1.43355888E10|
|2019-11-16 16:44:...|2022-07-04 15:24:...|     83025576|    1383760.0|    23063.0|1.24538364E10|
|2019-11-16 16:50:...|2022-07-04 15:24:...|     83025212|    1383754.0|    23063.0|1.24537818E10|
+--------------------+--------------------+-------------+-------------+-----------+-------------+
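The DiffInMinutes and DiffInHours columns above are consistent with DiffInSeconds. DiffInDays is not a day count: each value equals DiffInSeconds times 150 (for example 94965792 x 150 = 14244868800 = 1.42448688E10). That is what an expression like col('DiffInSeconds')/24*3600 produces, because / and * share precedence and group left to right.

Whether timediff.py contains exactly that expression is an assumption, since the script is not shown here. The plain-Python check below uses the four rows of the table and compares the two groupings.

```python
# DiffInSeconds and DiffInDays, copied from the four rows of the table above.
rows = [
    (94965792, 1.42448688e10),
    (95570592, 1.43355888e10),
    (83025576, 1.24538364e10),
    (83025212, 1.24537818e10),
]

for seconds, shown_days in rows:
    # / and * group left to right, so seconds / 24 * 3600 == seconds * 150.
    left_to_right = round(seconds / 24 * 3600)
    # Parenthesizing the divisor divides by 86400 seconds per day.
    days = round(seconds / (24 * 3600))
    print(left_to_right == shown_days, days)
```

Each row prints True next to a day count of roughly 960 to 1106. In the PySpark code the corresponding change would be parenthesizing the divisor, e.g. round(col('DiffInSeconds')/(24*3600)). That line is hypothetical, not taken from the script.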