spark-examples / pyspark-examples

PySpark RDD, DataFrame, and Dataset examples in Python
https://sparkbyexamples.com
1.17k stars 895 forks

Multiple json file deduplication #11

Closed ayush-96 closed 1 year ago

ayush-96 commented 1 year ago

Reading multiple JSON files and deduplicating an array column.