When reading a JSON file called meta_book.json with read_json() in friesian, some of the lines are corrupted during the read.
read_json() returns null in every column for those records, so the rows are removed by the subsequent drop-NA operation, which affects the final result.
The ai-matrix and recdp groups convert the JSON file to a CSV file first; that path doesn't remove any rows and works fine in Spark.
The 14th record in the following figure is one of the records that gets corrupted during reading.
One workaround is to convert the file to CSV first (as ai-matrix does), though the conversion takes extra time.
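
For reference, here is a minimal sketch of such a conversion using only the Python standard library. It is not the ai-matrix or recdp script. The function name `json_lines_to_csv` is hypothetical, and the sketch assumes the file holds one JSON object per line and fits in memory. Lines that fail strict JSON parsing are reported by line number rather than dropped silently, which may help narrow down which records are affected.

```python
import csv
import json


def json_lines_to_csv(json_path, csv_path):
    """Convert a JSON Lines file to CSV.

    Returns (rows_written, bad_line_numbers). Hypothetical helper,
    assuming one JSON object per line.
    """
    records = []
    bad_lines = []
    columns = []  # column names in first-seen order

    with open(json_path, encoding="utf-8") as src:
        for line_no, line in enumerate(src, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(line_no)  # not strict JSON
                continue
            if not isinstance(record, dict):
                bad_lines.append(line_no)  # valid JSON, but not an object
                continue
            for key in record:
                if key not in columns:
                    columns.append(key)
            records.append(record)

    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        # restval="" fills in keys that a given record does not have
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for record in records:
            # Nested lists/objects are stored as JSON strings in a single cell
            writer.writerow({
                key: json.dumps(value) if isinstance(value, (dict, list)) else value
                for key, value in record.items()
            })

    return len(records), bad_lines
```

Calling `json_lines_to_csv("meta_book.json", "meta_book.csv")` and inspecting the second return value shows whether any lines were rejected by the strict parser. If that list is non-empty, those lines would also be missing from the CSV, so they need a more lenient parser or a manual fix.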