Closed: limkothari closed this issue 4 years ago
@limkothari This is not the cause of the error you are experiencing. Spark persistence does not work the way you are describing: https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence
Thanks for the reference, @ashelkovnykov. I actually tried MEMORY_AND_DISK_SER with the internal library and it seemed to work. I will comment more on the internal JIRA ticket.
Changing the storage level won't fix this issue. We are working on a solution and will keep you updated.
Currently, each entityId dataset is loaded into memory. This causes failures when the data is too big to fit in memory. Setting the storage level to MEMORY_AND_DISK would prevent this issue.
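For illustration, here is a minimal sketch of the kind of change being discussed: persisting a per-entity dataset with `StorageLevel.MEMORY_AND_DISK` so partitions spill to disk instead of causing memory failures. The input path, object name, and column layout below are hypothetical and not taken from the internal library.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Sketch: persist a per-entity dataset with MEMORY_AND_DISK so that
// partitions that do not fit in memory spill to disk rather than failing.
object PersistStorageLevelExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("storage-level-example")
      .getOrCreate()

    // Hypothetical per-entity training data (path is a placeholder).
    val perEntityData = spark.read.parquet("/path/to/per_entity_data")

    // MEMORY_AND_DISK spills partitions to disk when executor memory is
    // exhausted. Note that RDD.cache() defaults to MEMORY_ONLY, while
    // Dataset.cache() already defaults to MEMORY_AND_DISK.
    perEntityData.persist(StorageLevel.MEMORY_AND_DISK)

    // Trigger materialization of the persisted dataset.
    println(perEntityData.count())

    perEntityData.unpersist()
    spark.stop()
  }
}
```

Whether this avoids the failure depends on where the data is actually held: persistence only helps for Spark-managed partitions, not for data collected into driver or executor JVM memory outside of Spark's storage layer, which is why changing the storage level alone may not resolve the issue.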