linkedin / photon-ml

A scalable machine learning library on Apache Spark

RandomEffectDataset bug fix #452

Closed limkothari closed 4 years ago

limkothari commented 4 years ago

Currently, each entityId's dataset is loaded entirely into memory. This causes failures when the data is too big to fit in memory. Setting the storage level to MEMORY_AND_DISK prevents this issue.
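For context, a minimal Spark sketch of the storage-level change being proposed (standalone illustration, not Photon ML's actual code; `PersistExample` and the RDD contents are hypothetical, and running it requires a Spark runtime):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical example, not Photon ML code.
object PersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("persist-sketch").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000000)

    // The default persist() is MEMORY_ONLY: partitions that do not fit in
    // memory are dropped and recomputed on demand, not spilled to disk.
    // rdd.persist()  // same as rdd.persist(StorageLevel.MEMORY_ONLY)

    // MEMORY_AND_DISK spills partitions that do not fit in memory to disk;
    // the _SER variant stores them serialized, trading CPU for memory.
    rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)

    println(rdd.count())
    spark.stop()
  }
}
```

Note that, as the linked Spark documentation explains, MEMORY_ONLY does not fail outright when a cached partition exceeds available memory, which is why the discussion below questions whether the storage level is the real cause of the failure.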

ashelkovnykov commented 4 years ago

@limkothari This is not the cause of the error you are experiencing. Spark persistence does not work the way you are describing: https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence

limkothari commented 4 years ago

Thanks for the reference @ashelkovnykov. I actually tried MEMORY_AND_DISK_SER with the internal library and it seemed to work. I will comment more on the internal JIRA.

yunboouyang commented 4 years ago

Changing the storage level won't fix this issue. We are trying out a solution and will keep you updated.