Closed max-509 closed 7 months ago
Hi @max-509, thanks for using RayDP! In this case you can assign ownership to RayDPMaster and call raydp.stop_spark(cleanup_data=False) to stop the session and free up the resources. With cleanup_data=False, RayDPMaster is not actually killed, so the data remains accessible.
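A minimal sketch of that workflow, assuming a running Ray cluster (the `init_spark` resource settings and the placeholder DataFrame are illustrative, not from the thread):

```python
import ray
import raydp

ray.init(address="auto")  # connect to an existing Ray cluster

# Illustrative resource settings; tune for your cluster.
spark = raydp.init_spark(
    app_name="ownership_example",
    num_executors=2,
    executor_cores=2,
    executor_memory="2GB",
)

df = spark.range(0, 1000)  # placeholder for real preprocessing

# Convert to a Ray Dataset; the partition objects are owned by RayDPMaster.
ds = ray.data.from_spark(df)

# Stop the Spark session but keep RayDPMaster alive,
# so the Dataset's underlying objects are not lost.
raydp.stop_spark(cleanup_data=False)

print(ds.count())  # the Dataset is still usable after Spark is stopped
```

The key point is that with `cleanup_data=False` only the Spark executors are released; the owner actor survives, so the serialized partitions stay resolvable.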
That said, your suggestion makes sense: it should be possible to assign ownership to a user-specified actor. This should be very easy to implement; would you be willing to submit a PR?
Hello! Thank you for this awesome library, which lets me combine the advantages of Spark and Ray.
When I convert a Spark DataFrame to a Ray Dataset, I have only two options for specifying the owner of the serialized partitions:
Here is a usage scenario in which neither ownership option is satisfactory.
I want to do some preprocessing in Spark, convert the preprocessed DataFrame into a Ray Dataset, and then stop Spark (by calling raydp.stop_spark()) to free up Ray cluster resources. But after stopping Spark, I can no longer use the Ray Dataset, because the owner of the serialized tables has died. I suggest adding a function that accepts an actor that should become the owner of the serialized partitions. For example:
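A hypothetical sketch of what such an API could look like. The function name `to_ray_dataset` and its `owner` parameter are my invention for illustration, not an existing RayDP API; internally it could transfer ownership of the partition blocks with `ray.put(block, _owner=actor)`:

```python
import ray
import raydp


@ray.remote
class DataHolder:
    """Long-lived actor that outlives the Spark session and owns the data."""

    def ping(self):
        return "alive"


ray.init(address="auto")

# Illustrative resource settings; tune for your cluster.
spark = raydp.init_spark(
    app_name="owner_example",
    num_executors=2,
    executor_cores=2,
    executor_memory="2GB",
)

df = spark.range(0, 1000)  # placeholder for real preprocessing

# A detached actor survives even after the driver-side Spark session is gone.
holder = DataHolder.options(name="data_holder", lifetime="detached").remote()

# PROPOSED API (does not exist yet): make `holder` the owner of the
# serialized partition objects instead of RayDPMaster.
ds = raydp.spark.to_ray_dataset(df, owner=holder)

# Spark can now be stopped completely; `holder` keeps the blocks alive.
raydp.stop_spark()
print(ds.count())
```

With this, all Spark-related actors could be torn down while the Dataset remains usable, which is exactly the scenario described above.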
I hope this suggestion will be useful.