oap-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

Memory requirements for Spark on Ray on Kubernetes #261

Closed: n1CkS4x0 closed this issue 1 year ago

n1CkS4x0 commented 2 years ago

Scenario

I have launched Spark on Ray on Kubernetes to perform data processing and training on 4 AWS m5a instances (4 cores and 16GB RAM each), with the following parameters:

1) 3100m Kubernetes CPU per node
2) 14000Mi Kubernetes RAM per node
3) 8000MB Spark executor memory per node (1 executor with 3 cores per node)
4) 1000MB Ray object store memory per node
5) dataset size: 4GB
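For reference, a minimal sketch of how a setup like this might be expressed with RayDP's Python API, assuming `executor_memory` accepts a JVM-style memory string; the app name is hypothetical, and the per-node object store size is normally set when the Ray nodes are started rather than through RayDP:

```python
import ray
import raydp

# Connect to the existing Ray cluster on Kubernetes.
# The per-node object store size (1000MB here) is set when each Ray
# node is started, e.g. `ray start --object-store-memory=1000000000`.
ray.init(address="auto")

# One executor per node: 3 cores and an 8000MB heap each,
# matching the parameters listed above.
spark = raydp.init_spark(
    app_name="data-processing",  # hypothetical name
    num_executors=4,
    executor_cores=3,
    executor_memory="8000M",
)
```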

If I increase the Spark executor memory, the container is killed for exceeding its Kubernetes memory limit; if I decrease it, I get java.lang.OutOfMemoryError: Java heap space.

Is there a way to tune these parameters so that data processing and training work on large datasets without requiring a significantly larger number of instances?

carsonwang commented 2 years ago

You can try tuning a few Spark configurations to see if the job runs successfully with less memory. For example, you can increase "spark.sql.shuffle.partitions" so each shuffle task processes a smaller slice of data. You can also allocate fewer cores per executor so that each core gets more memory, though this reduces parallelism.
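A hedged sketch of how both suggestions could be applied through `init_spark`'s `configs` argument; the specific values are illustrative under the 4-node setup above, not tested recommendations:

```python
import ray
import raydp

ray.init(address="auto")

spark = raydp.init_spark(
    app_name="data-processing",  # hypothetical name
    num_executors=4,
    executor_cores=2,            # fewer cores per executor: ~4000MB per core
    executor_memory="8000M",     # instead of ~2667MB with 3 cores
    configs={
        # More shuffle partitions -> smaller per-task working sets.
        # Spark's default is 200; raising it lowers peak memory per task.
        "spark.sql.shuffle.partitions": "400",
    },
)
```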

kira-lin commented 1 year ago

Closing as stale.