kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Question: does the executor need a checkpoint directory? #913

Open FloraZhang opened 4 years ago

FloraZhang commented 4 years ago

Hi,

I have a question about configuring the checkpoint directory for a Spark application chart.

According to the Spark documentation (https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing), Spark needs a distributed file system to store its checkpoint data so that, in case of failure, it can recover from the checkpoint directory.

In the Spark application Helm chart I have a checkpoint location configuration:

spec:
  sparkConf:
    "spark.ui.port": "40450"
    "spark.sql.streaming.checkpointLocation": "file:///opt/checkpoint-data"

I created a checkpoint PVC and mounted the volume on the driver pod:

  volumes:
    - name: checkpoint-volume
      persistentVolumeClaim:
        claimName: checkpoint-pvc

  driver:
    volumeMounts:
      - name: checkpoint-volume
        mountPath: "/opt/checkpoint-data"
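
For reference, if the executors turn out to need the same path, the mount on the executor side would presumably mirror the driver block. A sketch, assuming the PVC supports an access mode that lets multiple pods mount it (e.g. ReadWriteMany):

  executor:
    volumeMounts:
      # same shared volume as the driver; only relevant if executors also write checkpoint data
      - name: checkpoint-volume
        mountPath: "/opt/checkpoint-data"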

The Spark Streaming guide mentions that a "failed driver can be restarted from checkpoint information" and that, when an executor fails, "tasks and receivers restarted by Spark automatically, no config needed".
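
As a related illustration of the driver-restart part: with the spark-operator, the driver is only restarted (and can then pick up the checkpoint) if the SparkApplication's restartPolicy allows it. A minimal sketch, with arbitrary retry values:

spec:
  restartPolicy:
    # restart the driver on failure so it can recover from the checkpoint directory
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 3
    onSubmissionFailureRetryInterval: 20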

So my question is: does the executor also need this checkpoint directory for recovery?

Thanks, Wenjing

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.