kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Cannot fetch executor logs with sparkctl #1187

Status: Closed (oleksiilopasov closed 1 month ago)

oleksiilopasov commented 3 years ago

Fetching executor logs with sparkctl returns an error:

sparkctl log -e 1 spark-pi --namespace default
failed to get driver logs of SparkApplication spark-pi: the server could not find the requested resource (get pods spark-pi-exec-1)

Here is the application status:

sparkctl status spark-pi --namespace default
application state:
+---------+----------------+----------------+-----------------+------------------+--------------------+-------------------+
|  STATE  | SUBMISSION AGE | COMPLETION AGE |   DRIVER POD    |    DRIVER UI     | SUBMISSIONATTEMPTS | EXECUTIONATTEMPTS |
+---------+----------------+----------------+-----------------+------------------+--------------------+-------------------+
| RUNNING | 6m             | N.A.           | spark-pi-driver | 10.40.3.199:4040 |                  1 |                 1 |
+---------+----------------+----------------+-----------------+------------------+--------------------+-------------------+
executor state:
+-------------------------------+---------+
|         EXECUTOR POD          |  STATE  |
+-------------------------------+---------+
| spark-pi-1614684550398-exec-1 | RUNNING |
+-------------------------------+---------+

The status output shows the executor pod name spark-pi-1614684550398-exec-1, but the log command looks for spark-pi-exec-1. Here 1614684550398 is a Unix timestamp in milliseconds (the date +%s value with three extra millisecond digits), and the log command omits it for some reason. Please fix this.
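One possible workaround until sparkctl resolves the real name is to look the pod up yourself and pass it to kubectl logs. The sketch below is a hypothetical shell helper (find_executor_pod is an illustrative name, not part of sparkctl or kubectl). It matches both name shapes seen in this issue. It assumes the timestamp segment is all digits, as in the status output above, and that the application name contains only lowercase letters, digits and dashes.

```shell
#!/usr/bin/env bash
# find_executor_pod: hypothetical helper, not part of sparkctl or kubectl.
# Reads pod names (one per line) on stdin and prints the first one that
# looks like executor ID of application APP. Both shapes are accepted:
#   <app>-exec-<id>            (the name sparkctl log looks for)
#   <app>-<digits>-exec-<id>   (the name shown by sparkctl status)
# Assumption: APP has only lowercase letters, digits and dashes, so it is
# safe to embed in the regex without escaping.
find_executor_pod() {
  local app="$1" id="$2"
  # Anchored at both ends so that "exec-1" does not also match "exec-10".
  grep -E "^${app}(-[0-9]+)?-exec-${id}\$" | head -n 1
}

# Usage sketch (requires kubectl access to the cluster):
# pod=$(kubectl get pods --namespace default -o name | sed 's|^pod/||' | find_executor_pod spark-pi 1)
# kubectl logs --namespace default "$pod"
```

Matching on names alone is fragile. If the same namespace also runs an application called spark-pi-2, its executor spark-pi-2-exec-1 would match the pattern for spark-pi. Selecting pods by label is likely more robust, but check which labels your operator version actually sets (for example with kubectl get pod <name> --show-labels) before relying on them.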

Spark-operator image tag: v1beta2-1.1.2-2.4.5
Helm chart: v1.0.6

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.