Open ByronHsu opened 7 months ago
Hi @ByronHsu,
Due to the remote streaming nature of those logs, I'm afraid the flushing issue is impossible to solve completely; all we can do is wait for the logs to be streamed back from the remote workers. `ray.shutdown()` clears any resources started by Ray, but normally there is no need to invoke it manually because it is called automatically at the end of the script. If that automatic shutdown doesn't work in your case, you can either add `time.sleep(...)` before `ray.shutdown()`, or let it sleep for you by passing `_exiting_interpreter=True`.
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
I ran a simple Ray program with `ray.shutdown()` at the end, but the logs of the k8s Job pod don't contain any worker logs.

python file:
k8sjob's pod logs:

The expected behavior is that the log should contain `ray task get X`. If I remove `ray.shutdown()` and run again, I can see the worker logs there. The issue is similar to https://github.com/ray-project/ray/issues/31931. I can reproduce the issue by exec-ing into the k8sjob pod and running the program in a local Ray cluster.
I encountered this issue when integrating KubeRay 1.1.0 with the latest Flyte, which shuts down the Ray program after the task ends. Until the flushing issue is fixed, is removing `ray.shutdown()` from the code to surface the logs a good approach, or could it introduce other problems?

Reproduction script
Create a rayjob
```yaml
######################Ray code sample#################################
# this sample is from https://docs.ray.io/en/latest/cluster/job-submission.html#quick-start-example
# it is mounted into the container and executed to show the Ray job at work
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
data:
  sample_code.py: |
```