Manish-2004 opened this issue 3 months ago.
Hello @anyscalesam, is there any update on this issue?
We submit all our jobs to the Ray cluster with JobSubmissionClient, as described below, and we only run into this problem with Tune. The jobs themselves still execute correctly. The error occurs only when we try to fetch the logs.
What happened + What you expected to happen
I am trying to run the Tune XGBoost example at https://docs.ray.io/en/latest/tune/examples/tune-xgboost.html#id8 by submitting the script with JobSubmissionClient. The example itself runs fine, but I get the error below when fetching the job logs with client.get_job_logs(job_id).
Versions / Dependencies
KubeRay operator: v1.1.1
Ray: v2.21.0
Reproduction script
```python
import time

from ray.job_submission import JobStatus, JobSubmissionClient

# Ray cluster information for connection
ray_head_ip = "kuberay-head-svc.kuberay.svc.cluster.local"
ray_head_port = 8265
ray_address = f"http://{ray_head_ip}:{ray_head_port}"
client = JobSubmissionClient(ray_address)

# Submit Ray job using JobSubmissionClient
job_id = client.submit_job(
    entrypoint="python xgb.py",
    runtime_env={"working_dir": "./"},
    entrypoint_num_cpus=3,
)
print(f"Ray job submitted with job_id: {job_id}")

# Wait for Ray to finish the job
while True:
    status = client.get_job_status(job_id)
    if status in [JobStatus.RUNNING, JobStatus.PENDING]:
        time.sleep(5)
    else:
        break

# Fetch and print the logs (this is the call that raises the error)
print(client.get_job_logs(job_id))
```
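The polling loop in the reproduction script has no upper bound, so a job that stays PENDING keeps the client waiting forever. Below is a minimal sketch of a bounded variant. `wait_for_job` and its parameters are hypothetical names and not part of the Ray API. The status source, sleep, and clock are passed in as plain callables, so the loop can be exercised without a cluster. The demo uses plain strings in place of `JobStatus` values.

```python
import time
from typing import Callable, Collection, Optional, TypeVar

S = TypeVar("S")


def wait_for_job(
    get_status: Callable[[], S],
    non_terminal: Collection[S],
    poll_interval: float = 5.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> S:
    """Poll get_status() until it returns a status outside non_terminal.

    Hypothetical helper (not part of Ray). Raises TimeoutError if the job
    is still in a non-terminal status once `timeout` seconds have elapsed.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if status not in non_terminal:
            return status  # terminal status reached: stop polling
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for client.get_job_status(job_id).
    statuses = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
    final = wait_for_job(lambda: next(statuses), {"PENDING", "RUNNING"}, sleep=lambda s: None)
    print(final)  # SUCCEEDED
```

Against a real cluster, a call would presumably look like `wait_for_job(lambda: client.get_job_status(job_id), {JobStatus.PENDING, JobStatus.RUNNING}, timeout=3600)`, followed by `client.get_job_logs(job_id)`. This only bounds the wait. It does not address the log-fetching error reported here.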
Issue Severity
High: It blocks me from completing my task.