Open muratkoc93 opened 1 year ago
@muratkoc93 can you provide a script that I can run? The given script is not runnable because of the Cassandra setup
@rkooo567 the job connects to Cassandra and reads data.
Can you provide a minimal repro script that doesn't require external dependencies? It should be possible to make a repro that doesn't require a Cassandra setup. It is not possible for me to run the script to reproduce the issue.
I'm having the same issue, apparently, the problem is still present. What info would help you to fix it?
@vsokolovskii it would be best if you could provide a repro script that I can run to reproduce the issue.
I also have zombie processes. My cluster has no running jobs, tasks, or open client connections, and yet it's leaving 100+ worker nodes alive. This makes using Ray much more expensive in terms of compute hours than it should be.
Maybe it happens when there is an unhandled error in a task?
I feel like there are 2 different things going on here.
Are you using the autoscaler and expecting it to shut down worker nodes?
yes, precisely
I will hold off on updating / creating issues until I can get a minimal example. I have tried to reduce my code but the problem goes away when I do so.
Hi, we built a Ray cluster with 3 nodes and are submitting a job to it. After the job completes, I list the processes with ps -ef | grep ray and see that they have not terminated.
sahip 120004 66798 24 21:52 ? 00:00:07 ray::IDLE
sahip 120072 66798 18 21:52 ? 00:00:05 ray::IDLE
sahip 120073 66798 23 21:52 ? 00:00:07 ray::IDLE
sahip 120074 66798 18 21:52 ? 00:00:05 ray::IDLE
sahip 120075 66798 14 21:52 ? 00:00:04 ray::IDLE
sahip 120076 66798 15 21:52 ? 00:00:04 ray::IDLE
sahip 120077 66798 20 21:52 ? 00:00:06 ray::IDLE
sahip 120078 66798 13 21:52 ? 00:00:04 ray::IDLE
sahip 120510 66798 9 21:52 ? 00:00:02 ray::IDLE
sahip 120511 66798 8 21:52 ? 00:00:02 ray::IDLE
sahip 120799 113783 0 21:52 pts/2 00:00:00 grep ray
I don't want to kill the processes with a script like ps aux | grep ray::IDLE | grep -v grep | awk '{print $2}' | xargs kill -9.
Can I do this with one line of code that I'll add to my code?
Code is :
thank you :)
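For anyone trying to build a repro, it may help to record which PIDs are still labelled ray::IDLE after a job finishes, without killing anything. The following is only a rough sketch of the filtering step from the shell pipeline above, written in plain Python against `ps -ef` style text. The names `idle_worker_pids` and `IDLE_MARKER` are made up for illustration and are not part of Ray, and the column layout (UID PID PPID C STIME TTY TIME CMD) is assumed from the output pasted in this issue.

```python
from typing import List

# Assumption: idle Ray workers show up with this command name,
# as in the ps output pasted in this issue.
IDLE_MARKER = "ray::IDLE"


def idle_worker_pids(ps_output: str) -> List[int]:
    """Return the PIDs of `ps -ef` lines whose command starts with ray::IDLE.

    This only reads text; it does not signal or kill any process.
    """
    pids = []
    for line in ps_output.splitlines():
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        if fields[7].startswith(IDLE_MARKER):
            pids.append(int(fields[1]))
    return pids


# Two lines copied from the output above: one idle worker and the grep itself.
sample = """\
sahip 120004 66798 24 21:52 ? 00:00:07 ray::IDLE
sahip 120799 113783 0 21:52 pts/2 00:00:00 grep ray
"""
print(idle_worker_pids(sample))
```

Matching on the command column rather than the whole line is what replaces the grep -v grep step, since the grep process's own command does not start with ray::IDLE. Attaching the resulting PID list (taken before and after the job) to a report could make it easier to tell whether the same workers linger or new ones appear.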