Closed: baojunliu closed this issue 1 year ago
This was a bug that has been fixed in #262. Please update to the latest main (we will also do a patch release with this and other bug fixes later this week).
Closing, this was resolved in the latest release (v0.1.1).
I am trying to use two GPUs with tensor_parallel=2. After shutting down, memory is released on only one GPU, and a process is still running on the other. client.terminate_server doesn't seem to kill all of the processes. I can kill the leftover process manually, but how can I do it properly from Python code?
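For anyone who cannot update yet, here is a minimal stopgap sketch for finding and stopping leftover GPU processes from Python. It is not part of the library's API and makes several assumptions: the machine has NVIDIA GPUs with nvidia-smi on the PATH, and the helper names (parse_compute_apps, gpu_compute_pids, terminate_pids) are hypothetical names made up for this example.

```python
import os
import signal
import subprocess


def parse_compute_apps(csv_text):
    """Parse `pid, used_memory` CSV rows into a list of (pid, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        # used_memory can be reported as a non-numeric placeholder; keep None then
        used = int(parts[1]) if parts[1].isdigit() else None
        rows.append((int(parts[0]), used))
    return rows


def gpu_compute_pids():
    """Ask nvidia-smi which processes currently hold GPU memory (all GPUs)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


def terminate_pids(pids, sig=signal.SIGTERM):
    """Send a signal to each PID, ignoring ones that have already exited."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # process is already gone
```

One way to use this is to call gpu_compute_pids() before starting the server and again after client.terminate_server, then pass only the PIDs that newly appeared to terminate_pids. nvidia-smi lists every user's processes, so killing everything it reports is not safe on a shared machine.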