Closed jayteaftw closed 2 months ago
Hum, that's odd... And no more logs, it stays like that? How much time did you wait? (to see if more errors were given after eventual timeout)
My team is successfully running the same model (Mistral-7B-Instruct-v0.2) with the same 0.4.2 version of this image on multi-gpu setups without issue. I'm not sure why it appears stalled in this case, but it does work generally.
No more comments, closing at least for now.
Hi, so tried using your deployment.yaml; however while the single GPU instance works, multi GPU stalls. Here is the log output
And the deployment file modified