I benchmarked MII using the run_example.sh script located at "DeepSpeedExamples/benchmarks/inference/mii" in the repository, but it stalled as follows:
Then, after a few minutes, it crashed as follows:
I tested with opt-125m for simplicity and left the other parameters as they are in the script.
I have successfully tested MII inference with the same model in both non-persistent and persistent deployments, using the examples located at "DeepSpeedExamples/inference/mii/".
My environment: one A6000 GPU, CUDA 11.8, code version f415ec8.
So what's wrong?
I updated the code from f415ec8 to 279a8fe, the latest version as of 20 Mar, and benchmarked MII again. This time it only stalled as before and did not crash.
@mrwyattii