microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

Benchmark mii stalled and crashed #877

Open Albert-Zhao-2020 opened 3 months ago

Albert-Zhao-2020 commented 3 months ago

I ran the MII benchmark using the run_example.sh script located at "DeepSpeedExamples/benchmarks/inference/mii" in the repository, but it stalled as follows: [image]

Then, after a few minutes, it crashed as follows: [image]

I tested with opt-125m for simplicity and left the other parameters as they are in the script. I have successfully tested MII inference with the same model in both non-persistent and persistent deployments, using the examples located at "DeepSpeedExamples/inference/mii/".

My environment: one A6000 GPU, CUDA 11.8, code version f415ec8.

So what's wrong?

I updated the code from f415ec8 to 279a8fe, the latest version as of 20 Mar, and ran the MII benchmark again. It still stalled as before, but this time it did not crash.

@mrwyattii