Closed treya-lin closed 7 months ago
Hey @treya-lin - we did indeed consider the VRAM for these models, but found it to be highly dependent on GPU hardware, CUDA version and even PyTorch version. For example, VRAM changed considerably going between PyTorch 1.13 and 2.0 for the same models on the same hardware. Therefore, we decided to quote the parameter count as a "proxy" for VRAM usage, in order to give a fair and reliable estimate for the expected memory.
To give you some idea of VRAM, here are very preliminary results I got benchmarking randomly initialised teacher/student models on a 16GB T4 GPU with PT 1.13 and no Flash Attention. Here, I measured the memory required to generate 25 tokens with a batch size of 1, averaged over 100 examples. Feel free to use these as indicative numbers for VRAM, but I would highly advise measuring the Whisper/Distil-Whisper models on your own hardware and library versions!
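If you want to measure this yourself, here is a minimal sketch of how peak VRAM for a generation call can be recorded with PyTorch's CUDA memory statistics. This is an illustration, not the exact benchmark script used above; the model id, input shape, and token count in the commented usage are assumptions.

```python
import torch

def peak_vram_bytes(fn):
    """Run fn() on the current CUDA device and return the peak number of
    bytes allocated, or None when no GPU is available."""
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    fn()                      # the workload to profile, e.g. model.generate(...)
    torch.cuda.synchronize()  # make sure all kernels have finished
    return torch.cuda.max_memory_allocated()

# Illustrative usage (model id and settings are assumptions, not the exact
# benchmark above):
#
# from transformers import WhisperForConditionalGeneration
# model = WhisperForConditionalGeneration.from_pretrained(
#     "distil-whisper/distil-large-v2"
# ).to("cuda").half()
# feats = torch.randn(1, 80, 3000, dtype=torch.float16, device="cuda")
# vram = peak_vram_bytes(lambda: model.generate(input_features=feats,
#                                               max_new_tokens=25))
# print(f"peak VRAM: {vram / 1e9:.2f} GB")
```

Resetting the peak-memory counter before the call and synchronising afterwards matters: CUDA kernels run asynchronously, so without the sync the reading may be taken before generation has finished.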
Hi, thanks for your reply - very helpful information! I will take a look at how it works in my environment.
Hi, it would be a great help if the required VRAM were added to the README, just as in Whisper's repo, where they list the VRAM needed for the different model sizes (https://github.com/openai/whisper#available-models-and-languages). Thanks!