huangruizhe closed this issue 1 year ago
Which versions of CUDA and PyTorch are you using?
CUDA 11.1 and PyTorch 1.10.0
Could you switch to another CUDA version, e.g., CUDA 10.2?
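For anyone else reporting versions in a thread like this, a minimal sketch for reading them from the running Python environment. The helper name `describe_env` is made up for illustration; it uses only `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`, and returns empty values if PyTorch is not installed.

```python
import importlib.util


def describe_env():
    """Return the PyTorch and CUDA versions visible to this Python process."""
    # describe_env is a hypothetical helper, not part of PyTorch or icefall.
    info = {"pytorch": None, "cuda_build": None, "cuda_available": False}

    # Return the empty report when PyTorch is not installed here.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    # Version string of the installed PyTorch package.
    info["pytorch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # Whether a usable CUDA device and driver are present at runtime.
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

Note that `cuda_build` is the CUDA version PyTorch was compiled with, which can differ from the toolkit installed system-wide.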
RuntimeError: CUDA error: invalid configuration argument
Most people who run into this issue are using CUDA 11.1.
Sure, I will try. Thanks for the suggestion!
For future reference, the following issues are related to this one and also involve CUDA 11.1
This is most likely a PyTorch bug that we just happen to be triggering. We would not be able to fix it ourselves, so the easiest approach is probably to try different versions of PyTorch and/or CUDA.
After we switched to CUDA 10.2, the issue was resolved. Thanks a lot!
(We can now use --max-duration 600, and GPU memory utilization is very good.)
When I was trying a zipformer (pruned_transducer_stateless7) on spgispeech, I did the following:
I got the following error after training had run for a while:
It does not seem to be an OOM error. With --max-duration 300, the error can happen as early as batch 50. On the other hand, with the default --max-duration 100, training runs fine for many batches, but GPU memory usage is very low. Do you know what the issue might be?