Closed eladzis closed 5 months ago
I'm hitting the same error and need help too.
ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
I solved my error by updating torch to a newer version instead of using the one in the yaml file.
Could you please share your torch version and relevant settings?
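For anyone answering the question above, here is a minimal sketch for collecting the version details in one go. The helper name `describe_env` is made up for illustration and is not part of this repository; it only uses standard PyTorch attributes and falls back gracefully if torch is not installed.

```python
import platform


def describe_env() -> dict:
    """Collect the Python, torch, and CUDA details usually asked for in bug reports."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # torch is missing from this environment; report that instead of crashing
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


env = describe_env()
print(env)
```

Pasting the printed dictionary into a reply makes it easier to compare a working setup against the versions listed in the yaml file.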
Hey, I just bumped torch and the other libraries to recent versions. Hopefully that solves the torch-related issues, but please let me know if anything else comes up.
Hello, I was trying to run your model and everything went fine until I ran main.py. This is the error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
Which is caused by this line:
I looked at "h":
torch.Size([26780, 1024])
tensor([[ nan, nan, nan, ..., nan, nan, nan],
        [ nan, nan, nan, ..., nan, nan, nan],
        [ nan, nan, nan, ..., nan, nan, nan],
        ...,
        [1.5065, 3.6833, 1.6349, ..., 0.4466, 0.9938, 0.3411],
        [2.1708, 2.9039, 2.8781, ..., 0.0000, 0.0000, 0.5808],
        [0.8070, 0.0293, 2.0469, ..., 0.0000, 0.0000, 0.5300]], device='cuda:0')
From these results I concluded that the error occurred because something in the embedding process went wrong and produced NaN values.
I would appreciate your help understanding what went wrong.
Thanks!
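One way to confirm the NaN observation above and narrow down which inputs are affected is to count the NaN rows in "h" before it reaches the linear layer. This is only a rough sketch: `nan_row_report` is a hypothetical helper, not something from this repository, and the small tensor below is a toy stand-in for the real `[26780, 1024]` tensor.

```python
import torch


def nan_row_report(h: torch.Tensor) -> dict:
    """Summarize which rows of a 2-D tensor contain at least one NaN."""
    nan_rows = torch.isnan(h).any(dim=1)      # boolean mask, one entry per row
    bad = torch.nonzero(nan_rows).flatten()   # indices of the affected rows
    return {
        "shape": tuple(h.shape),
        "num_nan_rows": int(nan_rows.sum()),
        "first_nan_rows": bad[:10].tolist(),  # only the first few, to keep output short
    }


# Toy stand-in for "h": rows 0 and 2 contain NaN, row 1 is clean
h = torch.tensor([[float("nan"), 1.0],
                  [0.5, 2.0],
                  [3.0, float("nan")]])
report = nan_row_report(h)
print(report)
```

Mapping the reported row indices back to the inputs that produced them should show whether the NaNs come from specific samples or from the embedding step as a whole.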