mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

CLAM main.py fails #202

Closed. eladzis closed this 5 months ago

eladzis commented 1 year ago

Hello, I was trying to run your model and everything went fine until I ran main.py. This is the error: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

It is raised at this line:

File "model_clam.py", line 151, in forward
    A, h = self.attention_net(h)  # NxK   

I looked at "h":

torch.Size([26780, 1024])
tensor([[   nan,    nan,    nan,  ...,    nan,    nan,    nan],
        [   nan,    nan,    nan,  ...,    nan,    nan,    nan],
        [   nan,    nan,    nan,  ...,    nan,    nan,    nan],
        ...,
        [1.5065, 3.6833, 1.6349,  ..., 0.4466, 0.9938, 0.3411],
        [2.1708, 2.9039, 2.8781,  ..., 0.0000, 0.0000, 0.5808],
        [0.8070, 0.0293, 2.0469,  ..., 0.0000, 0.0000, 0.5300]], device='cuda:0')

From these results I concluded that the error occurred because something in the embedding process went wrong and produced NaN values.
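One way to narrow this down is to list which rows of "h" contain NaN, so the affected patches can be traced back to the feature extraction step. The snippet below is only a minimal sketch in plain Python: `find_nan_rows` is a hypothetical helper (not part of CLAM), and the small matrix is a toy stand-in for the real 26780 x 1024 features.

```python
import math


def find_nan_rows(rows):
    """Return the indices of rows that contain at least one NaN."""
    bad = []
    for i, row in enumerate(rows):
        if any(math.isnan(x) for x in row):
            bad.append(i)
    return bad


# Toy stand-in for the feature matrix h (values are illustrative only).
h = [
    [float("nan"), float("nan"), float("nan")],
    [1.5065, 3.6833, 1.6349],
    [2.1708, 2.9039, 0.0],
]

print(find_nan_rows(h))  # [0]
```

On a real tensor the same check can be written as `torch.isnan(h).any(dim=1)`, which gives a boolean mask over rows. Running it on the saved features before training would show whether the NaN values are already present in the feature files.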

I would appreciate your help understanding what went wrong.

Thanks!

yuanzhang7 commented 9 months ago

I'm hitting the same error and need help too.

ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

eladzis commented 9 months ago

I solved my error by updating torch instead of using the version pinned in the yaml file.

yuanzhang7 commented 9 months ago

Could you please share your torch version and relevant settings?
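For anyone comparing setups, the version details asked about here can be collected with a few lines like the ones below. This is a rough sketch: `describe_torch_env` is a hypothetical helper name, and it falls back to empty values if torch is not installed.

```python
def describe_torch_env():
    """Collect the torch version, its CUDA build, and GPU visibility."""
    info = {"torch": None, "cuda_build": None, "cuda_available": False}
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_env())
```

Running `python -m torch.utils.collect_env` prints a fuller report (driver, OS, library versions) that is convenient to paste into an issue.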

fedshyvana commented 5 months ago

hey, i just bumped torch and other libraries to recent versions. Hopefully that solves torch-related issues but please let me know if there are further issues.