RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

Error Info:

File "/home/xxx/文档/github/Segment-and-Track-Anything-main/sam/segment_anything/modeling/transformer.py", line 231, in forward
    attn = q @ k.permute(0, 1, 3, 2)  # B x N_heads x N_tokens x N_tokens
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
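The failing line is a half-precision batched matrix multiply, so it may help to run the same kind of operation outside the demo. The following is only a minimal sketch: the function name and the tensor shapes are made up for illustration and are not the shapes used by the model. If this small case also fails, the problem is probably in the environment rather than in the demo code; if it passes, that does not rule out a shape-specific or memory-related failure.

```python
def run_fp16_batched_matmul(batch=1, heads=8, tokens=64, head_dim=32):
    """Run q @ k^T in float16 on the GPU.

    Returns the output shape, or None if PyTorch or a CUDA device is missing.
    The default shapes are hypothetical, chosen only to keep the test small.
    """
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no usable CUDA device

    q = torch.randn(batch, heads, tokens, head_dim, device="cuda", dtype=torch.float16)
    k = torch.randn(batch, heads, tokens, head_dim, device="cuda", dtype=torch.float16)

    # Same expression as transformer.py line 231 in the traceback above.
    attn = q @ k.permute(0, 1, 3, 2)  # B x N_heads x N_tokens x N_tokens

    # Wait for the kernel to finish so that any CUDA error surfaces here.
    torch.cuda.synchronize()
    return tuple(attn.shape)


if __name__ == "__main__":
    print(run_fp16_batched_matmul())
```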

Environment:

1 x RTX 4090 (24 GB), Ubuntu 22.04, CUDA 11.7, torch 1.13.0+cu117
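To make the environment details easier to compare with other reports, they can also be read from PyTorch itself. This is a rough sketch, and `describe_environment` is a hypothetical helper name. It prints the CUDA version the installed build was compiled against, the compute capability of the device, and the architectures the build was compiled for. Whether any mismatch between these is related to the error is an open question, not a confirmed cause.

```python
def describe_environment():
    """Collect PyTorch/CUDA details; fields stay empty if torch or CUDA is missing."""
    info = {
        "torch": None,       # torch.__version__, e.g. "1.13.0+cu117"
        "cuda_build": None,  # CUDA version the torch build was compiled against
        "device": None,      # name of GPU 0
        "capability": None,  # compute capability of GPU 0 as (major, minor)
        "arch_list": [],     # architectures the torch build was compiled for
    }
    try:
        import torch
    except ImportError:
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
        info["arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```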

Description:

First I ran demo.ipynb and got this error:

File "/home/xxx/文档/github/Segment-and-Track-Anything-main/sam/segment_anything/utils/amg.py", line 53, in filter
RuntimeError: CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.

Then, to debug, I added this line at the beginning of demo.ipynb:

os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

With that set, I got this error instead:

File "/home/xxx/文档/github/Segment-and-Track-Anything-main/sam/segment_anything/modeling/transformer.py", line 231, in forward
    attn = q @ k.permute(0, 1, 3, 2)  # B x N_heads x N_tokens x N_tokens
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
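For anyone reproducing this, here is a small sketch of how the debug variable can be set. As far as I understand, `CUDA_LAUNCH_BLOCKING` needs to be in place before CUDA is initialised in the process, so setting it in the first notebook cell, before `torch` is imported, is the safe choice. The helper name `enable_blocking_launches` is hypothetical.

```python
import os
import sys


def enable_blocking_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 for this process.

    Returns True if torch had not been imported yet when the variable was set,
    False if torch was already loaded (in which case the setting may be too late).
    """
    set_early = "torch" not in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return set_early


early = enable_blocking_launches()
if not early:
    # Assumption: a restart is the simplest way to make sure the variable
    # is seen before CUDA is initialised.
    print("torch is already imported; restart the kernel and run this cell first.")
```

With blocking launches enabled, kernel launches run synchronously, so the traceback should point at the call that actually failed. It also slows execution, so it is best removed once debugging is done.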

After running the program, I could not kill the process, and the GPU resources did not seem to be released. My PC also stopped working normally and could not even be restarted normally.

I would very much appreciate any help with this problem!

z-x-yang / Segment-and-Track-Anything

Issue #76