microsoft / chunk-attention

MIT License

build error and runtime error #2

Open 15963064649 opened 1 month ago

15963064649 commented 1 month ago

Device: NVIDIA GeForce RTX 4090 D
CUDA: Cuda compilation tools, release 12.2, V12.2.91, Build cuda_12.2.r12.2/compiler.32965470_0
gcc: version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
torch: 2.3.0
Python: 3.10

When running the example code, the call `f = host.predict_async(prompt_tokens, 32)` fails with:

`RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmEx(handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

15963064649 commented 1 month ago

I think I have found the cause. I am a non-root user on this server, and the system-wide gcc is a newer version that does not match my CUDA toolkit, which may be what triggers the error. I have installed an older gcc under my own account and set the environment variables, but the build still does not pick up my gcc. I will follow up once I have solved this.
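
For anyone hitting the same mismatch, here is a minimal sketch of how one might check a compiler's version and point a build at a user-local gcc. It rests on several assumptions that are not confirmed in this thread: that CUDA 12.2 accepts host gcc only up to major version 12 (check the CUDA installation guide for your release), that the project's build goes through CMake or setuptools, which read the `CC`, `CXX` and `CUDAHOSTCXX` environment variables, and that the compiler paths shown in the usage note are hypothetical. The helper names `gcc_major`, `build_env` and `check_compiler` are made up for illustration.

```python
import os
import re
import shutil
import subprocess

# Assumption: CUDA 12.2 supports host gcc up to major version 12.
# Verify this against the CUDA installation guide for your toolkit.
MAX_GCC_MAJOR = 12


def gcc_major(version_text):
    """Extract the major version from `gcc --version` or `gcc -v` output."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError("no version number found in: %r" % version_text)
    return int(match.group(1))


def build_env(cc, cxx, base=None):
    """Return a copy of the environment that points a build at cc/cxx."""
    env = dict(os.environ if base is None else base)
    env["CC"] = cc            # C compiler read by CMake and setuptools
    env["CXX"] = cxx          # C++ compiler
    env["CUDAHOSTCXX"] = cxx  # host compiler CMake hands to nvcc
    return env


def check_compiler(cc):
    """Run `<cc> --version`; return (major, within_assumed_limit)."""
    path = shutil.which(cc)
    if path is None:
        raise FileNotFoundError(cc)
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    major = gcc_major(result.stdout)
    return major, major <= MAX_GCC_MAJOR
```

A possible use is `env = build_env("/home/me/gcc-12/bin/gcc", "/home/me/gcc-12/bin/g++")` (hypothetical paths), then passing `env=env` to `subprocess.run` with whatever build command the project uses. One thing that may explain a local gcc being ignored: CMake caches the compiler on its first configure, so an existing build directory usually has to be deleted before new `CC`/`CXX` values take effect.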