Closed: TylunasLi closed this issue 10 months ago
Parameters specified at compile time:
cmake -DUSE_CUDA=ON -DUSE_MMAP=ON
Running it produces the error:
AVX: ON
AVX2: OFF
AARCH64: OFF
Neon FP16: OFF
Neon DOT: OFF
Load (200 / 200)
Warmup...
CUBLAS initialization failed:1
Error 1 is CUBLAS_STATUS_NOT_INITIALIZED.
I debugged this with GDB:
(gdb) backtrace
#0  getFastllmCublasHandle () at /home/nlp/inference/fastllm/src/devices/cuda/fastllm-cuda.cu:24
#1  0x0000000000498ed6 in FastllmCudaBatchMatMulTransB (input0=..., input1=..., output=..., input0Spatial=2048, input1Spatial=16384, outputSpatial=16, input0Stride=128, input1Stride=128, batch=2, n=16, m=128, k=1, alpha=0.0883883461) at /home/nlp/inference/fastllm/src/devices/cuda/fastllm-cuda.cu:1720
#2  0x00000000004884d2 in fastllm::CudaMatMulTransBOp::Run(std::string const&, std::map<std::string, fastllm::Data*, std::less<std::string>, std::allocator<std::pair<std::string const, fastllm::Data*> > > const&, std::map<std::string, float, std::less<std::string>, std::allocator<std::pair<std::string const, float> > > const&, std::map<std::string, int, std::less<std::string>, std::allocator<std::pair<std::string const, int> > > const&) () at /home/nlp/inference/fastllm/src/devices/cuda/cudadevice.cpp:354
#3  0x0000000000431298 in fastllm::BaseDevice::Run (this=0xa5c720, opType=..., datas=..., floatParams=..., intParams=...) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/char_traits.h:312
#4  0x00000000004342eb in fastllm::Executor::Run(std::string const&, std::map<std::string, fastllm::Data*, std::less<std::string>, std::allocator<std::pair<std::string const, fastllm::Data*> > > const&, std::map<std::string, float, std::less<std::string>, std::allocator<std::pair<std::string const, float> > > const&, std::map<std::string, int, std::less<std::string>, std::allocator<std::pair<std::string const, int> > > const&) () at /home/nlp/inference/fastllm/src/executor.cpp:99
#5  0x0000000000422b68 in fastllm::MatMulTransB(fastllm::Data const&, fastllm::Data const&, fastllm::Data&, float) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:211
#6  0x00000000004546a4 in fastllm::ChatGLMModel::ForwardBatch(int, fastllm::Data const&, fastllm::Data const&, fastllm::Data const&, std::vector<std::pair<fastllm::Data, fastllm::Data>, std::allocator<std::pair<fastllm::Data, fastllm::Data> > >&, fastllm::GenerationConfig const&, fastllm::LastTokensManager const&, std::vector<std::vector<float, std::allocator<float> >*, std::allocator<std::vector<float, std::allocator<float> >*> >*) () at /home/nlp/inference/fastllm/src/models/chatglm.cpp:230
#7  0x000000000044b3d3 in fastllm::ChatGLMModel::Forward(fastllm::Data const&, fastllm::Data const&, fastllm::Data const&, std::vector<std::pair<fastllm::Data, fastllm::Data>, std::allocator<std::pair<fastllm::Data, fastllm::Data> > >&, fastllm::GenerationConfig const&, fastllm::LastTokensManager const&, std::vector<float, std::allocator<float> >*) () at /home/nlp/inference/fastllm/src/models/chatglm.cpp:77
#8  0x000000000044e018 in fastllm::ChatGLMModel::WarmUp() () at /home/nlp/inference/fastllm/src/models/chatglm.cpp:876
#9  0x0000000000431dbd in fastllm::CreateLLMModelFromFile(std::string const&) () at /home/nlp/inference/fastllm/src/model.cpp:91
#10 0x0000000000416251 in main () at /home/nlp/inference/fastllm/main.cpp:64
#11 0x00007fffed771555 in __libc_start_main () from /lib64/libc.so.6
#12 0x0000000000416abe in _start () at /home/nlp/inference/fastllm/main.cpp:98
(gdb) frame 1
#1  0x0000000000498ed6 in FastllmCudaBatchMatMulTransB (input0=..., input1=..., output=..., input0Spatial=2048, input1Spatial=16384, outputSpatial=16, input0Stride=128, input1Stride=128, batch=2, n=16, m=128, k=1, alpha=0.0883883461) at /home/nlp/inference/fastllm/src/devices/cuda/fastllm-cuda.cu:1720
1830            auto fastllmCublasHandle = getFastllmCublasHandle();
(gdb) print cudaInput0
$5 = (float *) 0x7ffca0611000
(gdb) print cudaInput1
$6 = (float *) 0x0
(gdb) print cudaOutput
$7 = (float *) 0x0
The problem is probably that cudaMemcpy() cannot copy from an mmap-ed memory address.
That said, with MMAP enabled the weights have to be copied into ordinary memory first and then to the GPU, so loading is slower; it trades time for space.