mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

make chat: undefined reference to `LLaVAGenerate` #95

Open cuu opened 4 months ago

cuu commented 4 months ago

On a Jetson Orin Nano 8 GB,

when running make chat:

(TinyChatEngine) cpi@ubuntu:~/github/mit-han-lab/TinyChatEngine/llm$ make chat
CUDA is available!
src/Generate.cc src/GPTBigCodeGenerate.cc src/GPTBigCodeTokenizer.cc src/LLaMATokenizer.cc src/OPTGenerate.cc src/OPTTokenizer.cc src/utils.cc src/nn_modules/Fp32CLIPAttention.cc src/nn_modules/Fp32CLIPEncoder.cc src/nn_modules/Fp32CLIPEncoderLayer.cc src/nn_modules/Fp32CLIPVisionTransformer.cc src/nn_modules/Fp32GPTBigCodeAttention.cc src/nn_modules/Fp32GPTBigCodeDecoder.cc src/nn_modules/Fp32GPTBigCodeDecoderLayer.cc src/nn_modules/Fp32GPTBigCodeForCausalLM.cc src/nn_modules/Fp32llamaAttention.cc src/nn_modules/Fp32llamaDecoder.cc src/nn_modules/Fp32llamaDecoderLayer.cc src/nn_modules/Fp32llamaForCausalLM.cc src/nn_modules/Fp32OPTAttention.cc src/nn_modules/Fp32OPTDecoder.cc src/nn_modules/Fp32OPTDecoderLayer.cc src/nn_modules/Fp32OPTForCausalLM.cc src/nn_modules/Int4GPTBigCodeAttention.cc src/nn_modules/Int4GPTBigCodeDecoder.cc src/nn_modules/Int4GPTBigCodeDecoderLayer.cc src/nn_modules/Int4GPTBigCodeForCausalLM.cc src/nn_modules/Int4OPTAttention.cc src/nn_modules/Int4OPTDecoder.cc src/nn_modules/Int4OPTDecoderLayer.cc src/nn_modules/Int4OPTForCausalLM.cc src/nn_modules/Int8OPTAttention.cc src/nn_modules/Int8OPTDecoder.cc src/nn_modules/Int8OPTDecoderLayer.cc src/nn_modules/OPTForCausalLM.cc src/ops/arg_max.cc src/ops/batch_add.cc src/ops/BMM_F32T.cc src/ops/BMM_S8T_S8N_F32T.cc src/ops/BMM_S8T_S8N_S8T.cc src/ops/Conv2D.cc src/ops/embedding.cc src/ops/Gelu.cc src/ops/LayerNorm.cc src/ops/LayerNormQ.cc src/ops/linear.cc src/ops/LlamaRMSNorm.cc src/ops/RotaryPosEmb.cc src/ops/softmax.cc src/ops/W8A8B8O8Linear.cc src/ops/W8A8B8O8LinearReLU.cc src/ops/W8A8BFP32OFP32Linear.cc ../kernels/matmul_imp.cc ../kernels/matmul_int4.cc ../kernels/matmul_int8.cc ../kernels/pthread_pool.cc
../kernels/cuda/matmul_ref_fp32.cc ../kernels/cuda/matmul_ref_int8.cc
../kernels/cuda/gemv_cuda.cu ../kernels/cuda/matmul_int4.cu  src/nn_modules/cuda/Int4llamaAttention.cu src/nn_modules/cuda/Int4llamaDecoder.cu src/nn_modules/cuda/Int4llamaDecoderLayer.cu src/nn_modules/cuda/Int4llamaForCausalLM.cu src/nn_modules/cuda/LLaMAGenerate.cu src/nn_modules/cuda/utils.cu src/ops/cuda/batch_add.cu src/ops/cuda/BMM_F16T.cu src/ops/cuda/embedding.cu src/ops/cuda/linear.cu src/ops/cuda/LlamaRMSNorm.cu src/ops/cuda/RotaryPosEmb.cu src/ops/cuda/softmax.cu
/usr/local/cuda/bin/nvcc -std=c++17 -Xptxas -O3 -gencode arch=compute_87,code=sm_87 --forward-unknown-to-host-compiler -Xcompiler "-pthread" -DQM_CUDA -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -fPIC -I../kernels -I./include -I./include/nn_modules -I./json/single_include/ -I./half-2.2.0/include/ -I./include/ops/cuda -I/usr/local/cuda/include -I/usr/local/cuda/targets/aarch64-linux/include -I/usr/include/aarch64-linux-gnu -o chat application/chat.cc build/transformer/src/Generate.o build/transformer/src/GPTBigCodeGenerate.o build/transformer/src/GPTBigCodeTokenizer.o build/transformer/src/LLaMATokenizer.o build/transformer/src/OPTGenerate.o build/transformer/src/OPTTokenizer.o build/transformer/src/utils.o build/transformer/src/nn_modules/Fp32CLIPAttention.o build/transformer/src/nn_modules/Fp32CLIPEncoder.o build/transformer/src/nn_modules/Fp32CLIPEncoderLayer.o build/transformer/src/nn_modules/Fp32CLIPVisionTransformer.o build/transformer/src/nn_modules/Fp32GPTBigCodeAttention.o build/transformer/src/nn_modules/Fp32GPTBigCodeDecoder.o build/transformer/src/nn_modules/Fp32GPTBigCodeDecoderLayer.o build/transformer/src/nn_modules/Fp32GPTBigCodeForCausalLM.o build/transformer/src/nn_modules/Fp32llamaAttention.o build/transformer/src/nn_modules/Fp32llamaDecoder.o build/transformer/src/nn_modules/Fp32llamaDecoderLayer.o build/transformer/src/nn_modules/Fp32llamaForCausalLM.o build/transformer/src/nn_modules/Fp32OPTAttention.o build/transformer/src/nn_modules/Fp32OPTDecoder.o build/transformer/src/nn_modules/Fp32OPTDecoderLayer.o build/transformer/src/nn_modules/Fp32OPTForCausalLM.o build/transformer/src/nn_modules/Int4GPTBigCodeAttention.o build/transformer/src/nn_modules/Int4GPTBigCodeDecoder.o 
build/transformer/src/nn_modules/Int4GPTBigCodeDecoderLayer.o build/transformer/src/nn_modules/Int4GPTBigCodeForCausalLM.o build/transformer/src/nn_modules/Int4OPTAttention.o build/transformer/src/nn_modules/Int4OPTDecoder.o build/transformer/src/nn_modules/Int4OPTDecoderLayer.o build/transformer/src/nn_modules/Int4OPTForCausalLM.o build/transformer/src/nn_modules/Int8OPTAttention.o build/transformer/src/nn_modules/Int8OPTDecoder.o build/transformer/src/nn_modules/Int8OPTDecoderLayer.o build/transformer/src/nn_modules/OPTForCausalLM.o build/transformer/src/ops/arg_max.o build/transformer/src/ops/batch_add.o build/transformer/src/ops/BMM_F32T.o build/transformer/src/ops/BMM_S8T_S8N_F32T.o build/transformer/src/ops/BMM_S8T_S8N_S8T.o build/transformer/src/ops/Conv2D.o build/transformer/src/ops/embedding.o build/transformer/src/ops/Gelu.o build/transformer/src/ops/LayerNorm.o build/transformer/src/ops/LayerNormQ.o build/transformer/src/ops/linear.o build/transformer/src/ops/LlamaRMSNorm.o build/transformer/src/ops/RotaryPosEmb.o build/transformer/src/ops/softmax.o build/transformer/src/ops/W8A8B8O8Linear.o build/transformer/src/ops/W8A8B8O8LinearReLU.o build/transformer/src/ops/W8A8BFP32OFP32Linear.o build/transformer/../kernels/matmul_imp.o build/transformer/../kernels/matmul_int4.o build/transformer/../kernels/matmul_int8.o build/transformer/../kernels/pthread_pool.o build/transformer/../kernels/cuda/matmul_ref_fp32.o build/transformer/../kernels/cuda/matmul_ref_int8.o build/transformer/../kernels/cuda/gemv_cuda.o build/transformer/../kernels/cuda/matmul_int4.o build/transformer/src/nn_modules/cuda/Int4llamaAttention.o build/transformer/src/nn_modules/cuda/Int4llamaDecoder.o build/transformer/src/nn_modules/cuda/Int4llamaDecoderLayer.o build/transformer/src/nn_modules/cuda/Int4llamaForCausalLM.o build/transformer/src/nn_modules/cuda/LLaMAGenerate.o build/transformer/src/nn_modules/cuda/utils.o build/transformer/src/ops/cuda/batch_add.o 
build/transformer/src/ops/cuda/BMM_F16T.o build/transformer/src/ops/cuda/embedding.o build/transformer/src/ops/cuda/linear.o build/transformer/src/ops/cuda/LlamaRMSNorm.o build/transformer/src/ops/cuda/RotaryPosEmb.o build/transformer/src/ops/cuda/softmax.o  -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -lnvrtc -lcuda -lcudnn -lcurand -lcusolver -L/usr/local/cuda/lib64 -L/usr/local/cuda/targets/aarch64-linux/lib -L/usr/lib/aarch64-linux-gnu -Xlinker -rpath=/usr/local/cuda/lib64 -Xlinker -rpath=/usr/local/cuda/targets/aarch64-linux/lib -Xlinker -rpath=/usr/lib/aarch64-linux-gnu
nvlink warning : Skipping incompatible '/usr/lib/aarch64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib/aarch64-linux-gnu/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib/aarch64-linux-gnu/librt.a' when searching for -lrt
/usr/bin/ld: /tmp/tmpxft_00002c81_00000000-5_chat.o: in function `main':
chat.cc:(.text+0x8548): undefined reference to `LLaVAGenerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, opt_params, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, bool)'
/usr/bin/ld: chat.cc:(.text+0x8a00): undefined reference to `LLaVAGenerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, opt_params, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, bool)'
/usr/bin/ld: chat.cc:(.text+0x9158): undefined reference to `LLaVAGenerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, opt_params, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, bool)'
/usr/bin/ld: chat.cc:(.text+0x9614): undefined reference to `LLaVAGenerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, opt_params, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, bool)'
collect2: error: ld returned 1 exit status
make: *** [Makefile:225: chat] Error 255

It seems that there is no src/nn_modules/cuda/LLaVAGenerate.cu in the repository, so the LLaVAGenerate symbol is never compiled and the link fails.

And by the way,

src/ops/Gelu.cc needs

#include <math.h>

for tanhf and expf.

The git commit hash is:

d0fed698b739994afda8ece0dab60cc0f22b2108

Dudu014 commented 4 months ago

Same issue here.

For Gelu.cc, I solved it by adding: #include <cmath>

Regarding the LLaVAGenerate issue, I just commented out those lines in chat.cc; since I am using LLaMA and not LLaVA, it should not matter.

That way I am able to run "make chat -j". However, when running "./chat" it gets stuck showing "loading model ..." and the process ends showing "killed" on the screen. I am unsure what the problem is; I assume it is the "int4LlamaForCausalLM model" declaration in "chat.cc", as the program never shows "Finished!".

Architecture: Jetson Orin Nano Developer Kit 8 GB
Model: LLaMA2_7B_chat_awq_int4 for CUDA device