kuvaus / LlamaGPTJ-chat

Simple chat program for LLaMa, GPT-J, and MPT models.
MIT License

Here's how to compile and run under MINGW64 from Msys2 #23

greggft opened this issue 1 year ago

greggft commented 1 year ago
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat $ mkdir build
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat $ cd build
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $ mkdir models
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $ cp ../../MODELS/ggml-vicuna-13b-1.1-q4_2.bin models
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $ cmake --fresh .. -DCMAKE_CXX_COMPILER=g++.exe -DCMAKE_C_COMPILER=gcc.exe
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
System is unknown to cmake, create: Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info
System is unknown to cmake, create: Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /mingw64/bin/gcc.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
System is unknown to cmake, create: Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /mingw64/bin/g++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
System is unknown to cmake, create: Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: unknown
-- Unknown architecture
-- Configuring done (25.8s)
-- Generating done (0.5s)
-- Build files have been written to: /home/Fixit/LlamaGPTJ-chat/build
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $ cmake --build . --parallel
[  8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.obj
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_0':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:781:15: warning: unused variable 'nb' [-Wunused-variable]
  781 |     const int nb = k / QK4_0;
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_1':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
 1129 |     block_q4_1 * restrict y = vy;
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
 1127 |     const int nb = k / QK4_1;
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q8_1':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1507:15: warning: unused variable 'nb' [-Wunused-variable]
 1507 |     const int nb = k / QK8_1;
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
 9357 |     const int ne2_ne3 = n/ne1; // ne2*ne3
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
 9419 |     const int ne2 = src0->ne[2]; // n_head -> this is k
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
 9468 |     switch (src0->type) {
[  8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.obj
In file included from C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama.cpp:8:
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool)':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:233:94: note: '#pragma message: warning: You are building for pre-Windows 8; prefetch not supported'
  233 |     #pragma message("warning: You are building for pre-Windows 8; prefetch not supported")
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
  201 |     llama_mmap(struct llama_file * file, bool prefetch = true) {
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.obj
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.obj
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.obj
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.obj
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.obj
[ 83%] Linking CXX static library libllmodel.a
/mingw64/bin/ar.exe qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.obj CMakeFiles/llmodel.dir/llamamodel.cpp.obj CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj CMakeFiles/llmodel.dir/llmodel_c.cpp.obj CMakeFiles/llmodel.dir/mpt.cpp.obj CMakeFiles/llmodel.dir/utils.cpp.obj
/mingw64/bin/ranlib.exe libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.obj
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama.cpp: loading model from .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
LlamaGPTJ-chat: done loading!

hello
Hello! How can I help you today?
/quit
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $
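To recap, the sequence that worked above boils down to the following (the model file is just the one I had on hand; substitute your own):

mkdir build && cd build
mkdir models
cp ../../MODELS/ggml-vicuna-13b-1.1-q4_2.bin models
cmake --fresh .. -DCMAKE_CXX_COMPILER=g++.exe -DCMAKE_C_COMPILER=gcc.exe
cmake --build . --parallel
bin/chat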

It took 5 minutes to respond with just a hello.

UPDATE #1

$ ./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t 4
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: .............................................. done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
LlamaGPTJ-chat: done loading!

hello
Hi! How can I assist you today?
/quit
(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build $

Just over 2 minutes to respond to my hello.

kuvaus commented 1 year ago

Nice! You got it running on Windows 7. Edit: I just noticed the pre-Windows 8 message, so I'm assuming 7.

Looks like you didn't even need the cmake .. -G "MinGW Makefiles" part from the README, but I guess that's because you already had MinGW gcc in (myenv).

The over 2x speed difference between 13B and 7B models is not surprising, but the fact that it takes several minutes is.

If your processor has more threads than 4, you can set -t to a bigger number. For example, with an 8-core (16-thread) CPU I would set it to, say, -t 14 (it's important to leave at least one thread for the OS, otherwise everything slows down a lot). If no -t is specified, the default is 4.
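For example (a minimal sketch, assuming coreutils' nproc is available in the MSYS2 shell, which it normally is), you can let the shell pick the thread count and leave one thread free for the OS:

nproc                                         # number of logical processors
./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t $(( $(nproc) - 1 ))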

But, with the way these models work, memory will always be the biggest bottleneck. This is because any large language model is (in a way) one big equation that is evaluated all at once for each token. So the entire model has to be accessible in memory for this evaluation.
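As a rough back-of-the-envelope using the numbers the loader printed above, the whole 13B Q4_2 model plus its context cache has to stay resident:

  mem required  ≈ 9807.47 MB   (weights + scratch buffers)
+ kv self size  ≈ 1600.00 MB   (context cache)
≈ 11.4 GB resident while generating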

For the Vicuna 13B model, this warning:

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
  201 |     llama_mmap(struct llama_file * file, bool prefetch = true) {

might indicate that mmap is not working. If you want to tinker, you can change lines 55 and 59 of gpt4all-backend/llamamodel.cpp from

d_ptr->params.use_mmap   = params.use_mmap;
d_ptr->params.use_mlock  = params.use_mlock;

to

d_ptr->params.use_mmap   = false;
d_ptr->params.use_mlock  = false;

But I'm not sure if it makes any difference. Probably not. The current settings seem to work well for Windows 10 and up.
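If you do try it, a minimal way to check (using the same paths as earlier in this thread) is to rebuild and rerun, then compare the load and response times:

cmake --build . --parallel
./bin/chat -m models/ggml-vicuna-13b-1.1-q4_2.bin -t 3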

greggft commented 1 year ago

Actually, I am running Windows 10:

OS Name: Microsoft Windows 10 Pro
Version: 10.0.19045 Build 19045
System Manufacturer: Hewlett-Packard
System Model: HP ZBook 15u G2
Processor: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2601 Mhz, 2 Core(s), 4 Logical Processor(s)

It's the most "powerful" piece of computing hardware I own. I had to switch to my laptop because the OS hard drive just died on my Proxmox server :-( So all my "test" servers are "dead"; luckily they are on a ZFS partition, but now I have to find a replacement drive in my box of hard drives.

You can close this out if you wish, but I posted it because without specifying gcc and g++ in the MSYS2 setup the compile fails. Thanks for a great program to test AIs with, I appreciate it!

kuvaus commented 1 year ago

Oh. I misread:

Platform/MINGW64_NT-10.0 does indicate Windows 10. I wonder why it said pre-Windows 8 in the pragma message. Probably some MinGW thing.
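One thing that might be worth a try, though this is just a guess: I'm assuming the pragma is gated on the _WIN32_WINNT macro, which MinGW can default to a pre-Windows-8 value. Reconfiguring with the macro set to the Windows 8 value (0x0602) would then enable the prefetch path:

cmake --fresh .. -DCMAKE_CXX_COMPILER=g++.exe -DCMAKE_C_COMPILER=gcc.exe -DCMAKE_CXX_FLAGS="-D_WIN32_WINNT=0x0602"
cmake --build . --parallel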

Oh, and set -t 2 or -t 3 so that you get 1 thread free for the OS. That should absolutely speed things up a bit!

> You can close this out if you wish, but I posted it because without specifying gcc and g++ in the MSYS2 setup the compile fails. Thanks for a great program to test AIs with, I appreciate it!

This is great info for others. Better to leave it up! I didn't know it would not compile without setting -DCMAKE_CXX_COMPILER and -DCMAKE_C_COMPILER.

Thanks a lot for this! :)

pranitsh commented 6 months ago

I came across --config Release, but it didn't fix the speed issue (it even took too long to load the model).

I found the idea in gpt4all/gpt4all-bindings/csharp/build_win-mingw.ps1: https://github.com/nomic-ai/gpt4all/blob/1b84a48c47a382dfa432dbf477a7234402a0f76c/gpt4all-bindings/csharp/build_win-mingw.ps1#L4

I'm running

mkdir build
cd build
cmake -G "MinGW Makefiles" .. -DAVX512=ON
cmake --build . --parallel --config Release
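One caveat worth noting here (just a general CMake detail, not something from the thread): MinGW Makefiles is a single-configuration generator, so --config Release on the build line is effectively ignored; the optimization level has to be chosen at configure time instead, for example:

cmake -G "MinGW Makefiles" .. -DAVX512=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --parallel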

I'm not too familiar with CMake. Any suggestions? To clarify: is the issue a missing flag? I couldn't find the DLLs in question in the script linked above. Any approaches I could try for this problem?

kuvaus commented 6 months ago

Hi,

Thanks for the link. Interesting.

The project uses static linking, which means that the *.dll files are in the .exe already. This was because I didn't want users to have to worry about copying those DLLs and having them at the correct paths.

But if you want to build the DLL files, you can set the flag: cmake -DBUILD_SHARED_LIBS=ON

and you might also need to edit CMakeLists.txt line 105 to remove the -static references, leaving:

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")

If you made the build directory with mkdir build like you did, then the *.dll files should end up in build/gpt4all-backend.
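Roughly, assuming the CMakeLists.txt edit above has been made, the whole shared-library flow would look something like this from the build directory:

cmake .. -DBUILD_SHARED_LIBS=ON
cmake --build . --parallel
ls gpt4all-backend/*.dll    # the shared libraries should end up here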

I have found that using the gpt4all backend instead of pure llama.cpp is indeed a bit slower.