greggft opened this issue 1 year ago
Nice! You got it running on Windows 7. Edit: I just noticed "pre-Windows 8", so I'm assuming 7.
Looks like you didn't even need the cmake .. -G "MinGW Makefiles"
part from the README, but I guess that's because you already had MinGW gcc in (myenv).
The over 2x speed difference between 13B and 7B models is not surprising, but the fact that it takes several minutes is.
If your processor has more than 4 threads, you can set -t
to a bigger number. For example, with 8 cores (16 threads) I would set it to, say, -t 14
(it's important to leave at least one thread for the OS, otherwise it will slow down a lot). If no -t is specified, the default is 4.
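For example, on a 16-thread machine, the kind of invocation from your logs below would look something like:
./bin/chat -m "models/ggml-vicuna-13b-1.1-q4_2.bin" -t 14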
But, with the way these models work, memory will always be the biggest bottleneck. This is because any large language model is (in a way) one big equation that is evaluated all at once for each token. So the entire model has to be accessible in memory for this evaluation.
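As a rough sanity check (assuming Q4_2 works out to about 5 bits per weight): 13 billion weights × 5/8 of a byte ≈ 8 GB just for the weights, plus the ~1.6 GB KV cache, which lands close to the ~9.8 GB "mem required" reported in your log below.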
For the Vicuna 13B model, it looks like this warning:
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
201 | llama_mmap(struct llama_file * file, bool prefetch = true) {
might indicate that mmap is not working. If you want to tinker, you can change lines 55 and 59 in gpt4all-backend/llamamodel.cpp
from
d_ptr->params.use_mmap = params.use_mmap;
d_ptr->params.use_mlock = params.use_mlock;
to
d_ptr->params.use_mmap = false;
d_ptr->params.use_mlock = false;
But I'm not sure if it makes any difference. Probably not. The current settings seem to work well for Windows 10 and up.
Actually, I am running Windows 10:
OS Name: Microsoft Windows 10 Pro, Version 10.0.19045, Build 19045
System Manufacturer: Hewlett-Packard
System Model: HP ZBook 15u G2
Processor: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2601 MHz, 2 Core(s), 4 Logical Processor(s)
It's the most "powerful" piece of computing I own.... I had to switch to my laptop because the OS hard drive just died on my Proxmox server :-( So all my "test" servers are "dead". Luckily they are on a ZFS partition, but now I have to find a replacement drive in my box of hard drives....
You can close this out if you wish, but I posted it because the compile fails without specifying gcc and g++ in the MSYS2 setup. Thanks for a great program to test AIs with; I appreciate it!
Oh. I misread:
Platform/MINGW64_NT-10.0 does indicate Windows 10. I wonder why the pragma message said pre-Windows 8. Probably some MinGW thing.
Oh, and set -t 2 or -t 3 so that you keep at least one thread free for the OS. That should speed things up a bit!
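With your 4-thread CPU that would be, for example (with whichever model you pass to -m):
./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t 3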
You can close this out if you wish, but I posted it because the compile fails without specifying gcc and g++ in the MSYS2 setup. Thanks for a great program to test AIs with; I appreciate it!
This is great info for others. Better to leave it up! I didn't know it would not compile without setting -DCMAKE_CXX_COMPILER and -DCMAKE_C_COMPILER.
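For anyone else hitting this, I assume that means an invocation along these lines from the MSYS2 MinGW64 shell (the plain gcc/g++ names are an assumption; use whatever your MinGW toolchain installs):
cmake .. -G "MinGW Makefiles" -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++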
Thanks a lot for this! :)
I came across --config Release, but it didn't fix the speed issue (the model even took too long to load).
I found the idea here: gpt4all/gpt4all-bindings/csharp/build_win-mingw.ps1 https://github.com/nomic-ai/gpt4all/blob/1b84a48c47a382dfa432dbf477a7234402a0f76c/gpt4all-bindings/csharp/build_win-mingw.ps1#L4
I'm running
mkdir build
cd build
cmake -G "MinGW Makefiles" .. -DAVX512=ON
cmake --build . --parallel --config Release
I'm not too familiar with CMake. Any suggestions?
Clarification: Is the issue a missing flag? I couldn't find the DLLs in question later in the script in the link above. Any approaches to try for this problem?
Hi,
Thanks for the link. Interesting.
The project uses static linking, which means the *.dll files are already baked into the .exe. This was because I didn't want users to have to worry about copying those DLLs and putting them at the correct paths.
But if you want to build the dll files, then you can set the flag:
cmake -DBUILD_SHARED_LIBS=ON
and you might also need to edit CMakeLists.txt line 105 to remove the -static references:
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")
If you made the build directory with mkdir build like you did, then the *.dll files should be in build/gpt4all-backend.
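Putting those pieces together, a minimal sketch of the shared-library build would be (assuming the CMakeLists.txt edit above is in place):
mkdir build
cd build
cmake .. -G "MinGW Makefiles" -DBUILD_SHARED_LIBS=ON
cmake --build . --parallel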
I have found that using the gpt4all backend instead of pure llama.cpp is indeed a bit slower.
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
[  8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.obj
In file included from C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama.cpp:8:
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool)':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:233:94: note: '#pragma message: warning: You are building for pre-Windows 8; prefetch not supported'
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
  201 | llama_mmap(struct llama_file * file, bool prefetch = true) {
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.obj
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.obj
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.obj
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.obj
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.obj
[ 83%] Linking CXX static library libllmodel.a
/mingw64/bin/ar.exe qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.obj CMakeFiles/llmodel.dir/llamamodel.cpp.obj CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj CMakeFiles/llmodel.dir/llmodel_c.cpp.obj CMakeFiles/llmodel.dir/mpt.cpp.obj CMakeFiles/llmodel.dir/utils.cpp.obj
/mingw64/bin/ranlib.exe libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.obj
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat

(myenv) Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama.cpp: loading model from .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
LlamaGPTJ-chat: done loading!

Took 5 minutes to respond with just saying hello.
UPDATE #1

$ ./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t 4
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: .............................................. done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
LlamaGPTJ-chat: done loading!