Closed thijse closed 1 year ago
Hi,
I have not tested this myself on Windows as I don't have Windows dev environment.
Here is how it looks for me:
~/bert.cpp/build$ du ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin
14196 ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin
~/bert.cpp/build$ md5sum ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin
e2476234b52c82fe31031528f3306c9b ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin
~/bert.cpp/build$ ./bin/server -m ../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin
bert_load_from_file: loading model from '../models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ...
bert_load_from_file: n_vocab = 30522
bert_load_from_file: n_max_tokens = 512
bert_load_from_file: n_embd = 384
bert_load_from_file: n_intermediate = 1536
bert_load_from_file: n_head = 12
bert_load_from_file: n_layer = 6
bert_load_from_file: f16 = 2
bert_load_from_file: ggml ctx size = 13.57 MB
bert_load_from_file: ............ done
bert_load_from_file: model size = 13.55 MB / num tensors = 101
bert_load_from_file: mem_per_token 452 KB, mem_per_input 248 MB
Server running on port 8080 with 6 threads
Waiting for a client
Also, can you check the ggml version?
~/bert.cpp$ git submodule
1a5d5f331de1d3c7ace40d86fe2373021a42f9ce ggml (heads/master-85-g1a5d5f3)
My hunch is that either:
1) the string handling at bert.cpp:590 is platform-specific and breaks down on Windows, or
2) it is trying to read the wrong number of bytes for the tensor data, i.e.
fin.read(reinterpret_cast<char *>(tensor->data), ggml_nbytes(tensor));
at bert.cpp:660, where ggml_nbytes returns a different number for some platform-specific reason.
Hi,
Thanks for your help! The file seems in order:
D:\GitHub\LLM\bert.cpp\models\all-MiniLM-L6-v2>md5sum.exe ggml-model-q4_0.bin
e2476234b52c82fe31031528f3306c9b *ggml-model-q4_0.bin
and
D:\GitHub\LLM\bert.cpp>git submodule
1a5d5f331de1d3c7ace40d86fe2373021a42f9ce ggml (1a5d5f3)
The string handling at bert.cpp:590 at least loads a seemingly correct 33-character string: embeddings.word_embeddings.weight.
ggml_nbytes(tensor) returns 6592752. I'm not sure if that is correct, but it is at least divisible by 8.
fin.read(reinterpret_cast<char *>(tensor->data), ggml_nbytes(tensor));
This is indeed the last read before the loop and afterwards things go boom, so my guess is that this should be the culprit. Is there any way I can check the integrity of the tensor?
I think I found the issue on my end. Can you re-download the quantized model and try again?
Thanks!
Yes! I need some free time to test the network, but the model is now loading fine! Thanks a lot!
First of all, thanks for your work! I have trouble loading the weights of the 'ggml-model-q4_0.bin' model (Windows, Visual Studio 2022).
In the second loop of
all 3 parameters give nonsense values.
Also, if I look at the output on my machine, the ggml ctx size is not the same as in your build example.
Did the data model change? I would be grateful if you could help me debug this.