Closed: faassen closed this issue 1 year ago
Can you show the sequence of commands you ran? I was able to get it working with my 4-bit quantized 7B.
Same for me. I've been able to load all the f16 and q4 versions (those that fit in my RAM, at least), the same ones I was using with llama.cpp.
A quick thing to try, beyond sharing the commands you executed, would be to check whether the same model loads fine under llama.cpp. That would confirm the bug is on our side.
I am having the same problem. The error seems to happen at https://github.com/setzer22/llama-rs/blob/266be12476c8a64ee98188761db6248137655201/llama-rs/src/llama.rs#L175, during iteration 132 out of 32000 for me...
I'm running `cargo run --release -- -m ../llama.cpp/models/7B/ggml-model-q4_0.bin -p "They tell me"`
and the model works perfectly with the llama.cpp implementation.
The previous tokens being loaded are the lowercase alphabet, then {, |, }, ~, \u{7f}, and then the crash happens.
Thanks for the more detailed report @mwbryant!
I guess it makes sense: C++ doesn't really care about invalid UTF-8; on the C++ side a string is just a byte array, so that code is probably silently ignoring the issue. First, we should fix it here, because this is not an unrecoverable error. What needs to be done is the following:
Add a `match` statement to capture that error and replace the word with something like "�". I don't have time to work on this right now (at least not in the next few days), but PRs are very much welcome! :) It should be an easy first contribution for anyone interested.
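Something along these lines (just a sketch, the names here don't match the actual code in llama.rs):

```rust
/// Decode a token's raw bytes, substituting the Unicode replacement
/// character "�" (U+FFFD) when the bytes are not valid UTF-8, instead of
/// panicking on an `unwrap()`.
fn decode_token_bytes(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        Ok(word) => word,
        Err(err) => {
            eprintln!("Warning: token is not valid UTF-8 ({err}), substituting \u{FFFD}");
            "\u{FFFD}".to_string()
        }
    }
}

fn main() {
    // Valid UTF-8 decodes as usual...
    assert_eq!(decode_token_bytes(b"hello".to_vec()), "hello");
    // ...while an invalid byte sequence falls back to the replacement character.
    assert_eq!(decode_token_bytes(vec![0xC0]), "\u{FFFD}");
}
```

(`String::from_utf8_lossy` would also work and only replaces the invalid sequences, but an explicit `match` lets us log a warning.)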
Then, it would be good to figure out how that invalid UTF-8 got in there in the first place. Can you run `sha256sum` on your model file (the same one you used in your example, 7B/q4) and share it here, to make sure the file is not somehow slightly corrupt?
Then, some platform information might also help diagnose this. What OS / version / whatever(?) are you on? :smile:
I opened a PR with that solution, and the program now works perfectly on my machine! The warning appears on all tokens from 131-258. Anyone with a working model should be able to print those tokens out (e.g. with something like the snippet below) and see what they parse to on a working machine, which might also solve the mystery.
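Roughly like this (the `vocab` slice stands in for however llama-rs exposes its id-to-token mapping, which I haven't checked; a plain list of strings is assumed):

```rust
/// Print tokens 131..=258 from the vocabulary so their contents can be
/// compared across machines.
fn dump_suspect_tokens(vocab: &[String]) {
    for id in 131..=258 {
        if let Some(token) = vocab.get(id) {
            // Debug formatting escapes unprintable characters, which is
            // exactly what we want to see here.
            println!("{id}: {token:?}");
        }
    }
}
```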
I'm on Ubuntu 20.04 LTS
```
$ sha256sum ../../llama.cpp/models/7B/ggml-model-q4_0.bin
f495fa02a0b5ef265e1864d9680eede7fd23a60b0a2f93edba8091e2a4ca68b9  ../../llama.cpp/models/7B/ggml-model-q4_0.bin
```
Rust version: rustc 1.67.0 (fc594f156 2023-01-24)
Anything else you want to know?
Ok, that's odd. All those tokens print as � for me. I tried printing the string bytes as hex and they're all exactly EF BF BD, which is precisely the UTF-8 encoding of the replacement character �, not something else: https://apps.timwhitlock.info/unicode/inspect?s=%EF%BF%BD.
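The hex check was just a throwaway snippet along these lines, in case anyone wants to reproduce it:

```rust
fn main() {
    let token = "\u{FFFD}"; // stand-in for whatever a suspect vocab entry decodes to
    let hex: Vec<String> = token.bytes().map(|b| format!("{b:02X}")).collect();
    println!("{}", hex.join(" ")); // prints "EF BF BD" for the replacement character
}
```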
It's likely that the python script I used to convert the model into ggml weights (from llama.cpp) already replaced those unprintable characters, but it didn't for you for some reason.
The hash for my model is not the same: 558a38f1d9ae25859f52df134d1103c8a2ff337afd64e8b1b8e5c06d7081daff
Anyway, thanks a lot for the PR, I'll have a look! :smile:
Ah, so maybe a minor difference in the Python package version, or something else impossible to detect. Well, the hotfix should never affect end-user behavior, so I think it's safe to forget about it for now :)
I fixed `main.rs` to refer to `&args.model_path`, but now I get a new error. I created these models using the tools in llama.cpp, but they don't seem to be compatible?