rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0

error when loading a model #3

Closed · faassen closed this 1 year ago

faassen commented 1 year ago

I fixed main.rs to refer to &args.model_path, but now I get a new error:

Could not load model: invalid utf-8 sequence of 1 bytes from index 0

I created these models using the tools in llama.cpp, but they don't seem to be compatible?

philpax commented 1 year ago

Can you show the sequence of commands you ran? I was able to get it working with my 4-bit quantized 7B.

setzer22 commented 1 year ago

Same for me. I've been able to load all the f16 and q4 versions (those that fit in my RAM, at least), the same ones I was using with llama.cpp.

A quick thing to try, beyond sharing the commands you executed, would be to check whether the same model loads fine under llama.cpp. That would confirm the bug is on our side.

mwbryant commented 1 year ago

I am having the same problem; the error seems to happen at https://github.com/setzer22/llama-rs/blob/266be12476c8a64ee98188761db6248137655201/llama-rs/src/llama.rs#L175, during iteration 132 out of 32000 for me...

I'm running `cargo run --release -- -m ../llama.cpp/models/7B/ggml-model-q4_0.bin -p "They tell me"`, and the same model works perfectly with the C++ implementation.

The previous tokens loaded are the lowercase alphabet, {, |, }, ~, \u{7f}, and then it crashes.

setzer22 commented 1 year ago

Thanks for the more detailed report @mwbryant!

I guess it makes sense: C++ doesn't really care about invalid UTF-8; on the C++ side a string is just a byte array, so the code there is probably ignoring the issue silently. First, we should fix this on our side, because it's not an irrecoverable error. What needs to be done is to replace the hard failure with a lossy UTF-8 conversion plus a warning, so a single invalid token can't abort the whole model load.
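
Something along these lines should do it (a rough sketch; `read_token` and the reader plumbing are made-up names for illustration, not the actual code in llama.rs):

```rust
use std::io::Read;

/// Read one vocabulary token of `len` bytes. Instead of aborting the
/// whole model load on invalid UTF-8, fall back to a lossy conversion
/// (invalid bytes become U+FFFD) and print a warning.
fn read_token(reader: &mut impl Read, len: usize) -> std::io::Result<String> {
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(match String::from_utf8(buf) {
        Ok(s) => s,
        Err(e) => {
            eprintln!("Warning: token is not valid UTF-8, using lossy conversion");
            String::from_utf8_lossy(e.as_bytes()).into_owned()
        }
    })
}
```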

I don't have time to work on this right now (at least not for a few days), but PRs are very much welcome! :) Should be an easy first contribution for anyone interested.

Then, it would be good to figure out how that invalid UTF-8 got in there in the first place. Can you run sha256sum on your model file (the same one you used in your example, 7B/q4) and share it here, to make sure the file is not somehow slightly corrupt?

Then, some platform information might also help diagnose this. What OS / version / whatever(?) are you on? :smile:

mwbryant commented 1 year ago

I opened a PR with that solution and the program now works perfectly on my machine! The warning appears on all tokens from 131-258; anyone with a working model should be able to print those out and see what those tokens parse to on a working machine, which might also solve the mystery.

I'm on Ubuntu 20.04 LTS

sha256sum ../../llama.cpp/models/7B/ggml-model-q4_0.bin
f495fa02a0b5ef265e1864d9680eede7fd23a60b0a2f93edba8091e2a4ca68b9  ../../llama.cpp/models/7B/ggml-model-q4_0.bin

Rust version: rustc 1.67.0 (fc594f156 2023-01-24)

Anything else you want to know?

setzer22 commented 1 year ago

Ok, that's odd. All those tokens print as � for me. I tried printing the string bytes as hex and they're all exactly EF BF BD, which is precisely the replacement character (�), not something else: https://apps.timwhitlock.info/unicode/inspect?s=%EF%BF%BD
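
For reference, the check was nothing fancy. A minimal standalone version (with the token hardcoded to the replacement character, since the real one comes from the loaded vocabulary):

```rust
// Print a token's bytes as hex to see what it actually contains.
fn main() {
    let token = "\u{FFFD}"; // U+FFFD, the replacement character
    for byte in token.as_bytes() {
        print!("{:02X} ", byte); // prints: EF BF BD
    }
    println!();
}
```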

It's likely that the Python script I used to convert the model into ggml weights (from llama.cpp) had already replaced those unprintable characters, but for some reason it didn't for you.

The hash for my model is not the same: 558a38f1d9ae25859f52df134d1103c8a2ff337afd64e8b1b8e5c06d7081daff

Anyway, thanks a lot for the PR, I'll have a look! :smile:

mwbryant commented 1 year ago

Ah, so maybe a minor difference in the Python package version, or something equally impossible to detect. Well, the hotfix should never affect end-user behavior, so I think it's safe to forget about it for now :)