huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Error: cannot find tensor weight #1630

Open 555cider opened 8 months ago

555cider commented 8 months ago

FYI, the model runs successfully when I use the candle_transformers::models::quantized_llama::ModelWeights::from_gguf() method.
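The loading path mentioned here can be sketched roughly as follows. The module and function names (`gguf_file::Content::read`, `ModelWeights::from_gguf`) come from candle, but the exact signatures have shifted between releases (for instance, whether `from_gguf` takes a `Device` argument), so treat this as a sketch rather than a copy-paste recipe:

```rust
// Sketch: loading a llama.cpp-style GGUF file with candle's quantized_llama
// model. Signatures are approximate and may differ across candle releases.
use std::fs::File;

use candle_core::quantized::gguf_file;
use candle_core::Device;
use candle_transformers::models::quantized_llama::ModelWeights;

fn load_gguf(path: &str) -> anyhow::Result<ModelWeights> {
    let mut file = File::open(path)?;
    // First parse the GGUF header (metadata and the tensor index)...
    let content = gguf_file::Content::read(&mut file)?;
    // ...then build the model, reading tensor data from the same reader.
    let model = ModelWeights::from_gguf(content, &mut file, &Device::Cpu)?;
    Ok(model)
}
```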

dvx commented 7 months ago

Having the same problem, how exactly did you get it running @555cider? I don't have much experience with candle and I feel like I'm just poking blindly.

EricLBuehler commented 7 months ago

It appears the quantized_mistral example does not support arbitrary GGUF weight files. I switched to the quantized example instead, which fixed this.

555cider commented 7 months ago

I asked a question in the Candle channel of the Hugging Face Discord a while ago. I'm not sure if this will be helpful, but I'll leave it here just in case.

Zermelo Fraenkel: You probably want to use the quantized example rather than the mistral one (even quantized). The quantized example uses a model based on the llama.cpp implementation, so it should be compatible with most GGUF files you can find on the web, whereas the mistral example is a custom model that matches the original Mistral implementation and is not compatible with llama.cpp weights, only with custom GGUF files. In the quantized example, you can use --which 7b-mistral-instruct-v0.2 and that should use the GGUF file that you mentioned.
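Put together, the suggested invocation looks roughly like this. The `--which` value is taken from the advice above; the other flags are illustrative and may differ between candle versions, so check `--help` on the example first:

```shell
# Run candle's quantized example (llama.cpp-compatible loader) from a
# checkout of the candle repo. Flag names may vary across versions.
cargo run --release --example quantized -- \
  --which 7b-mistral-instruct-v0.2 \
  --prompt "Hello"
```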

me: Thank you for your answer!! But I was wondering whether the methods in candle_transformers::models::quantized_mistral behave oddly. Since the name includes quantized, I assumed it would work with GGUF models, but it didn't.

Zermelo Fraenkel: Just to reformulate what I said earlier: quantized_mistral is based on the Mistral model and is only compatible with specific GGUF files that have been built for it, whereas quantized follows the llama.cpp conventions and so works with most GGUF files that have been built for llama.cpp.

555cider commented 7 months ago

As far as I understand, the candle_transformers::models::quantized_mistral module only works with specific GGUF files, not the general-purpose ones (such as those published by TheBloke). I don't understand why a module that only handles non-universal files was merged...