Does it support the new GGMLv3 quantization methods?

rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

https://docs.rs/llm/latest/llm/

Apache License 2.0


Does it support the new GGMLv3 quantization methods? #286

Open Exotik850 opened 1 year ago

Exotik850 commented 1 year ago

I tried the CLI application to see how far it had come since its llama-rs days, and an error popped up when loading one of the newer WizardLM uncensored models that uses the GGMLv3 format:

llm llama chat --model-path .\Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin
⣾ Loading model...Error:
   0: Could not load model
   1: invalid file format version 3

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Am I using it the wrong way or is it not supported yet?

philpax commented 1 year ago

Hi there! Yes, it's supported, but only on the latest version (main) - we haven't cut a new release yet. Hope to have that sorted soon!
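For anyone who wants to check which container version a model file declares before trying to load it, here is a minimal sketch. It assumes the header layout used by llama.cpp's GGML-era loader (a little-endian `u32` magic, followed by a `u32` version for the `ggmf` and `ggjt` containers); it is not taken from llm's own loader, and the names `Header`, `parse_header`, and `read_header` are hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Container magics as little-endian u32 values. These are assumptions based on
// llama.cpp's GGML-era loader, not on llm's source.
const MAGIC_GGML: u32 = 0x6767_6d6c; // "ggml": unversioned container
const MAGIC_GGMF: u32 = 0x6767_6d66; // "ggmf": versioned container
const MAGIC_GGJT: u32 = 0x6767_6a74; // "ggjt": versioned container

/// What the first bytes of a model file declare about its format.
#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub magic: u32,
    /// `None` for the unversioned "ggml" container.
    pub version: Option<u32>,
}

/// Reads a little-endian u32 at `offset`, or `None` if the slice is too short.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the magic and, where the container has one, the version field.
/// Returns `None` for unknown magics or truncated input.
pub fn parse_header(bytes: &[u8]) -> Option<Header> {
    let magic = read_u32_le(bytes, 0)?;
    match magic {
        MAGIC_GGML => Some(Header { magic, version: None }),
        MAGIC_GGMF | MAGIC_GGJT => Some(Header {
            magic,
            version: Some(read_u32_le(bytes, 4)?),
        }),
        _ => None,
    }
}

/// Reads at most the first 8 bytes of `path` and parses them as a header.
pub fn read_header(path: &Path) -> io::Result<Option<Header>> {
    let mut buf = Vec::with_capacity(8);
    File::open(path)?.take(8).read_to_end(&mut buf)?;
    Ok(parse_header(&buf))
}
```

Under these assumptions, calling `read_header` on a `.ggmlv3` file should report `version: Some(3)`, the same version number that the released loader rejects in the error above.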

Exotik850 commented 1 year ago

My apologies, should've tried the main branch instead of just trying the release 😅

philpax commented 1 year ago

No worries - I'll keep this up for now and pin it for people's reference until we get it out the door :)

arctic-hen7 commented 1 year ago

@philpax have you considered making some 0.2.0-beta.1 etc. releases on crates.io? This pattern has worked very well for some of my own projects in the past.

philpax commented 1 year ago

Hi there! Yeah, I've considered it, but the main blocker is https://github.com/rustformers/llm/issues/221 - I don't want to cut a release where the interface is going to be radically different in the next release. I'm hoping to have this all closed out within the next week or two, especially with GGUF on the horizon, but I've been quite busy.