Closed: philpax closed this 10 months ago.
I think having a migration tool for converting previous formats to GGUF, and then removing support for the other formats, might be the most maintainable solution. It might be too early to call this definitively, but I think it's prudent to assume that the ecosystem will soon converge on GGUF as the preferred format.
I've been messing around with cleaning up the Python scripts in llama.cpp (like the converters and the Python side of GGUF), so if you need to pick someone's brain about GGUF stuff, I might be able to help. I'm not an expert by any means.
Aye, I noticed you contributed the conversion script upstream; I'll definitely reach out if I have any questions about the specifics there.
Implements support for loading and saving GGUF.
TODO:

- [ ] Saving a `Gguf` struct to a file (the on-disk header layout is sketched below).
- [ ] `quantize`. (For extra points, make it multithreaded; see the threading sketch below.)
- [ ] `Metadata` map?
- [ ] Which `llm` metadata values are used for `llama`?
- [ ] Remove `.expect`s.
- [ ] Remove the `architecture` option and load entirely based on the architecture specified in the GGUF (see the dispatch sketch below).
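For reference, here's a minimal std-only sketch of the GGUF header layout that loading and saving both have to agree on, per the spec: little-endian magic `GGUF`, a `u32` version, then `u64` tensor and metadata-KV counts (v1 used `u32` counts). The `GgufHeader` struct and function names are illustrative, not this PR's actual types:

```rust
use std::fs::File;
use std::io::{self, Read};

// Illustrative names only; not the PR's actual `Gguf` implementation.
#[derive(Debug)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn read_header(r: &mut impl Read) -> io::Result<GgufHeader> {
    // GGUF files start with the little-endian magic bytes "GGUF".
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }
    let version = read_u32(r)?;
    // v1 used u32 counts; v2 widened them to u64. Only v2+ is handled here.
    if version < 2 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported GGUF version"));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64(r)?,
        metadata_kv_count: read_u64(r)?,
    })
}

fn main() -> io::Result<()> {
    let mut file = File::open("model.gguf")?;
    println!("{:?}", read_header(&mut file)?);
    Ok(())
}
```

After the header come the metadata key-value pairs and then the tensor infos, so the two counts are enough to drive the rest of the loader.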
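On the multithreading point: tensors quantize independently of each other, so one straightforward shape is to split the tensor list across scoped threads. This is only a sketch of the threading structure; `quantize_tensor` below is a placeholder, not a real Q4/Q8 kernel:

```rust
// Placeholder kernel: a real implementation would emit Q4_0/Q8_0-style blocks.
fn quantize_tensor(data: &[f32]) -> Vec<u8> {
    data.iter()
        .map(|x| (x.clamp(-1.0, 1.0) * 127.0) as i8 as u8)
        .collect()
}

// Quantize tensors in parallel by handing each worker a disjoint chunk of
// (input, output) pairs. Scoped threads let us borrow from the caller.
fn quantize_all(tensors: &[Vec<f32>], n_threads: usize) -> Vec<Vec<u8>> {
    let mut results: Vec<Vec<u8>> = vec![Vec::new(); tensors.len()];
    let chunk = ((tensors.len() + n_threads - 1) / n_threads.max(1)).max(1);
    std::thread::scope(|scope| {
        for (inputs, outputs) in tensors.chunks(chunk).zip(results.chunks_mut(chunk)) {
            scope.spawn(move || {
                for (input, output) in inputs.iter().zip(outputs.iter_mut()) {
                    *output = quantize_tensor(input);
                }
            });
        }
    });
    results
}

fn main() {
    let tensors = vec![vec![0.5_f32; 1024]; 8];
    let quantized = quantize_all(&tensors, 4);
    println!("quantized {} tensors", quantized.len());
}
```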
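And for the last item, a rough sketch of dispatching on the spec-mandated `general.architecture` metadata key instead of a user-supplied option. `Metadata` is simplified to a string map here (real GGUF metadata values are typed), and the architecture set is just an example:

```rust
use std::collections::HashMap;

// Simplified stand-in: real GGUF metadata values are a typed enum, not strings.
type Metadata = HashMap<String, String>;

#[derive(Debug)]
enum Architecture {
    Llama,
    Gpt2,
    Falcon,
}

// GGUF mandates a `general.architecture` string key; use it instead of
// asking the caller which architecture the file contains.
fn architecture_from_metadata(metadata: &Metadata) -> Result<Architecture, String> {
    match metadata.get("general.architecture").map(String::as_str) {
        Some("llama") => Ok(Architecture::Llama),
        Some("gpt2") => Ok(Architecture::Gpt2),
        Some("falcon") => Ok(Architecture::Falcon),
        Some(other) => Err(format!("unsupported architecture: {other}")),
        None => Err("missing general.architecture key".into()),
    }
}

fn main() {
    let mut metadata = Metadata::new();
    metadata.insert("general.architecture".into(), "llama".into());
    println!("{:?}", architecture_from_metadata(&metadata));
}
```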
Open questions:

Closes #365.