Summary:

This conversion only works for fp32 and fp16 types, so it isn't providing much value yet: convert_hf_checkpoint.py can already generate an equivalent .pth checkpoint file directly, without the gguf format indirection. However, this PR lays the foundation and validates that the basic fp32 and fp16 conversion works correctly. In the future, we will support running the quantized version of the gguf graph in eager mode.

Test Plan:
pip install gguf
git clone git@github.com:ggerganov/llama.cpp.git
python scripts/download.py --repo_id [HF-dir]
python llama.cpp/convert.py [HF-dir] --outtype f16  # generates [HF-dir]/ggml-model-f16.gguf
python scripts/convert_from_gguf.py --gguf_file [HF-dir]/ggml-model-f16.gguf --checkpoint_file [HF-dir]/model_gguf.pth
python generate.py --checkpoint_path [HF-dir]/model_gguf.pth --device=cpu --prompt "Hello, my name is" --max_new_tokens 20
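The core of a gguf-to-.pth conversion like the one in convert_from_gguf.py is remapping llama.cpp's gguf tensor names (e.g. `blk.0.attn_q.weight`) onto the model's state-dict keys. Below is a minimal, hypothetical sketch of that remapping; the target key names (`layers.N.attention.wq.weight`, etc.) are assumptions based on common llama-style model definitions and may not match this repo's actual mapping exactly.

```python
# Hypothetical sketch of gguf -> .pth tensor-name remapping.
# Source names follow llama.cpp gguf conventions; target names are an
# assumed llama-style state dict and may differ from the real converter.
import re

# Assumed mapping; "{n}" stands for the transformer layer index.
GGUF_TO_PTH = {
    "token_embd.weight": "tok_embeddings.weight",
    "output_norm.weight": "norm.weight",
    "output.weight": "output.weight",
    "blk.{n}.attn_norm.weight": "layers.{n}.attention_norm.weight",
    "blk.{n}.attn_q.weight": "layers.{n}.attention.wq.weight",
    "blk.{n}.attn_k.weight": "layers.{n}.attention.wk.weight",
    "blk.{n}.attn_v.weight": "layers.{n}.attention.wv.weight",
    "blk.{n}.attn_output.weight": "layers.{n}.attention.wo.weight",
    "blk.{n}.ffn_norm.weight": "layers.{n}.ffn_norm.weight",
    "blk.{n}.ffn_gate.weight": "layers.{n}.feed_forward.w1.weight",
    "blk.{n}.ffn_down.weight": "layers.{n}.feed_forward.w2.weight",
    "blk.{n}.ffn_up.weight": "layers.{n}.feed_forward.w3.weight",
}

def remap_gguf_name(name: str) -> str:
    """Translate one gguf tensor name into a .pth state-dict key."""
    # Swap the numeric layer index for the "{n}" placeholder, look the
    # template up, then substitute the real index back in.
    m = re.match(r"blk\.(\d+)\.(.+)", name)
    if m:
        template = GGUF_TO_PTH["blk.{n}." + m.group(2)]
        return template.format(n=m.group(1))
    return GGUF_TO_PTH[name]
```

With a mapping like this in hand, the converter can iterate over the tensors a gguf reader yields, rename each one, wrap the data as a torch tensor, and `torch.save` the resulting state dict as the .pth checkpoint that generate.py consumes.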