It does! We've tested on a custom fine-tuned llama2-7b model (remyxai/ffmperative-7b, hosted on Hugging Face). Both models (LLaMA 1 & 2) use the same HF code:
```python
guidance.llm = guidance.llms.transformers.LLaMA("remyxai/ffmperative-7b", device_map="auto")
```
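For anyone landing here, a minimal end-to-end sketch of that setup (assuming the old guidance 0.0.x API, where `guidance.llms.transformers.LLaMA` exists, and enough GPU memory for the 7B weights; the task string is just illustrative):

```python
import guidance

# Load the fine-tuned llama2-7b checkpoint through guidance's LLaMA wrapper;
# device_map="auto" lets accelerate place the weights across available devices.
guidance.llm = guidance.llms.transformers.LLaMA(
    "remyxai/ffmperative-7b", device_map="auto"
)

# A plain (non-chat) guidance program; no role tags, so no chat subclass is needed.
program = guidance("""Task: {{task}}
Answer: {{gen 'answer' max_tokens=64}}""")

out = program(task="Trim the first 10 seconds of input.mp4")
print(out["answer"])
```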
https://github.com/QuangBK/localLLM_guidance
@QuangBK - can you help get this working with llama2?
UPDATE - this fork by @fullstackwebdev is much better, though it needs llama2 updates (checkpoint paths). https://github.com/fullstackwebdev/localLLM_guidance
Agent drop-down.
UPDATE 2
I attempted to use @danikhan632's fork - but no dice.
```python
else:
    # tokenizer = transformers.LlamaTokenizer.from_pretrained(MODEL_PATH, use_fast=True, device_map="auto")
    # model = transformers.LlamaForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto")
    # Because LLaMA already has role start and end, we don't need role_start=role_start, role_end=role_end
    # guidance.llm = guidance.llms.transformers.LLaMA(model=model, tokenizer=tokenizer)
    guidance.llm = guidance.llms.TGWUI("http://127.0.0.1:5000")
```
N.B. this didn't work.
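One quick sanity check when the TGWUI route fails is to hit the text-generation-webui API directly, independent of guidance (a sketch assuming the legacy TGWUI API, i.e. the server started with `--api`; the route and payload shape changed in later versions):

```python
import requests

# Legacy text-generation-webui completion endpoint on port 5000.
resp = requests.post(
    "http://127.0.0.1:5000/api/v1/generate",
    json={"prompt": "The capital of France is", "max_new_tokens": 8},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```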
```python
if model_string == "TheBloke_Llama-2-13B-chat-GGML":
    MODEL_PATH = '/media/2TB/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GGML/llama-2-13b-chat.ggmlv3.q5_K_S.bin'
    CHECKPOINT_PATH = None
```
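To rule out a bad checkpoint, that GGML file can also be loaded directly with llama-cpp-python (a sketch; GGML support was dropped when llama.cpp moved to GGUF, so this assumes an older llama-cpp-python, roughly <= 0.1.78):

```python
from llama_cpp import Llama

MODEL_PATH = '/media/2TB/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GGML/llama-2-13b-chat.ggmlv3.q5_K_S.bin'

# Load the ggmlv3 checkpoint and run a one-line completion to verify it works.
llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
out = llm("The capital of France is", max_tokens=8)
print(out["choices"][0]["text"])
```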
Attempting now to use the solution above. @smellslikeml - are there any video workflow guides / canned prompts you crafted that would make sense to share?
```python
else:
    # tokenizer = transformers.LlamaTokenizer.from_pretrained(MODEL_PATH, use_fast=True, device_map="auto")
    # model = transformers.LlamaForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto")
    # Because LLaMA already has role start and end, we don't need role_start=role_start, role_end=role_end
    # guidance.llm = guidance.llms.transformers.LLaMA(model=model, tokenizer=tokenizer)
    # guidance.llm = guidance.llms.TGWUI("http://127.0.0.1:5000")
    guidance.llm = guidance.llms.transformers.LLaMA("remyxai/ffmperative-7b", device_map="auto")
```
UPDATE 3
Using this:
```python
guidance.llm = guidance.llms.transformers.LLaMA("remyxai/ffmperative-7b", device_map="auto")
```
I'm getting this error:
```
raise NotImplementedError("In order to use chat role tags you need to use a chat-specific subclass of Transformers for your LLM from guidance.transformers.*!")
NotImplementedError: In order to use chat role tags you need to use a chat-specific subclass of Transformers for your LLM from guidance.transformers.*!
Error in program: In order to use chat role tags you need to use a chat-specific subclass of Transformers for your LLM from guidance.transformers.*!
```
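The error is guidance refusing to run `{{#user}}`/`{{#assistant}}` role tags against the base Transformers wrapper. A workaround sketch (not an official API: it assumes the 0.0.x pattern where chat subclasses such as `Vicuna` override static `role_start`/`role_end` methods, and it hard-codes the llama-2 chat template, so treat it as a starting point):

```python
import guidance
from guidance.llms.transformers import LLaMA

class LLaMA2Chat(LLaMA):
    """Hypothetical chat subclass mapping guidance role tags onto the
    llama-2 [INST] ... [/INST] template."""
    llm_name: str = "llama2-chat"

    @staticmethod
    def role_start(role):
        return {"system": "<<SYS>>\n", "user": "[INST] ", "assistant": " "}.get(role, "")

    @staticmethod
    def role_end(role):
        return {"system": "\n<</SYS>>\n\n", "user": " [/INST]", "assistant": " </s>"}.get(role, "")

guidance.llm = LLaMA2Chat("remyxai/ffmperative-7b", device_map="auto")
```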
`pip list` output: https://gist.github.com/johndpope/2bc86b8b976a81e47f655267c4daf537
Let me try updating; unsure if the repo is still active.
Any update on using LLaMA 2 chat models with guidance?
Expect something, maybe Friday.
It looks like @iiis-ai has a working example with llama 1 - maybe it works with llama2 too?
```python
guidance.llm = guidance.llms.transformers.LLaMA(args.model, device_map="auto", token_healing=True, torch_dtype=torch.bfloat16)
```
https://github.com/yifanzhang-pro/cumulative-reasoning-anonymous/blob/07bcc6b21aedbee7c82f44b52aa3c0fc123e4d03/AutoTNLI/autotnli-cr.py#L27
LLaMA 2 works fine in the new release, both with HF transformers and with llama.cpp. Please check it out.
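For reference, the new-release (guidance 0.1+) API looks quite different; a minimal sketch of both paths, with the model names/paths as placeholders (the llama.cpp path now expects a GGUF file, not GGML):

```python
from guidance import models, gen

# Path A: HF transformers backend.
lm_hf = models.Transformers("meta-llama/Llama-2-7b-hf", device_map="auto")

# Path B: llama.cpp backend (requires llama-cpp-python and a GGUF checkpoint).
lm_cpp = models.LlamaCpp("/path/to/llama-2-13b-chat.Q5_K_S.gguf", n_gpu_layers=-1)

# The 0.1+ API composes programs with `+` instead of template strings.
out = lm_cpp + "The capital of France is " + gen("answer", max_tokens=5)
print(out["answer"])
```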
Is this possible, or do we have to redo the training?