guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

Guidance transformer "AttributeError: 'str' object has no attribute 'get_added_vocab'" #860

Open liboliba opened 4 months ago

liboliba commented 4 months ago

The bug

Calling

llama2 = models.Transformers(model=".../Llama-2-7b-chat-hf/my-llama-2", tokenizer=".../Llama-2-7b-chat-hf/llama-2-tokenizer")

raises "AttributeError: 'str' object has no attribute 'get_added_vocab'".

Loading checkpoint shards: 100%|██████████| 6/6 [00:25<00:00, 4.22s/it]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ll1d19/.conda/envs/videollama/lib/python3.9/site-packages/guidance/models/transformers/_transformers.py", line 283, in __init__
    TransformersEngine(model, tokenizer, compute_log_probs, chat_template=chat_template, **kwargs), echo=echo
  File "/home/ll1d19/.conda/envs/videollama/lib/python3.9/site-packages/guidance/models/transformers/_transformers.py", line 168, in __init__
    TransformersTokenizer(model, tokenizer, chat_template), compute_log_probs=compute_log_probs
  File "/home/ll1d19/.conda/envs/videollama/lib/python3.9/site-packages/guidance/models/transformers/_transformers.py", line 19, in __init__
    id: token for token, id in tokenizer.get_added_vocab().items()
AttributeError: 'str' object has no attribute 'get_added_vocab'
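The failing line in the traceback shows what goes wrong: guidance's TransformersTokenizer calls tokenizer.get_added_vocab() on whatever was passed as the tokenizer argument, and a bare path string has no such method. A minimal sketch of the mechanism (FakeTokenizer is a stand-in for illustration, not the real transformers class):

```python
class FakeTokenizer:
    # Stand-in for an instantiated transformers tokenizer (illustration only).
    def get_added_vocab(self):
        return {"<extra_token>": 32000}

def invert_added_vocab(tokenizer):
    # Mirrors the failing line in _transformers.py:
    #   id: token for token, id in tokenizer.get_added_vocab().items()
    return {tid: tok for tok, tid in tokenizer.get_added_vocab().items()}

# An instantiated tokenizer object works:
print(invert_added_vocab(FakeTokenizer()))

# A bare path string fails exactly as in the traceback:
try:
    invert_added_vocab(".../llama-2-tokenizer")
except AttributeError as err:
    print(err)
```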

To Reproduce

from guidance import models, gen, select
llama2 = models.Transformers(model=".../Llama-2-7b-chat-hf/my-llama-2", tokenizer=".../Llama-2-7b-chat-hf/llama-2-tokenizer")


Harsha-Nori commented 4 months ago

Hi @liboliba, thanks for reporting this! Guidance currently requires that you pass an instantiated tokenizer here, not just a string naming the tokenizer you want. You can work around this for now by actually constructing the Transformers tokenizer. In the future, perhaps we should update this arg to accept more flexible types, just like the model arg does.
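The more flexible handling suggested here could look roughly like the sketch below. Note that resolve_tokenizer and its loader argument are hypothetical, not part of the guidance API; in practice the loader would be something like AutoTokenizer.from_pretrained.

```python
def resolve_tokenizer(tokenizer, loader):
    """Hypothetical helper: accept either a path string or an already
    instantiated tokenizer, mirroring how the model arg behaves."""
    if isinstance(tokenizer, str):
        # A string is treated as a path/name and loaded on the caller's behalf.
        return loader(tokenizer)
    # Anything else is assumed to already be a tokenizer and passed through.
    return tokenizer

# Stand-in loader for illustration:
fake_loader = lambda path: {"loaded_from": path}
print(resolve_tokenizer("some/path", fake_loader))
print(resolve_tokenizer({"already": "built"}, fake_loader))
```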

liboliba commented 4 months ago

> Hi @liboliba, thanks for reporting this! Guidance currently requires that you pass in an instantiated tokenizer here, instead of just a string representing the tokenizer you want. You can get around this for now by actually creating the Transformers tokenizer. In the future, perhaps we should update this arg to take in more flexible types, just like the model arg does

Should I do this with the transformers model as well?

from transformers import AutoModelForCausalLM, AutoTokenizer
from guidance import models, gen, select

model = AutoModelForCausalLM.from_pretrained(".../Llama-2-7b-chat-hf/my-llama-2")
tokenizer = AutoTokenizer.from_pretrained(".../Llama-2-7b-chat-hf/llama-2-tokenizer")
llama2 = models.Transformers(model=model, tokenizer=tokenizer)

Thank you for the fast response!

liboliba commented 4 months ago


I get NameError: name 'model' is not defined, and if I change model to the path, I then get NameError: name 'tokenizer' is not defined.

Would you please kindly provide a working example in this case for both the model and tokenizer under transformer? Thank you!

liboliba commented 4 months ago

In case anyone else runs into this, the code below should work.

from transformers import AutoTokenizer
from guidance import models, gen, select

tokenizer = AutoTokenizer.from_pretrained(".../Llama-2-7b-chat-hf/llama-2-tokenizer")
llama2 = models.Transformers(".../Llama-2-7b-chat-hf/my-llama-2", tokenizer=tokenizer)