guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

Select + transformer doesn't work #453

Closed · EgorBu closed this issue 10 months ago

EgorBu commented 10 months ago

The bug: a transformers model loaded through guidance fails with an IndexError when using select().

To Reproduce: the snippet below, including the LLM load step, triggers the error (model: microsoft/phi-1_5).

from guidance import models, gen, select
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5",
    load_in_4bit=True, device_map="auto", trust_remote_code=True
    )
# note: torch_dtype is a model kwarg; AutoTokenizer silently ignores it
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16)

gmodel = models.Transformers(model=model, tokenizer=tokenizer)

gmodel + "what is the results of sum 4 and 2?" + select(['6', '8'], name='answer') + gen(stop='.')

error log:

what is the results of sum 4 and 2?
/home/egor/workdir/github/aiforcode/ai4code-experiments/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:226: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.
  warnings.warn(f'Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[1], line 14
      8 tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5",
      9     trust_remote_code=True,
     10     torch_dtype=torch.bfloat16)
     12 gmodel = models.Transformers(model=model, tokenizer=tokenizer)
---> 14 gmodel + "what is the results of sum 4 and 2?" + select(['6', '8'], name='answer') + gen(stop='.')

File ~/workdir/github/aiforcode/ai4code-experiments/venv/lib/python3.10/site-packages/guidance/models/_model.py:204, in Model.__add__(self, value)
    202 # run stateless functions (grammar nodes)
    203 elif isinstance(value, StatelessFunction):
--> 204     return lm.run_stateless(value)
    206 # run stateful functions
    207 else:
    208     return value(lm)

File ~/workdir/github/aiforcode/ai4code-experiments/venv/lib/python3.10/site-packages/guidance/models/_model.py:303, in Model.run_stateless(lm, stateless_function, max_tokens, temperature, top_p, n)
    301 delayed_bytes = b""
    302 # last_is_generated = False
--> 303 for new_bytes, is_generated, new_bytes_log_prob, capture_groups, capture_group_log_probs, new_token_count in gen_obj:
    304     # convert the bytes to a string (delaying if we don't yet have a valid unicode string)
    305     lm._token_count += new_token_count
    306     new_bytes = delayed_bytes + new_bytes

File ~/workdir/github/aiforcode/ai4code-experiments/venv/lib/python3.10/site-packages/guidance/models/_local.py:254, in Local.__call__(self, grammar, max_tokens, n, top_p, temperature, ensure_bos_token, log_probs)
    252 # loop over the tokens looking for a valid one
    253 for i,sampled_token_ind in enumerate(sampling_order):
--> 254     sampled_token = self.tokens[sampled_token_ind]
    256     # make sure the parse is backed up to the position we want to start checking from TODO: make this account for shared prefixes with the last token
    257     parser.pos = forced_pos

IndexError: list index out of range


slundberg commented 10 months ago

Thanks! Unfortunately, I can't seem to reproduce this:

[screenshot: the same snippet running without error]

If you have other scenarios where this happens, let us know!

EgorBu commented 10 months ago

Thanks a lot! Can you suggest how to debug this problem, @slundberg? It looks like a super basic and useful feature, so I'm surprised I'm hitting issues (it used to work in previous versions, before the major updates).

EgorBu commented 10 months ago

I checked the Hugging Face page for the phi-1.5 model and found a related issue. Some comments from there:

The size of the tokenizer vocab is 50257, while the size of the vocab in the config is 51200.
...
Hi there,

I understand that it works fine as long as tokenizer.vocab_size <= model.layers[0].wte.weight.shape[0], but it seems that the number 50257 is actually incorrect.
When you count unique indices in the vocabulary, including added_tokens, the correct number appears to be 50295 instead.
I am not knowledgeable about how this attribute is configured when initializing the tokenizer, but this issue may need to be fixed because sometimes we want to access the value through this attribute (tokenizer.vocab_size).
...
This is the expected behavior of transformers. Please check this issue: https://github.com/huggingface/transformers/issues/12632
...
I'm afraid the link you suggested doesn't seem very relevant to the issue.

Of course, we can get the actual vocabulary size with len(tokenizer.get_vocab()) or something.
However, the added_tokens are incorporated by default, without users specifying them, as defined in tokenizer.json (https://huggingface.co/microsoft/phi-1_5/blob/main/tokenizer.json).
Given that the argument is supposed to be passed by users, I would not consider this "expected behavior" of the library.
The current implementation can cause errors for future users relying on the (presumably widely used) vocab_size attribute, so it would be better off corrected, maybe by moving the additional tokens into the default ones.

Thanks for your response.

and it looks like guidance is using tokenizer.vocab_size here instead of len(tokenizer), which is what causes the IndexError
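For reference, the mismatch can be checked directly (the numbers are the ones quoted above: vocab_size counts only the base BPE vocabulary, while len(tokenizer) also counts the added_tokens from tokenizer.json):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
print(tokenizer.vocab_size)  # 50257 - base vocab only, excludes added tokens
print(len(tokenizer))        # 50295 - full vocabulary, including added tokens
# the model's embedding matrix is padded even further, to the config's 51200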

(TBH I'm surprised it doesn't reproduce for you - somehow we must be getting a different probability distribution over tokens, if I'm understanding correctly what's happening.)
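To illustrate what I think goes wrong inside guidance, here is a simplified, hypothetical sketch: if the token table is sized from vocab_size, any sampled id belonging to an added token falls outside the table:

tokens = ["<tok>"] * 50257   # table sized from tokenizer.vocab_size
sampled_token_ind = 50280    # hypothetical id of an added token (>= 50257)
tokens[sampled_token_ind]    # IndexError: list index out of range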

EgorBu commented 10 months ago

Created a PR with the fix: https://github.com/guidance-ai/guidance/pull/460
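The idea of the fix, as a sketch rather than the literal PR diff: size the token table by the full tokenizer length instead of the base vocab size, so added-token ids stay in range:

# before (sketch) - misses the added tokens:
# tokens = [tokenizer.convert_ids_to_tokens(i) for i in range(tokenizer.vocab_size)]
# after (sketch) - covers every id the tokenizer can produce:
tokens = [tokenizer.convert_ids_to_tokens(i) for i in range(len(tokenizer))]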

EgorBu commented 10 months ago

The fix was merged.