guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

guidance.models.Transformers() seems to support very few models #782

Open crclark opened 7 months ago

crclark commented 7 months ago

The bug

It seems that many models loaded with models.Transformers() error out with:

AssertionError: The passed tokenizer does have a byte_decoder property and using a standard gpt2 byte_decoder fails!

(side note: I think this error intends to say "does not have a byte_decoder property")

Is this expected behavior? If so, it needs more docs.

To Reproduce

from guidance import models, gen, select
import torch

llama3 = models.Transformers('meta-llama/Meta-Llama-3-8B', torch_dtype=torch.float16, device_map='cuda')

The model id can be replaced with many others (Mistral, Llama 2, etc.) and I get the same error.


am-bean commented 6 months ago

Extremely hacky, but I managed to work around this by passing my own byte_decoder as part of the tokenizer.

e.g. for Llama 3, you can brute-force reverse engineer the byte encodings like this. For completeness, you then need to add the encodings for byte values that never occur in valid Unicode, but it can be observed that Llama assigns those encodings alphabetically. Then just assign the byte_decoder to the tokenizer and you are good to go:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    padding_side="left",
    trust_remote_code=True,  # token=hf_token
)

byte_decoder = {}
alphabet = pre_tokenizers.ByteLevel(False, False).alphabet()
known_vals = set([])

for j in range(256):
    for k in range(256):
        for l in range(256):
            if len(byte_decoder.keys()) < 256:
                b = b""
                vals = [j,k,l]
                if not set(vals).issubset(known_vals):
                    for d in range(3):
                        b = b + int.to_bytes(vals[d])
                    try:
                        c = b.decode()
                        t = pre_tokenizers.ByteLevel(False,False).pre_tokenize_str(c)[0][0]
                        for m in range(3):
                            if t[m] not in byte_decoder.keys():
                                byte_decoder[t[m]] = vals[m]
                                known_vals.add(vals[m])
                    except UnicodeDecodeError:
                        pass

print(len(byte_decoder))

byte_decoder['À'] = 192
byte_decoder['Á'] = 193

byte_decoder['ð'] = 240
byte_decoder['ñ'] = 241
byte_decoder['ò'] = 242
byte_decoder['ó'] = 243
byte_decoder['ô'] = 244
byte_decoder['õ'] = 245
byte_decoder['ö'] = 246
byte_decoder['÷'] = 247
byte_decoder['ø'] = 248
byte_decoder['ù'] = 249
byte_decoder['ú'] = 250
byte_decoder['û'] = 251
byte_decoder['ü'] = 252
byte_decoder['ý'] = 253
byte_decoder['þ'] = 254
byte_decoder['ÿ'] = 255

from guidance import models
tokenizer.byte_decoder = byte_decoder
lm = models.Transformers("meta-llama/Meta-Llama-3-8B-Instruct", tokenizer=tokenizer)

manish-marwah commented 6 months ago

I am unable to use guidance due to this bug. @Harsha-Nori do you know if a fix is forthcoming? Thanks.

Tangent-90C commented 6 months ago

Extremely hacky, but I managed to work around this by passing my own byte_decoder as part of the tokenizer. [...]

What is pre_tokenizers, and how do I import it?

am-bean commented 6 months ago

What is pre_tokenizers, and how do I import it?

Ah sorry, it's from the Hugging Face tokenizers library.

from tokenizers import pre_tokenizers

Tangent-90C commented 6 months ago

Extremely hacky, but I managed to work around this by passing my own byte_decoder as part of the tokenizer. [...]

Hello, I found a bug in your code. The code

b = b + int.to_bytes(vals[d])

should be changed to

b = b + vals[d].to_bytes(1, 'little', signed=False)

am-bean commented 6 months ago

Thanks! I swapped it out and found no difference on my local setup, but setting specific arguments should be more robust across installations.

Harsha-Nori commented 6 months ago

Hey, @paulbkoch and I are looking into this, though we're balancing quite a few things on our plate at the moment. Would be really helpful to collect a list of models where this error occurs (beyond 'meta-llama/Meta-Llama-3-8B') if anyone else has run into this recently.

adeen-s commented 6 months ago

@Harsha-Nori I just encountered this issue with another model called core42/jais-13b. Hope this helps

am-bean commented 6 months ago

Gemma models also don't work. They directly encode the bytes as tokens (e.g. "ൠ" is tokenized as '<0xE0>' '<0xB5>' '<0xA0>'). This doesn't translate well into the byte_decoder paradigm as tested by guidance.

EDIT: I was using 0.1.13, but this is fixed in 0.1.14 by treating sentencepiece models separately. (Requires setting use_fast=False on the tokenizer)
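For a rough illustration of that format (a sketch, not guidance's actual code): each byte-fallback token spells out exactly one raw byte in hex, so it maps to bytes by parsing the hex value rather than by looking characters up in a GPT-2-style byte_decoder table.

def byte_fallback_token_to_byte(token: str) -> bytes:
    # sketch only: '<0xE0>' -> b'\xe0'
    assert token.startswith("<0x") and token.endswith(">")
    return bytes([int(token[3:-1], 16)])

raw = b"".join(byte_fallback_token_to_byte(t) for t in ["<0xE0>", "<0xB5>", "<0xA0>"])
print(raw.decode("utf-8"))  # "ൠ"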

petergfennell commented 6 months ago

Is there any up to date list of models that are supported? I just tried llama 3 8B instruct and mistral 7b instruct and neither works. I saw from the README that 'gpt2' was used with transformers and so I can use that, but would prefer to use newer models

am-bean commented 6 months ago

Is there any up to date list of models that are supported? I just tried llama 3 8B instruct and mistral 7b instruct and neither works. I saw from the README that 'gpt2' was used with transformers and so I can use that, but would prefer to use newer models.

I've had the following working in the last 24 hours:

  • Llama 3 8/70B instruct
  • Mixtral 8x7B
  • Gemma 7B
  • Llama 2 70B

For many of them you need to load the tokenizer with use_fast=False in order to have the sentencepiece model available to guidance. (Requires guidance 0.1.14.)
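A minimal sketch of that loading path (the Mixtral model id here is just an example; the same pattern applies to the other models in the list):

from transformers import AutoTokenizer
from guidance import models

# use_fast=False loads the slow, sentencepiece-backed tokenizer, which exposes
# sp_model so guidance 0.1.14 can build its token table without a byte_decoder.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", use_fast=False)
lm = models.Transformers("mistralai/Mixtral-8x7B-Instruct-v0.1", tokenizer=tokenizer)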

Harsha-Nori commented 6 months ago

Hey @am-bean, thanks for chasing this down! We should be loading with use_fast=False eventually -- do you know if this is happening automatically, or do you have to manually instantiate and pass in the tokenizer yourself? If so, I think there's a bug I'd like to chase down.

Glad to hear all these models work for you!

manish-marwah commented 6 months ago

I've had the following working in the last 24 hours: [...] For many of them you need to load the tokenizer with use_fast=False in order to have the sentencepiece model available to guidance. (Requires guidance 0.1.14.)

I tried updating to 0.1.14, instantiated a tokenizer with use_fast=False, but the model still fails with the same error

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", use_fast=False)
llama3 = models.Transformers("meta-llama/Meta-Llama-3-8B-Instruct", tokenizer=tokenizer)

still gives AssertionError: The passed tokenizer does not have a byte_decoder property and using a standard gpt2 byte_decoder fails!

am-bean commented 6 months ago

Hey @am-bean, thanks for chasing this down! We should be loading with use_fast=False eventually -- do you know if this is happening automatically, or do you have to manually instantiate and pass in the tokenizer yourself? If so, I think there's a bug I'd like to chase down.

Glad to hear all these models work for you!

I needed to instantiate the tokenizers myself and pass them in at least for Mixtral. I didn't test the others since it was easier just to have consistent code for all the models.

I arrived at loading them separately as a solution because the error indicated that I had ended up trying to load gpt2 as the tokenizer, which meant that hasattr(tokenizer, "sp_model") was False. Unfortunately, at a quick scan I agree with your expected behaviour and don't see the bug.

am-bean commented 6 months ago

I tried updating to 0.1.14, instantiated a tokenizer with use_fast=False, but the model still fails with the same error [...] still gives AssertionError: The passed tokenizer does not have a byte_decoder property and using a standard gpt2 byte_decoder fails!

Llama 3 uses a different type of tokenizer than the rest in that list. You'll need to follow the instructions I posted way above about setting a custom byte_decoder.

manish-marwah commented 6 months ago

Llama 3 uses a different type of tokenizer than the rest in that list. You'll need to follow the instructions I posted way above about setting a custom byte_decoder.

Thanks @am-bean. Yes your fix works. I was hoping for a less (in your words) "hacky" solution.

LuoKaiGSW commented 6 months ago

Hey @am-bean, thanks for chasing this down! We should be loading with use_fast=False eventually -- do you know if this is happening automatically, or do you have to manually instantiate and pass in the tokenizer yourself? If so, I think there's a bug I'd like to chase down.

Glad to hear all these models work for you!

hi, @Harsha-Nori, I would like to ask, my current model is using BloomTokenizer, which does not have the attributes of byte_decoder and sp_model. Therefore, according to the code, it uses the byte_decoder from gpt2. I would like to know, does this affect the final use of my model? Additionally, I would also like to ask, what roles do the byte_decoder and sp_model play for a tokenizer? Thank you.

LuoKaiGSW commented 6 months ago

Extremely hacky, but I managed to work around this by passing my own byte_decoder as part of the tokenizer. [...]

Hey @am-bean, I've run into the same problem. Your workaround seems to be effective, but I don't fully understand it, so I would like to ask for your guidance. Why would the byte_decoder differ between tokenizers? My (admittedly superficial) understanding is that in gpt2 the byte_decoder is built by choosing 256 printable characters to stand in for the 256 byte values, so that any UTF-8 string can be split into a concatenation of those characters for subsequent merges. From that perspective, how the mapping is established shouldn't matter as long as such a mapping exists. Where does my understanding go wrong? And if I'm using BloomTokenizer (use_fast=True), how should I implement a byte_decoder? Thank you in advance.

lashmore commented 5 months ago

Is anyone else getting a KeyError: '▁'?

def add_byte_decoder():
    byte_decoder = {}
    alphabet = pre_tokenizers.ByteLevel(False, False).alphabet()
    known_vals = set([])
    for j in range(256):
        for k in range(256):
            for l in range(256):
                if len(byte_decoder.keys()) < 256:
                    b = b""
                    vals = [j,k,l]
                    if not set(vals).issubset(known_vals):
                        for d in range(3):
                            b = b + int.to_bytes(vals[d])
                        try:
                            c = b.decode()
                            t = pre_tokenizers.ByteLevel(False,False).pre_tokenize_str(c)[0][0]
                            for m in range(3):
                                if t[m] not in byte_decoder.keys():
                                    byte_decoder[t[m]] = vals[m]
                                    known_vals.add(vals[m])
                        except UnicodeDecodeError:
                            pass
    byte_decoder['À'] = 192
    byte_decoder['Á'] = 193
    byte_decoder['ð'] = 240
    byte_decoder['ñ'] = 241
    byte_decoder['ò'] = 242
    byte_decoder['ó'] = 243
    byte_decoder['ô'] = 244
    byte_decoder['õ'] = 245
    byte_decoder['ö'] = 246
    byte_decoder['÷'] = 247
    byte_decoder['ø'] = 248
    byte_decoder['ù'] = 249
    byte_decoder['ú'] = 250
    byte_decoder['û'] = 251
    byte_decoder['ü'] = 252
    byte_decoder['ý'] = 253
    byte_decoder['þ'] = 254
    byte_decoder['ÿ'] = 255
    return byte_decoder

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", padding_side="left", trust_remote_code=True)
tokenizer.byte_decoder = add_byte_decoder()
model = AutoModelForCausalLM.from_pretrained(name, quantization_config=nf4_config, device_map="auto")
lm = guidance.models.Transformers(model=model, tokenizer=tokenizer)

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00,  2.25s/it]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[33], line 4
      2 tokenizer.byte_decoder = add_byte_decoder()
      3 model = AutoModelForCausalLM.from_pretrained(name, quantization_config=nf4_config, device_map="auto")
----> 4 lm = guidance.models.Transformers(model=model, tokenizer=tokenizer)

File ~/miniconda3/envs/dafee/lib/python3.11/site-packages/guidance/models/transformers/_transformers.py:196, in Transformers.__init__(self, model, tokenizer, echo, compute_log_probs, **kwargs)
    193 def __init__(self, model=None, tokenizer=None, echo=True, compute_log_probs=False, **kwargs):
    194     '''Build a new Transformers model object that represents a model in a given state.'''
    195     super().__init__(
--> 196         TransformersEngine(model, tokenizer, compute_log_probs, **kwargs),
    197         echo=echo
    198     )

File ~/miniconda3/envs/dafee/lib/python3.11/site-packages/guidance/models/transformers/_transformers.py:101, in TransformersEngine.__init__(self, model, tokenizer, compute_log_probs, **kwargs)
     97 self._cached_logits = None
     98 self._cached_token_ids = []
    100 super().__init__(
--> 101     TransformersTokenizer(model, tokenizer),
    102     compute_log_probs=compute_log_probs
    103 )

File ~/miniconda3/envs/dafee/lib/python3.11/site-packages/guidance/models/transformers/_transformers.py:41, in TransformersTokenizer.__init__(self, model, tokenizer, ignore_bos_token)
     39 byte_tokens = []
     40 for i in range(len(tokenizer)):
---> 41     byte_coded = bytes([byte_decoder[c] for c in tokenizer.convert_ids_to_tokens(i)])
     42     byte_tokens.append(byte_coded)
     46 # the superclass does most of the work once we have the tokens

File ~/miniconda3/envs/dafee/lib/python3.11/site-packages/guidance/models/transformers/_transformers.py:41, in <listcomp>(.0)
     39 byte_tokens = []
     40 for i in range(len(tokenizer)):
---> 41     byte_coded = bytes([byte_decoder[c] for c in tokenizer.convert_ids_to_tokens(i)])
     42     byte_tokens.append(byte_coded)
     46 # the superclass does most of the work once we have the tokens

KeyError: '▁'

am-bean commented 5 months ago

Is anyone else getting a KeyError: '▁'? [...]

You'll need to adapt the code a bit to make it work for a tokenizer other than Llama 3. The hardcoding at the end is from me reverse engineering by hand and won't apply to other models.

(For what it's worth, the iteration over 256 × 256 × 256 byte values is horribly inefficient, since Unicode is sparse in the initial bits of most bytes, but I didn't bother to fix it. If you write something more robust you might also want to replace that.)
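One more robust option (just a sketch, and it assumes the tokenizer really does use the standard GPT-2 byte-level alphabet that pre_tokenizers.ByteLevel produces) is to skip the brute force and invert the bytes_to_unicode() helper that ships with transformers' GPT-2 tokenizer:

from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode

# bytes_to_unicode() returns the canonical GPT-2 mapping from each of the 256 byte
# values to the unicode character that byte-level BPE uses to represent it;
# inverting it gives a complete byte_decoder in one line, including the bytes that
# never occur in valid UTF-8 (192, 193, 245-255).
byte_decoder = {char: byte for byte, char in bytes_to_unicode().items()}
assert len(byte_decoder) == 256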

lashmore commented 4 months ago

@am-bean - I confirm your fix works for Llama3!

Any pointers on where to look in the Guidance repo to understand how to customize a similar hack that you curated for Llama3, for other LLMs, such as Mistral / Starling / et al? I'm not sure how you put the pieces together / I haven't crawled the repo myself for clues. Thanks for any insights!

am-bean commented 4 months ago

@lashmore - Most of what I needed to work this out is actually from Hugging Face and not Guidance itself. The thing to look for is what tokenisers are being used and how they encode unicode values. The tokenizers package is the most relevant.

Seeing as this issue keeps attracting attention, maybe I can find time to write a better version of the workaround and add a PR...

lashmore commented 4 months ago

@am-bean please let us know if you do end up doing that! I'm looking at the tokenizers package right now. I see Mistral, Starling and Llama 3 all use SentencePieceBPETokenizer tokenization, but I'll need to stare at it much longer to understand how to construct the add_byte_decoder(). Thanks for the pointer, and do let us know if you go ahead and work on a PR!

arbitropy commented 2 months ago

Currently I have tried 4 models that fit on my GPU:

  1. mistral nemo: works perfectly
  2. gemma 2 27b: the code runs until inference using use_fast = True but then the output is garbage.
  3. command r: doesn't load even using use_fast = False
  4. llama3.1 8b: works but has other problems like generation not stopping.

Harsha-Nori commented 2 months ago

Hey @arbitropy (and everyone reading), really appreciate you testing these out for us. We've been updating our support and infrastructure in our release candidate, and tried to improve the tokenizer loading logic there. Would you mind trying with the pre-release guidance (pip install guidance --pre) and seeing if that does any better?

If they still don't work on the release candidate, nailing down support for gemma/llama/command r is definitely something we need to investigate

@riedgar-ms @nking-1 @hudson-ai FYI