Open Yarflam opened 4 months ago
Hello!
How to reproduce:
from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2') tokenizer.add_bos_token = False tokenizer.add_eos_token = False ids = [ 12866, 601 ] # "▁domestic" + "ated" decode = tokenizer.decode(ids) encode = tokenizer.encode(decode) print(encode) # output -> [2853, 374, 6899] # "▁dom" + "est" + "icated"
I don't know what's the best thing to do and if this case has an impact on the calculation. It's just a feedback - but I'm sure it's possible to find another cases.
Hello!
How to reproduce:
I don't know what's the best thing to do and if this case has an impact on the calculation. It's just a feedback - but I'm sure it's possible to find another cases.