openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Is there a new tokenizer for o1 models? #337

Closed by jiadingfang 1 month ago

jiadingfang commented 2 months ago

Hi openai devs,

Congrats on releasing the o1 models, and I'm excited to test them out. Since there seem to be new "reasoning" tokens in generation, I wonder if there is a new version of tiktoken for the o1 models?

Thanks in advance!

kotikkonstantin commented 2 months ago

@jiadingfang I suspect it's the same as gpt-4o's

Rameenh commented 2 months ago

I am curious about this as well.

llv22 commented 2 months ago

I'm also curious about it

mrsbeep commented 2 months ago

I'm also very curious about this issue. It seems to be basically the same as the gpt-4o model's, though it looks to me like it includes some additional tokens.

Ki-Seki commented 2 months ago

Curious +1

senghorn commented 2 months ago

Need this support!

archasek commented 2 months ago

Why does tiktoken.encodingForModel("gpt-4o-mini") still not work? Error: No tokenizer found for model gpt-4o-mini
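This kind of "No tokenizer found" error usually means the installed tokenizer version predates the model name. A common workaround is to resolve the encoding yourself by model-name prefix and then load it directly. The sketch below is illustrative only: the prefix table and the helper `encoding_name_for_model` are my own, not tiktoken's actual internals.

```python
# Hypothetical sketch: resolve a model name to an encoding name by
# longest-prefix match, so newer model variants (e.g. "gpt-4o-mini")
# still map to a known encoding. The table is an assumption for
# illustration, not tiktoken's real mapping.
KNOWN_PREFIXES = {
    "gpt-4o": "o200k_base",        # covers gpt-4o, gpt-4o-mini, ...
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

def encoding_name_for_model(model: str) -> str:
    """Return the encoding name for a model, matching longest prefix first."""
    for prefix in sorted(KNOWN_PREFIXES, key=len, reverse=True):
        if model.startswith(prefix):
            return KNOWN_PREFIXES[prefix]
    raise KeyError(f"No tokenizer found for model {model}")
```

With a sufficiently recent tiktoken you can skip this and call `tiktoken.get_encoding("o200k_base")` directly when `encoding_for_model` doesn't recognise the name.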

enricoros commented 1 month ago

I see different input token counts between o1-preview and gpt-4o. Probably a different tokenizer. Update: I take it back; behind the scenes a different system prompt is probably being set, which counts toward the input.

hauntsaninja commented 1 month ago

This is fixed in tiktoken 0.8. Yes, from the user's perspective the o1 tokeniser works the same way as the 4o tokeniser.