ollama / ollama-python

Ollama Python library
https://ollama.com
MIT License

ollama-python sometimes outputs half tokens; original ollama works fine on my system #143

Closed AbdirayimovS closed 1 month ago

AbdirayimovS commented 1 month ago

Hi there,

I was playing around with ollama-python in a Jupyter notebook. I created a small HTML form to show the model's results dynamically. In the output, I see that some words are split into two or more tokens, as in the image below:

[image: model output in the HTML form, with words split apart]

This kind of behaviour does not appear in ollama itself (the same model) in the terminal:

[image: screenshot of Ollama in the terminal]

Looking forward to hearing from you!

mxyng commented 1 month ago

Here are the raw decoded tokens for a similar prompt.

$ python tmp.py
{'role': 'assistant', 'content': '\n'}
{'role': 'assistant', 'content': 'The'}
{'role': 'assistant', 'content': ' quote'}
{'role': 'assistant', 'content': ' "'}
{'role': 'assistant', 'content': 'Th'}
{'role': 'assistant', 'content': 'ink'}
{'role': 'assistant', 'content': ' D'}
{'role': 'assistant', 'content': 'iffer'}
{'role': 'assistant', 'content': 'ently'}
{'role': 'assistant', 'content': '"'}
{'role': 'assistant', 'content': ' is'}
{'role': 'assistant', 'content': ' attributed'}
{'role': 'assistant', 'content': ' to'}
{'role': 'assistant', 'content': ' Steve'}
{'role': 'assistant', 'content': ' Job'}
{'role': 'assistant', 'content': 's'}
{'role': 'assistant', 'content': ','}
{'role': 'assistant', 'content': ' the'}
{'role': 'assistant', 'content': ' co'}
{'role': 'assistant', 'content': '-'}
{'role': 'assistant', 'content': 'found'}
{'role': 'assistant', 'content': 'er'}
{'role': 'assistant', 'content': ' and'}
{'role': 'assistant', 'content': ' former'}
{'role': 'assistant', 'content': ' CE'}
{'role': 'assistant', 'content': 'O'}
{'role': 'assistant', 'content': ' of'}
{'role': 'assistant', 'content': ' Apple'}
{'role': 'assistant', 'content': '.'}
{'role': 'assistant', 'content': ' He'}
{'role': 'assistant', 'content': ' often'}
{'role': 'assistant', 'content': ' used'}
{'role': 'assistant', 'content': ' this'}
{'role': 'assistant', 'content': ' phrase'}
{'role': 'assistant', 'content': ' in'}
{'role': 'assistant', 'content': ' his'}
{'role': 'assistant', 'content': ' present'}
{'role': 'assistant', 'content': 'ations'}
{'role': 'assistant', 'content': ' and'}
{'role': 'assistant', 'content': ' speech'}
{'role': 'assistant', 'content': 'es'}
{'role': 'assistant', 'content': ' to'}
{'role': 'assistant', 'content': ' encou'}
{'role': 'assistant', 'content': 'rage'}
{'role': 'assistant', 'content': ' people'}
{'role': 'assistant', 'content': ' to'}
{'role': 'assistant', 'content': ' think'}
{'role': 'assistant', 'content': ' cre'}
{'role': 'assistant', 'content': 'atively'}
{'role': 'assistant', 'content': ' and'}
{'role': 'assistant', 'content': ' outside'}
{'role': 'assistant', 'content': ' the'}
{'role': 'assistant', 'content': ' box'}
{'role': 'assistant', 'content': '.'}
{'role': 'assistant', 'content': ''}

As you can see, it does split some words into multiple tokens. This is common for many tokenizers since longer words are usually composed of smaller parts.

Can you confirm this is not a rendering or postprocessing issue? For LLM outputs, you're meant to join the streamed chunks without adding extra spaces, since the tokens themselves capture the spacing.
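For example, taking a few of the chunks from the dump above: joining them with spaces mangles the words, while plain concatenation reconstructs them.

chunks = ['Th', 'ink', ' D', 'iffer', 'ently']
print(' '.join(chunks))  # Th ink  D iffer ently  (extra spaces break words apart)
print(''.join(chunks))   # Think Differently      (tokens carry their own spacing)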

Here's the same response, joined into a reader-friendly string:

$ python tmp.py

The quote "Think Differently" is attributed to Steve Jobs, the co-founder and former CEO of Apple.

Here's tmp.py:

import ollama

# Stream the chat response; each iteration yields one decoded chunk
for response in ollama.chat(
  model='llama2',
  messages=[{'role': 'user', 'content': 'Who said: "Think differently?"'}],
  stream=True,
):
  # Print chunks back-to-back: end='' avoids inserting extra spaces or newlines
  print(response['message']['content'], end='', flush=True)
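If you instead need the full string as it grows (e.g. to re-render an HTML element in a notebook, as in the original report), accumulate the chunks with plain concatenation; a minimal sketch along the same lines:

import ollama

# Build up the complete reply as it streams in; plain concatenation is the
# key detail, since adding spaces between chunks would split words apart
full_text = ''
for response in ollama.chat(
  model='llama2',
  messages=[{'role': 'user', 'content': 'Who said: "Think differently?"'}],
  stream=True,
):
  full_text += response['message']['content']

print(full_text)
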
AbdirayimovS commented 1 month ago

I thought tokens were created without any consideration of whitespace. My mistake. I fixed it!