The Issue

Generating chat responses with images has a rather nasty side effect: if a message contains images, the image paths are overwritten in place by their Base64 encodings.

```python
# Before calling ollama.chat:
[{'role': 'user', 'content': 'What is this?', 'images': ['ollama.png']}]
# After calling ollama.chat:
[{'role': 'user', 'content': 'What is this?', 'images': ['iVBORw0KGgoAAAANSUhEUgAAALUAAAEACAYAAAD1IzfbAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAABzUSURBVHgB7Z0JtGRVdYZ/bTUMDiiKMnYLhKFVJCBOKDxdIM4oEiUaYiNxJRpFkhVwjLTLKVFwiONSGWIaUIFAUNAElUYDR (it goes on for way longer)']}]
```

If a user passed an image path, it is undesirable to have that path overwritten by extremely verbose Base64 text.
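A minimal reproduction, assuming a local Ollama server with an image-capable model such as `llava` pulled, and an `ollama.png` in the working directory:

```python
import ollama

messages = [{'role': 'user', 'content': 'What is this?', 'images': ['ollama.png']}]
ollama.chat(model='llava', messages=messages)

# The caller's own list has been mutated in place: this prints the start
# of a Base64 string, not 'ollama.png'.
print(messages[0]['images'][0][:40])
```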
The Cause
Most Python users will pass messages as a sequence of dictionaries. Dictionaries are inherently mutable objects that are passed by reference, which is why the line https://github.com/ollama/ollama-python/blob/cb81f522b0f0035acbfeeed87b7902856bda501e/ollama/_client.py#L174-L175 also mutates the user's own list/tuple of messages.
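To show the mechanics in isolation, here is a simplified sketch of what the client effectively does (the helper name and placeholder encoding are hypothetical, not the actual `_client.py` code):

```python
def encode_images_in_place(messages):
    # Mimics the client's behavior: replaces each image path with its
    # (placeholder) Base64 encoding, mutating the caller's dicts directly.
    for message in messages:
        if message.get('images'):
            message['images'] = [f'<base64 of {img}>' for img in message['images']]

user_messages = [{'role': 'user', 'content': 'What is this?', 'images': ['ollama.png']}]
encode_images_in_place(user_messages)
print(user_messages)
# [{'role': 'user', 'content': 'What is this?', 'images': ['<base64 of ollama.png>']}]
```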
The Solution
Obviously this can be avoided on the user's end by simply making a deep copy of `messages` before passing it in. But because this is a rather obscure issue that is never mentioned in the documentation, I recommend removing the possibility of this side effect altogether.

There are multiple possible solutions to this issue, but because the performance cost of making a deep copy is insignificant compared to the performance cost of LLM inference, I went with the simplest fix: making a deep copy of `messages` every time `ollama.chat` is called. I look forward to hearing the opinions of the Ollama maintainers.
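A sketch of that fix as it might look inside the client (simplified signature; the real method takes more parameters):

```python
import copy

def chat(self, model='', messages=None, **kwargs):
    # Deep-copy first, so the in-place Base64 substitution that follows
    # never touches the caller's objects. The copy is cheap relative to
    # the cost of the LLM inference itself.
    messages = copy.deepcopy(messages)
    # ... existing image-encoding and request logic, unchanged ...
```

Until something along these lines lands, users can protect themselves by calling `ollama.chat(model=..., messages=copy.deepcopy(messages))`.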