jakobdylanc / llmcord

A Discord LLM chat bot that supports any OpenAI compatible API (OpenAI, xAI, Mistral, Groq, OpenRouter, ollama, LM Studio and more)

Does ollama support images? #29

Closed mann1x closed 5 months ago

mann1x commented 7 months ago

Is sending images to the bot supported using ollama?

This is what I get from the logs using llava model:

2024-04-10 21:25:09.161 INFO: Message received (user ID: 161792098901688320, attachments: 1, reply chain length: 1):
describe it
2024-04-10 21:25:09.480 INFO: HTTP Request: POST http://localhost:11434/api/generate "HTTP/1.1 400 Bad Request"
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 270, in ollama_async_streaming
    raise OllamaError(
litellm.llms.ollama.OllamaError: b'{"error":"illegal base64 data at input byte 4"}'
2024-04-10 21:25:09.483 ERROR: Error while streaming response
Traceback (most recent call last):
  File "/root/discord-llm-chatbot/llmcord.py", line 176, in on_message
    async for curr_chunk in await acompletion(**kwargs):
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 284, in ollama_async_streaming
    raise e
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 270, in ollama_async_streaming
    raise OllamaError(
litellm.llms.ollama.OllamaError: b'{"error":"illegal base64 data at input byte 4"}'
jakobdylanc commented 7 months ago

Upon investigating it seems that ollama expects a slightly different format for the base64 image data. Ultimately I think ollama should fix this to better align with the OpenAI API format. I'll try to raise more awareness on this.

For now it should work if you manually change this line in your llmcord.py: https://github.com/jakobdylanc/discord-llm-chatbot/blob/4f6971032c9346caa332eeabfc9228b42e342138/llmcord.py#L117 to this:

"image_url": {"url": base64.b64encode(requests.get(att.url).content).decode('utf-8')},
mann1x commented 6 months ago

> Upon investigating it seems that ollama expects a slightly different format for the base64 image data. Ultimately I think ollama should fix this to better align with the OpenAI API format. I'll try to raise more awareness on this.

I could make a PR to ollama to fix it, but I'm struggling to find where the OpenAI API image format is documented... Do you have any reference?

I could only find the "Create image" endpoints, but nothing about what's expected when sending images in a request. https://platform.openai.com/docs/api-reference/images/createVariation

jakobdylanc commented 6 months ago

This is what you're looking for: https://platform.openai.com/docs/guides/vision

The issue is with the base64 data in the "url" field. The OpenAI API expects, for example:

"url": f"data:image/jpeg;base64,{base64_image}"

But ollama expects JUST the base64 data without any prefix info:

"url": base64_image

Ollama's "default" API endpoints (/api/generate and /api/chat) are not OpenAI compatible: https://github.com/ollama/ollama/blob/main/docs/api.md

They're working on an OpenAI compatible endpoint (/v1/chat/completions) but it doesn't support vision yet: https://github.com/ollama/ollama/blob/main/docs/openai.md#supported-features

Once it supports vision, it should fix this problem.
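For reference, ollama's native /api/chat takes images as a list of bare base64 strings, which is why the prefix has to be stripped before forwarding. A rough sketch based on the API docs linked above (model name and file are placeholders, and field details may vary between ollama versions):

import base64

import requests

image_b64 = base64.b64encode(open("example.jpg", "rb").read()).decode("utf-8")

# Native ollama chat request: images go in an "images" array of bare base64 strings.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llava",
        "messages": [{"role": "user", "content": "describe it", "images": [image_b64]}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])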

mann1x commented 6 months ago

Ok, so this is a hack to support the ollama format through the OpenAI-style interface. How could I miss it... it was literally the next chapter. Sorry, and thanks!

jakobdylanc commented 6 months ago

No problem! Hopefully ollama addresses this soon. For now I'll keep this issue open for awareness.

Did you get it working with the code change I suggested before?

mann1x commented 6 months ago

Oh yes sure, works perfectly!

jakobdylanc commented 5 months ago

This issue should now be fixed with the latest version of litellm: pip install -U litellm

Ref: https://github.com/BerriAI/litellm/pull/2888

mann1x commented 5 months ago

@jakobdylanc

Can confirm it's working with ollama/llava-phi3. I just had to also execute pip install Pillow

jakobdylanc commented 5 months ago

@mann1x were you getting some kind of error before doing pip install Pillow? This sounds like another litellm issue. If litellm depends on Pillow, shouldn't it install it for you when you do pip install -U litellm?

mann1x commented 5 months ago

@jakobdylanc Yes, it was an error from litellm; Pillow wasn't installed by the upgrade. It's probably an optional dependency, since it's only needed for processing images:


2024-05-29 16:11:37,798 ERROR: Error while streaming response
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 129, in _convert_image
    from PIL import Image
ModuleNotFoundError: No module named 'PIL'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/litellm/main.py", line 2213, in completion
    generator = ollama.get_ollama_response(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 186, in get_ollama_response
    data["images"] = [_convert_image(image) for image in images]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 186, in <listcomp>
    data["images"] = [_convert_image(image) for image in images]
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/litellm/llms/ollama.py", line 131, in _convert_image
    raise Exception(
Exception: ollama image conversion failed please run `pip install Pillow`
jakobdylanc commented 5 months ago

Ahh I see, thanks for sharing the error. It indeed looks like they intentionally made it optional. I found the code: https://github.com/BerriAI/litellm/blob/c44970c8134174a05b433b968ab8f46eee5a67a8/litellm/llms/ollama.py#L128-L133
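In simplified form, the pattern there is a lazy import inside the conversion helper, so Pillow is only required when an ollama request actually carries images. This is a paraphrase of the linked code, not a verbatim copy:

def _convert_image(image):
    # Import Pillow lazily so litellm installs and runs without it unless
    # image conversion is actually needed.
    try:
        from PIL import Image
    except ImportError:
        raise Exception(
            "ollama image conversion failed please run `pip install Pillow`"
        )
    ...  # conversion logic using PIL.Image goes here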