Open JoseGuilherme1904 opened 2 days ago
yeah that's a special token we need to correctly filter out for vision models. Let me look into this.
See https://github.com/meta-llama/llama-stack/pull/376/files -- it works now
Thank you very much for your help.
While testing the image, an error occurs. Which version of HTTPX are you using?
Thanks,
Guilherme
role='user' content=[ImageMedia(image=URL(uri='...
second error:
role='user' content=[ImageMedia(image=URL(uri='...
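For the HTTPX version question above, a small sketch of one way to read the installed version without importing the package. The helper name `installed_version` is made up for this example; `importlib.metadata.version` is the standard-library call it relies on.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the httpx version in the current environment, or None if httpx is missing.
print("httpx:", installed_version("httpx"))
```

Running this in the same environment as the llama-stack server should show which HTTPX release the server actually picks up.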
@JoseGuilherme1904 can you tell me how to reproduce this failure?
🚀 The feature, motivation and pitch
ollama vision is new: https://ollama.com/x/llama3.2-vision
providers: inference:
in llama_stack/providers/adapters/inference/ollama/ollama.py:
OLLAMA_SUPPORTED_MODELS = {
    "Llama3.1-8B-Instruct": "x/llama:latest",
    "Llama3.1-70B-Instruct": "llama3.1:70b-instruct-fp16",
    "Llama3.2-1B-Instruct": "llama3.2:1b-instruct-fp16",
    "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16",
    "Llama-Guard-3-8B": "llama-guard3:8b",
    "Llama-Guard-3-1B": "llama-guard3:1b",
    "Llama3.2-11B-Vision-Instruct": "x/llama:latest"
}
Traceback (most recent call last):
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
    async for item in await event_gen:
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 427, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 101, in
    return (chunk async for chunk in await provider.chat_completion(params))
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/ollama/ollama.py", line 215, in _stream_chat_completion
    params = self._get_params(request)
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/ollama/ollama.py", line 190, in _get_params
    "prompt": chat_completion_request_to_prompt(request, self.formatter),
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/utils/inference/prompt_adapter.py", line 46, in chat_completion_request_to_prompt
    return formatter.tokenizer.decode(model_input.tokens)
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_models/llama3/api/tokenizer.py", line 190, in decode
    return self.model.decode(cast(List[int], t))
  File "/home/guilherme/.local/lib/python3.10/site-packages/tiktoken/core.py", line 254, in decode
    return self._core_bpe.decode_bytes(tokens).decode("utf-8", errors=errors)
KeyError: 'Invalid token for decoding: 128256'
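The traceback ends in the tokenizer's `decode` call, which is handed an id (128256) it does not know. Below is a rough sketch of the general idea of filtering such ids out before decoding. It is not the code from the llama-stack fix, which I have not inspected. `drop_undecodable` and `toy_decode` are hypothetical names, `toy_decode` only imitates the failure seen here, and the vocabulary size is an assumption read off the error message.

```python
from typing import Iterable, List

# Assumption: the text tokenizer can decode ids 0..128255, so 128256
# (the id reported in the KeyError) is the first id it cannot handle.
TEXT_VOCAB_SIZE = 128256


def drop_undecodable(tokens: Iterable[int], vocab_size: int = TEXT_VOCAB_SIZE) -> List[int]:
    """Keep only the ids that fall inside the text tokenizer's vocabulary."""
    return [t for t in tokens if 0 <= t < vocab_size]


def toy_decode(tokens: Iterable[int], vocab_size: int = TEXT_VOCAB_SIZE) -> str:
    """Stand-in for a real BPE decode: raises KeyError on unknown ids, like the traceback."""
    pieces = []
    for t in tokens:
        if not 0 <= t < vocab_size:
            raise KeyError(f"Invalid token for decoding: {t}")
        pieces.append(f"<{t}>")
    return "".join(pieces)


# A made-up token list containing the out-of-range id from the error.
tokens = [128000, 9906, 128256, 1917]

safe_tokens = drop_undecodable(tokens)
print(toy_decode(safe_tokens))  # decoding succeeds once 128256 is removed
```

Dropping the id is only reasonable when the decoded string is used as a text prompt and the image itself reaches the model some other way; whether that holds for the Ollama adapter is something I am assuming, not something the traceback shows.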
Alternatives
No response
Additional context
No response