meta-llama / llama-stack

Composable building blocks to build Llama Apps
MIT License

Ollama 4.0 vision and llama-stack token Invalid token for decoding #367

Open JoseGuilherme1904 opened 2 days ago

JoseGuilherme1904 commented 2 days ago

🚀 The feature, motivation and pitch

ollama vision is new: https://ollama.com/x/llama3.2-vision

providers: inference:

In llama_stack/providers/adapters/inference/ollama/ollama.py:

OLLAMA_SUPPORTED_MODELS = {
    "Llama3.1-8B-Instruct": "x/llama:latest",
    "Llama3.1-70B-Instruct": "llama3.1:70b-instruct-fp16",
    "Llama3.2-1B-Instruct": "llama3.2:1b-instruct-fp16",
    "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16",
    "Llama-Guard-3-8B": "llama-guard3:8b",
    "Llama-Guard-3-1B": "llama-guard3:1b",
    "Llama3.2-11B-Vision-Instruct": "x/llama:latest",
}

Traceback (most recent call last):
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
    async for item in await event_gen:
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 427, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 101, in <genexpr>
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/ollama/ollama.py", line 215, in _stream_chat_completion
    params = self._get_params(request)
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/ollama/ollama.py", line 190, in _get_params
    "prompt": chat_completion_request_to_prompt(request, self.formatter),
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/utils/inference/prompt_adapter.py", line 46, in chat_completion_request_to_prompt
    return formatter.tokenizer.decode(model_input.tokens)
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_models/llama3/api/tokenizer.py", line 190, in decode
    return self.model.decode(cast(List[int], t))
  File "/home/guilherme/.local/lib/python3.10/site-packages/tiktoken/core.py", line 254, in decode
    return self._core_bpe.decode_bytes(tokens).decode("utf-8", errors=errors)
KeyError: 'Invalid token for decoding: 128256'

Alternatives

No response

Additional context

No response

ashwinb commented 1 day ago

yeah that's a special token we need to correctly filter out for vision models. Let me look into this.
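
For readers following along, here is a minimal sketch of what filtering such a token before decoding could look like. It is not the change from any PR in this thread: `decode_skipping_special`, `SPECIAL_TOKEN_IDS`, and `toy_decode` are hypothetical names, and the only value taken from the report is the id 128256 from the KeyError above.

```python
from typing import Callable, FrozenSet, Iterable, List

# Id reported in the KeyError above. Treating it as the only id to skip
# is an assumption made for this sketch.
SPECIAL_TOKEN_IDS: FrozenSet[int] = frozenset({128256})


def decode_skipping_special(
    tokens: Iterable[int],
    decode: Callable[[List[int]], str],
    skip_ids: FrozenSet[int] = SPECIAL_TOKEN_IDS,
) -> str:
    """Drop ids the text decoder cannot handle, then decode the rest."""
    kept = [t for t in tokens if t not in skip_ids]
    return decode(kept)


def toy_decode(ids: List[int]) -> str:
    """Hypothetical stand-in for a real tokenizer: unknown ids raise KeyError."""
    vocab = {1: "Describe", 2: " this", 3: " image"}
    try:
        return "".join(vocab[i] for i in ids)
    except KeyError as exc:
        raise KeyError(f"Invalid token for decoding: {exc.args[0]}") from None


if __name__ == "__main__":
    # The special id is removed before the decoder ever sees it.
    print(decode_skipping_special([128256, 1, 2, 3], toy_decode))
```

Passing the decoder in as a callable keeps the sketch independent of any particular tokenizer; with a real one, the decoder's own `decode` method would take the place of `toy_decode`.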

ashwinb commented 1 day ago

See https://github.com/meta-llama/llama-stack/pull/376/files -- it works now

JoseGuilherme1904 commented 10 hours ago

Thank you very much for your help.

While testing the image, an error occurs. Which version of HTTPX are you using?

Thanks, Guilherme

role='user' content=[ImageMedia(image=URL(uri='......mited to): couch, coffee table, fireplace, etc\n\nReturn results in the following format:\n{\n "description": 4 sentence architectural description of the image,\n "items": list of furniture items present in the image\n}\n\nRemember to only list furniture items you see in the image. Just suggest item names without any additional text or explanations.\nFor eg. "Couch" instead of "grey sectional couch"\n\nReturn JSON as suggested, Do not return any other text or explanations.\n'] context=None

Traceback (most recent call last):
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/distribution/server/server.py", line 212, in sse_generator
    async for item in await event_gen:
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 427, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/distribution/routers/routers.py", line 101, in chat_completion
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 220, in chat_completion
    request = await request_with_localized_media(request)
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 421, in request_with_localized_media
    m.content = await _convert_content(m.content)
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 415, in _convert_content
    return [await _convert_single_content(c) for c in content]
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 415, in <listcomp>
    return [await _convert_single_content(c) for c in content]
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 408, in _convert_single_content
    url = await convert_image_media_to_url(content, download=True)
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/utils/inference/prompt_adapter.py", line 77, in convert_image_media_to_url
    r = await client.get(media.image.uri)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1814, in get
    return await self.request(
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1585, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/home/guilherme/.local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/home/guilherme/.local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 188, in handle_async_request
    closing = self._assign_requests_to_connections()
  File "/home/guilherme/.local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 264, in _assign_requests_to_connections
    origin = pool_request.request.url.origin
  File "/home/guilherme/.local/lib/python3.10/site-packages/httpcore/_models.py", line 289, in origin
    default_port = {
KeyError: b''

Second error:

role='user' content=[ImageMedia(image=URL(uri='......mited to): couch, coffee table, fireplace, etc\n\nReturn results in the following format:\n{\n "description": 4 sentence architectural description of the image,\n "items": list of furniture items present in the image\n}\n\nRemember to only list furniture items you see in the image. Just suggest item names without any additional text or explanations.\nFor eg. "Couch" instead of "grey sectional couch"\n\nReturn JSON as suggested, Do not return any other text or explanations.\n'] context=None

Traceback (most recent call last):
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/distribution/server/server.py", line 212, in sse_generator
    async for item in await event_gen:
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 427, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/distribution/routers/routers.py", line 101, in chat_completion
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 220, in chat_completion
    request = await request_with_localized_media(request)
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 421, in request_with_localized_media
    m.content = await _convert_content(m.content)
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 415, in _convert_content
    return [await _convert_single_content(c) for c in content]
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 415, in <listcomp>
    return [await _convert_single_content(c) for c in content]
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/impls/meta_reference/inference/inference.py", line 408, in _convert_single_content
    url = await convert_image_media_to_url(content, download=True)
  File "/home/guilherme/angular/llama/llama-stack/llama_stack/providers/utils/inference/prompt_adapter.py", line 77, in convert_image_media_to_url
    r = await client.get(media.image.uri)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1814, in get
    return await self.request(
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 1572, in request
    request = self.build_request(
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 346, in build_request
    url = self._merge_url(url)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_client.py", line 376, in _merge_url
    merge_url = URL(url)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_urls.py", line 117, in __init__
    self._uri_reference = urlparse(url, **kwargs)
  File "/tmp/a/llama/conda/envs/llamastack-rede/lib/python3.10/site-packages/httpx/_urlparse.py", line 158, in urlparse
    raise InvalidURL("URL too long")
httpx.InvalidURL: URL too long

ashwinb commented 3 hours ago

@JoseGuilherme1904 can you tell me how to reproduce this failure?
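
Both httpx tracebacks above fail while handling `media.image.uri`: one ends with an empty URL scheme (`KeyError: b''`), the other with `URL too long`. The uri itself is elided in the logs, so the actual cause is not known. As a rough sketch only, assuming the uri might be an inline `data:` URI or a bare path rather than an http(s) URL, a guard in front of the download could look like this. `resolve_image_uri` is a hypothetical helper, not part of llama-stack or httpx.

```python
import base64
from typing import Tuple, Union
from urllib.parse import unquote_to_bytes, urlsplit


def resolve_image_uri(uri: str) -> Tuple[str, Union[str, bytes]]:
    """Classify an image URI before any HTTP client sees it.

    Returns ("remote", uri) for http(s) URLs that are safe to download,
    or ("inline", raw_bytes) for data: URIs, which are decoded locally.
    Anything else (empty scheme, local path, unknown scheme) is rejected.
    """
    scheme = urlsplit(uri).scheme.lower()
    if scheme in ("http", "https"):
        return ("remote", uri)
    if scheme == "data":
        # data:[<mediatype>][;base64],<payload>
        header, sep, payload = uri.partition(",")
        if not sep:
            raise ValueError("malformed data: URI (no comma)")
        if header.endswith(";base64"):
            return ("inline", base64.b64decode(payload))
        return ("inline", unquote_to_bytes(payload))
    raise ValueError(f"unsupported image URI scheme: {scheme!r}")


if __name__ == "__main__":
    sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
    kind, value = resolve_image_uri(sample)
    print(kind, len(value))
```

With a guard like this, only the "remote" case would be handed to `client.get(...)`; inline payloads never reach the URL parser, and an unsupported scheme fails early with a readable message.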