jstayco opened 1 month ago
Image recognition works in chat but not via the llava API

I get similar responses or confabulation. I use this query (and others), but image recognition is not working:

```python
import base64
from openai import OpenAI

# Point the OpenAI client at the local LM Studio server (default port)
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

image_path = './redbus.png'

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="local-model",  # not used
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
    max_tokens=1000,
    stream=True,
)
```
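As a sanity check, I verified that the base64/data-URL step itself is not the problem. The snippet below is a minimal sketch using hypothetical stand-in bytes (any PNG-signature-prefixed payload works for this check, not the actual redbus.png) to confirm the encoding round-trips losslessly:

```python
import base64

# Hypothetical stand-in for redbus.png: PNG signature plus padding bytes.
# Any bytes work for checking the base64 round-trip that feeds the data URL.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

b64 = base64.b64encode(png_bytes).decode("utf-8")
data_url = f"data:image/png;base64,{b64}"

# The URL has the expected scheme and the payload decodes back byte-for-byte.
assert data_url.startswith("data:image/png;base64,")
assert base64.b64decode(b64) == png_bytes
```

This passes locally, so the image bytes reaching the request body should be intact.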
When loading an image into a vision model, the LLM replies with some version of "there is no image" or "As an AI, I can only work with text". This happens even though the model has vision capabilities and LM Studio correctly detected this and surfaced the image upload button (not the clip/attachment button).

I am seeing this behavior no matter what my system prompt is (even an empty one). It also occurs with both MLX and GGUF model formats.
LM Studio version: 0.3.4
Hardware: 14" M3 Max, 128 GB RAM
Example image I've been using:
Examples of output:
Please note: the system prompt shown in this image was simply an attempt to force image handling after an empty system prompt did not work.