Closed by simonw 2 days ago
In mistral_models.json:
{
  "id": "pixtral-12b-2409",
  "object": "model",
  "created": 1729104658,
  "owned_by": "mistralai",
  "name": "pixtral-12b-2409",
  "description": "Official pixtral-12b-2409 Mistral AI model",
  "max_context_length": 131072,
  "aliases": [
    "pixtral-12b",
    "pixtral-12b-latest"
  ],
  "deprecation": null,
  "capabilities": {
    "completion_chat": true,
    "completion_fim": false,
    "function_calling": true,
    "fine_tuning": false,
    "vision": true
  },
  "type": "base"
}
So we can look for "capabilities": {"vision": true}
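A minimal sketch of that check, assuming the JSON file holds a list of model objects shaped like the entry above (the real file may wrap them differently). The function name `vision_model_ids` and the second sample entry are hypothetical, for illustration only:

```python
import json


def vision_model_ids(models):
    """Return ids and aliases of models whose capabilities include vision."""
    ids = []
    for model in models:
        # "capabilities" may be missing or null, so fall back to an empty dict.
        capabilities = model.get("capabilities") or {}
        if capabilities.get("vision"):
            ids.append(model["id"])
            ids.extend(model.get("aliases") or [])
    return ids


# Sample data: the first entry is trimmed from the record above,
# the second is a made-up non-vision model for contrast.
sample = json.loads("""
[
  {
    "id": "pixtral-12b-2409",
    "aliases": ["pixtral-12b", "pixtral-12b-latest"],
    "capabilities": {"completion_chat": true, "vision": true}
  },
  {
    "id": "hypothetical-text-only-model",
    "aliases": [],
    "capabilities": {"completion_chat": true, "vision": false}
  }
]
""")

print(vision_model_ids(sample))
```

Including the aliases means `-m pixtral-12b` would be treated as vision-capable as well as the full id.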
Got it working, it's interesting though:
llm -m pixtral-12b 'return just the text' -a ../llm/example.jpg
Example handwriting
Let's try this out
Its responses to the same prompt vary quite a bit from run to run:
% llm -m pixtral-12b 'ocr' -a ../llm/example.jpg
Example handwriting Let's try this out
% llm -m pixtral-12b 'ocr' -a ../llm/example.jpg
Certainly! Here is the OCR (Optical Character Recognition) output for the image provided:
---
**Example handwriting**
Let's try this out
---
% llm -m pixtral-12b 'ocr' -a ../llm/example.jpg
Sure, here's an example of handwriting:
**Example handwriting**
Let's try this out
And I got this at one point, with a system prompt:
llm -m pixtral-12b 'ocr' -a https://static.simonwillison.net/static/2024/example-handwriting.jpg --system 'return just the text'
```python
{
  "ocr": [
    {
      "text": "Example handwriting",
      "bounding_box": {
        "top": 52.72,
        "left": 153.68,
        "width": 150.64,
        "height": 36.88
      }
    },
    {
      "text": "Let's try this out",
      "bounding_box": {
        "top": 99.76,
        "left": 149.76,
        "width": 145.48,
        "height": 36.88
      }
    }
  ]
}
```
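If the model does return this JSON shape, the plain text can be recovered from it. This is only a sketch: the `ocr` / `text` keys are copied from the one response above, nothing guarantees the model will use them again, and `ocr_text_lines` is a hypothetical helper name:

```python
import json


def ocr_text_lines(raw):
    """Extract the "text" values from a response shaped like the JSON above."""
    data = json.loads(raw)
    return [item["text"] for item in data.get("ocr", [])]


# Trimmed copy of the response above, with the bounding boxes dropped.
sample_response = """
{
  "ocr": [
    {"text": "Example handwriting"},
    {"text": "Let's try this out"}
  ]
}
"""

print("\n".join(ocr_text_lines(sample_response)))
```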
Surprising error:
llm -m pixtral-12b 'what species is this?' -a ../llm/demo-pics/cat.jpeg
Error: 500: invalid_request_error - Image data:image/jpeg;base64,/9j/4AA... has an invalid format.Allowed formats are JPEG,PNG,WEBP,GIF.
It's a real JPEG though.
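One way to double-check a claim like that locally is to look at the file's leading magic bytes. This is a rough sketch, not how the Mistral API validates images (that is unknown here), and `sniff_image_format` is a hypothetical helper. It covers the four formats named in the error message:

```python
import base64


def sniff_image_format(data: bytes):
    """Guess the image format from leading magic bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    # WEBP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WEBP"
    return None


# The base64 payload in the error message starts with "/9j/",
# which decodes to the JPEG start-of-image marker.
print(sniff_image_format(base64.b64decode("/9j/")))
```

For a file on disk, reading the first 12 bytes in binary mode and passing them to the function is enough.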
OK, not sure why but I think this is a Pixtral bug:
llm -m pixtral-12b 'describe' -a https://static.simonwillison.net/static/2024/rocks.jpeg
Error: 500: invalid_request_error - Image https://static.simonwillison.n has an invalid format.Allowed formats are JPEG,PNG,WEBP,GIF.
https://static.simonwillison.net/static/2024/rocks.jpeg
But it's a valid JPEG. And this one works: https://static.simonwillison.net/static/2024/earth.jpg
llm -m pixtral-12b 'describe' -a https://static.simonwillison.net/static/2024/earth.jpg
The image shows a large screen displaying an educational interface related to Earth and its surface composition. The screen features a prominent image of the Earth, highlighting the continents and oceans. To the right of the Earth image, there is a pie chart illustrating the composition of Earth's surface, with sections labeled "Land," "Water," and "Ice." The interface at the top of the screen appears to be a browser or software dashboard with tabs open, such as "Earth" and "Surface Composition."
The background of the interface includes additional tabs, likely representing other topics or sections within the educational software. The environment surrounding the screen seems to be a modern indoor setting, possibly an educational facility or museum, given the high-quality visual display and the structured layout. There are also some plants visible at the bottom of the image, contributing to the indoor aesthetics.
Reported it as an issue:
Conversation works:
% llm -m pixtral-12b 'describe this image in three words' -a https://static.simonwillison.net/static/2024/earth.jpg
Screen displaying Earth's surface composition
% llm -c 'now more detail'
The image shows a large screen displaying an interactive educational interface about Earth's surface composition. The screen features a detailed visualization of the Earth with clearly marked continents and oceans in natural colors. Adjacent to the Earth image is a pie chart illustrating the percentages of different components of Earth's surface. The interface at the top of the screen includes several tabs labeled with topics such as "Solar System," "Galaxy," "Neutron Stars," "Black Holes," and "Star Systems," indicating a broader range of accessible scientific content. The screen is mounted on a wall with a wooden slatted design at the bottom, and there are some green plants visible on the sides.
It did hallucinate the tabs though.
https://docs.mistral.ai/capabilities/vision/
Using: