Open kaihendry opened 5 months ago
@kaihendry Thanks for submitting this issue!
By default, the gpt4-v-vision
tool uses OpenAI's gpt-4-turbo
model (previously known as gpt-4-vision-preview
) to interpret images. Skimming through the OpenAI docs, I didn't see anything mentioning OCR-related limitations specifically, but I did find a community thread where folks were encountering similar issues. In that thread it looks like it has become increasingly difficult to get decent OCR results via OpenAI's API and model. At the moment, it's unclear to me what OpenAI's official level of support is for OCR
We always have the option of writing another vision tool for a non-OpenAI model if we can find one with better OCR support too.
In the meantime, when I get the chance I'll try to repro your issue.
Inspired by https://youtu.be/g3NtJatmQR0?t=133 I was hoped to turn the screenshot of
into structured JSON with the prompt:
However it doesn't work saying it essentially can't ready text from an image.