andrew-lastmile opened this issue 7 months ago
It looks like images are supplied to the vision model in one of two ways: either as a URI accessible from the public internet, or base64-encoded directly in the message (see https://platform.openai.com/docs/guides/vision/quick-start). The AIConfig V1 schema already supports multi-modal inputs, so this should be doable relatively soon.
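For reference, a minimal sketch of the base64 path described above. The helper name and the placeholder image bytes are illustrative; the message shape follows the OpenAI vision quick-start linked above:

```python
import base64

def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat message for gpt-4-vision-preview, embedding the image
    as a base64 data URI (the alternative is a publicly reachable URL)."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }

# Placeholder bytes stand in for a real image file read from disk.
msg = build_vision_message("What is in this image?", b"\x89PNG...")
```

The resulting dict can be passed in the `messages` list of a chat completion request against `gpt-4-vision-preview`.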
OpenAI just announced API access to GPT-4 Turbo and GPT-4 Turbo with Vision. The AIConfig default model parser should support these new models:

- `gpt-4-1106-preview`
- `gpt-4-vision-preview`