lastmile-ai / aiconfig

AIConfig is a config-based framework to build generative AI applications.
https://aiconfig.lastmileai.dev
MIT License
896 stars 70 forks source link

OpenAI Model Parser support for GPT-4 Turbo with vision (image-to-text) #100

Open andrew-lastmile opened 7 months ago

andrew-lastmile commented 7 months ago

OpenAI just announced GPT-4 Turbo and Turbo with Vision API access. AI Config default model parser should support these new models.

Model Description Context Window Training Data
gpt-4-1106-preview GPT-4 Turbo: The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. 128,000 tokens Up to Apr 2023
gpt-4-vision-preview GPT-4 Turbo with vision: Ability to understand images, in addition to all other GPT-4 Turbo capabilties. Returns a maximum of 4,096 output tokens. This is a preview model version and not suited yet for production traffic. 128,000 tokens Up to Apr 2023
saqadri commented 7 months ago

Looks like the way images get specified to the vision model is either specifying a URI accessible from the public internet, or base64-encoding it in the message (see https://platform.openai.com/docs/guides/vision/quick-start). The AIConfig V1 schema supports multi-modal inputs, so this should be doable relatively soon.