[Proof of Concept][do not land] PR example of testing ordering of text vs. image inputs on the ChatGPT Vision API
TL;DR: Ordering matters, but the effect is fairly "obvious" and only shows up when the prompt explicitly asks about it.
We investigated because this affects how we prioritize dynamically overriding/passing in non-text data (related to the input and output adapter work I'm doing for AIConfig SDK 2.0).
This is a simple proof of concept to validate whether ordering matters between text and image inputs.
If you look at the last cell, where we explicitly pass two separate text inputs, they simply get spliced together and the results come back as two separate paragraphs in the "message" field.
Overall, this tells us that ordering does seem to be "understood" by GPT Vision, but splicing across modalities (e.g. text questions split up between images vs. merged into a single input) does not change most of the response. A rough sketch of the kind of request being varied is below.
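For reference, here is a minimal sketch of the kind of request the notebook exercises. The model name, prompt text, and image URLs are placeholders, not the actual test inputs; the point is only that text and image parts are ordered explicitly within the "content" list of a single user message.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder; use whichever vision-capable model the notebook targets
    messages=[
        {
            "role": "user",
            "content": [
                # The ordering of these parts is what the proof of concept varies:
                # text before image vs. image before text, and one merged text
                # part vs. several split-up text parts.
                {"type": "text", "text": "What is in the first image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image1.png"}},
                {"type": "text", "text": "And how does it compare to the second image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image2.png"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```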
Summary:
Test Plan: