Open abeatrix opened 4 days ago
@abeatrix This is amazing, even if the open-source vision models benchmark lower than GPT-4o or Sonnet 3.5. If the SG endpoint later becomes compatible with Sonnet 3.5 and other multimodal models, this will top it all.
NOTE: This is a proof of concept for supporting multimodal models in Cody. The UI will need to be updated to make it more obvious to users when an image has been uploaded for the current chat.
`images` field
Multimodal models from Ollama that we currently support: llava & bakllava
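For context, Ollama's chat API accepts images as base64-encoded strings in an `images` array on a message. The sketch below shows one way such a request body could be built for a multimodal model like llava. It is an illustration only, not the code in this PR: `toBase64`, `buildOllamaChatRequest`, and the interface names are hypothetical helpers, and the base64 encoder is hand-written just to keep the example self-contained.

```typescript
// Hypothetical types mirroring the shape of an Ollama /api/chat request body.
interface OllamaChatMessage {
    role: 'user' | 'assistant' | 'system'
    content: string
    images?: string[] // base64-encoded image data, no data: URI prefix
}

interface OllamaChatRequest {
    model: string
    messages: OllamaChatMessage[]
    stream: boolean
}

const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Encode raw bytes as standard base64 with '=' padding.
function toBase64(bytes: Uint8Array): string {
    let out = ''
    for (let i = 0; i < bytes.length; i += 3) {
        const b0 = bytes[i]
        const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0
        const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0
        out += ALPHABET[b0 >> 2]
        out += ALPHABET[((b0 & 3) << 4) | (b1 >> 4)]
        out += i + 1 < bytes.length ? ALPHABET[((b1 & 15) << 2) | (b2 >> 6)] : '='
        out += i + 2 < bytes.length ? ALPHABET[b2 & 63] : '='
    }
    return out
}

// Build a non-streaming chat request that attaches one image to the user's prompt.
function buildOllamaChatRequest(
    model: string,
    prompt: string,
    imageBytes: Uint8Array
): OllamaChatRequest {
    return {
        model,
        messages: [{ role: 'user', content: prompt, images: [toBase64(imageBytes)] }],
        stream: false,
    }
}
```

The resulting object could be serialized with `JSON.stringify` and POSTed to a local Ollama server's `/api/chat` endpoint. Keeping the image on the message itself, rather than in a separate top-level field, means each chat turn carries its own attachments.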
Demo
Loom: https://www.loom.com/share/4d2b72de1385436a960b0775434b933e
Follow-up work/ideas
Test plan
Example
The left image was shared with Cody; the right image was built with the code Cody provided: