theroyallab / tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast
GNU Affero General Public License v3.0
609 stars 75 forks source link

[REQUEST] Vision Models #235

Open bdashore3 opened 1 week ago

bdashore3 commented 1 week ago

Problem

Tracking issue for getting vision models working, supersedes #229.

TImeline:

Solution

Contributions are welcome here as I am not sure how vision prompting works in the first place and that'll require research and a lot of time.

Please PR to the vision branch

Alternatives

No response

Explanation

Tracking issue

Examples

No response

Additional context

No response

Acknowledgements

Ph0rk0z commented 6 days ago

Can I request text completion vision support too? Chat completion is much more difficult to control.

bdashore3 commented 5 days ago

Code-wise, text completion with vision is not possible since chat completion separates images and text into defined payloads with roles. If there's examples proving otherwise, I can look into it.

turboderp commented 5 days ago

This should be possible in theory, since images in Llava-style models are inserted into the context as essentially tokens, and the way it's implemented in ExLlama is flexible enough to allow it. All it would need is an extension to the completion API to accept images with a placeholder text for each image.

But I don't think there are any vision models trained on image inputs outside the context of instruct tuning. So I wouldn't expect reliable results.

DocShotgun commented 4 days ago

Looking at implementing the vision support right now. It's definitely theoretically possible to implement exllamav2 vision on the text completion endpoint, but I'm not aware of any API standard defining how to format such things.

If you had some kind of standardized string that specifies an image URL or base64 image in the prompt, in theory we could find all of those, create embedding/text alias pairs, and feed them into the generator the same way as a formatted chat completion prompt.

Ph0rk0z commented 3 days ago

I think it hasn't been established and you have more or less free reign. Hopefully SillyTavern would implement such an addition to Tabby.