sentient-engineering / sentient

the framework/ sdk that lets you build browser controlling agents in 3 lines of code. join chat @ https://discord.gg/umgnyQU2K8
MIT License
402 stars 48 forks source link

Vision Support #8

Open nischalj10 opened 3 weeks ago

nischalj10 commented 3 weeks ago

Add support for vision models/ passing in screenshots for tasks that need visual information extraction/ verification.

adithya-s-k commented 3 weeks ago

This is interesting , was working on something similar would like to see how i can contribute local models or using hosted API such as vision apis form GCP , Azure etc

nischalj10 commented 3 weeks ago

nice, btw all open ai compatible servers are supported already. checkout the the custom api server section in the README.

what kind of help can I do if you wanna take this one up @adithya-s-k?