Erik262 commented 2 months ago

An image is telling more than 1000 words :)

jonathanfan-ee commented 1 week ago

Same question. GPT-4o和4o mini现在应该已经支持输入图片了吧，为什么在raycast这里还是显示不支持呢？另外，我发现其中模型列表中显示的模型上下文长度也有问题，4o 和 4o mini 现在应该已经是128K了吧，显示的还是8k GPT-4o and 4o mini should now support image input, right? Why does it still show as unsupported here in Raycast?Additionally, I found that there is an issue with the context length of the models displayed in the model list; 4o and 4o mini should now be 128K tokens, but it is still showing as 8K tokens.

jonathanfan-ee commented 1 week ago

我查看了原本官方的模型列表页面（ https://backend.raycast.com/api/v1/ai/models ），如下： I checked the original official model list page (https://backend.raycast.com/api/v1/ai/models), as follows: 这里模型的能力包含视觉相关的部分，而我们的代理程序里没有这个功能。我想我也许可以提交一个 Pull Request 来添加它。 The model here includes capabilities related to vision, but our agent program does not have this function. I think I can submit a Pull Request to add it.

jonathanfan-ee commented 1 week ago

哈哈，这真是个不小的工程。我原本以为只需要给模型添加 vision 属性就够了，结果发现添加了 vision 属性后，图片却无法上传。原来还需要对 /api/v1/ai/files 进行代理，才能解决文件上传问题。官方使用的是需要认证的 AWS S3 服务，于是我选择修改代码配置，使用免费额度比较多的 Cloudflare R2 存储来替代上传服务。现在上传已经没有问题了，但是还需要修改 OpenAI 的消息构建函数，以便将图片上传到我们自己的 API。 Haha, this is quite a project. I originally thought that adding the vision attribute to the model would be enough, but I found that after adding the vision attribute, the images couldn't be uploaded. It turns out that I also needed to set up a proxy for /api/v1/ai/files to resolve the file upload issue. The official service uses AWS S3, which requires authentication, so I chose to modify the code configuration to use Cloudflare R2 storage, which has a larger free quota, as a replacement for the upload service. Now the uploads are working fine, but I still need to modify OpenAI's message building function to upload images to our own API.

yufeikang commented 1 week ago

是的。这个比较复杂。我的branch修改了一半。最近比较忙还没顾上。感谢你的付出

jonathanfan-ee commented 1 week ago

yufeikang / raycast_api_proxy

No Support for vision? #46

51