supabase-community / chatgpt-your-files

Production-ready MVP for securely chatting with your documents using pgvector
https://youtu.be/ibzlEQmgPPY

Use multilingual embeddings model #27

Closed. cheemaa closed this issue 5 months ago

cheemaa commented 7 months ago

Feature request

Is your feature request related to a problem? Please describe.

gte-small only supports embeddings for English texts.

Describe the solution you'd like

Use a multilingual embeddings model instead so that more languages can be supported.

clementpeleman commented 5 months ago

I would like to have this feature added as well.

gregnr commented 5 months ago

Thanks for the suggestion @cheemaa. This repo is designed to be the foundation for your AI project, with the expectation that you'd modify the code as needed for your use case. Since it would be hard to cover everyone's scenario, my preference is to keep this repo as simple as possible, with fewer configuration options out of the box.

That said, I'm more than happy to add notes/hints to the README that point you in the right direction. For example, for multilingual embeddings you could swap out gte-small for OpenAI's latest embedding models through their API (which are multilingual), or use another third-party API. PRs welcome!
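As a rough illustration of that swap (a sketch, not code from this repo): the snippet below requests an embedding from OpenAI's `/v1/embeddings` endpoint using `text-embedding-3-small`. The helper names (`buildEmbeddingRequest`, `parseEmbeddingResponse`, `embed`) and the `FetchLike` type are hypothetical. It assumes your pgvector column is 384-dimensional, matching gte-small's output size, and therefore passes `dimensions: 384` so the stored vector width would not need to change. Existing rows would still have to be re-embedded, since vectors from different models are not comparable.

```typescript
// Minimal sketch: fetch a multilingual embedding from OpenAI instead of gte-small.
// All helper names here are hypothetical; only the endpoint, the model name and
// the request/response fields come from OpenAI's public embeddings API.

const OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings";

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Narrow fetch signature so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: RequestInitLike,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the HTTP request. `dimensions` asks the text-embedding-3 models to
// return a shortened vector; 384 is an assumption matching gte-small's size.
function buildEmbeddingRequest(
  input: string,
  apiKey: string,
  dimensions: number = 384,
): { url: string; init: RequestInitLike } {
  return {
    url: OPENAI_EMBEDDINGS_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "text-embedding-3-small", input, dimensions }),
    },
  };
}

// Extract the first embedding and check it has the width the database expects.
function parseEmbeddingResponse(json: unknown, dimensions: number = 384): number[] {
  const data = (json as { data?: Array<{ embedding?: unknown }> } | null)?.data;
  const embedding = data?.[0]?.embedding;
  if (!Array.isArray(embedding) || !embedding.every((x) => typeof x === "number")) {
    throw new Error("Unexpected embeddings response shape");
  }
  if (embedding.length !== dimensions) {
    throw new Error(`Expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  return embedding as number[];
}

// Tie the two together; the caller supplies the fetch implementation.
async function embed(input: string, apiKey: string, fetchFn: FetchLike): Promise<number[]> {
  const { url, init } = buildEmbeddingRequest(input, apiKey);
  const res = await fetchFn(url, init);
  if (!res.ok) {
    throw new Error(`Embedding request failed with status ${res.status}`);
  }
  return parseEmbeddingResponse(await res.json());
}
```

Inside an edge function, something like `await embed(text, apiKey, fetch)` would stand in for the local gte-small call, with the API key read from an environment secret. Passing `fetch` in explicitly also makes the helper easy to exercise with a stub.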

Hope this makes sense. By the way, if you're requesting that edge functions themselves support another model, the best place to submit that request would be the edge-runtime repo.

Definitely feel free to comment if you have any other thoughts/questions about this.