openai / chatgpt-retrieval-plugin

The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
MIT License
20.97k stars 3.68k forks

Use with opensource models #56

Open jasonmhead opened 1 year ago

jasonmhead commented 1 year ago

What would it take to use this repo with, say, GPT-J, OPT, or other open-source models?

What customizations would have to be done?

nkeilar commented 1 year ago

Given the amount of data we have, it would make sense to use a local model for embedding vectors and information retrieval. This would also make us more comfortable indexing PII, as the data would remain on-site and could be redacted/sanitised at the point of passing it to an LLM.
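As a rough illustration of redacting at the point of passing text to an LLM, here is a minimal sketch. It only masks email addresses with a simple regular expression; the names `EMAIL_RE` and `redact` are hypothetical and not part of this repo, and a real deployment would need a much broader PII detector.

```python
import re

# Assumption: a deliberately simple email pattern, for illustration only.
# It will miss many PII types (names, phone numbers, addresses, ...).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every email address in `text` with `placeholder`."""
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    # The retrieved chunk stays intact in the local index; only the copy
    # that is about to leave the site is redacted.
    chunk = "Contact jane.doe@example.com about the renewal."
    print(redact(chunk))
```

The idea is that the index keeps the original text locally, and only the copy sent to an external model passes through `redact`.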

nkeilar commented 1 year ago

I researched this further yesterday and found this blog post, which suggests there could be minimal or no degradation in performance when using some of the other local models:

https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9

That led me here:

https://www.sbert.net/docs/pretrained_models.html
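For anyone wanting to try one of those pretrained models, here is a minimal sketch of swapping in a local encoder. It assumes the plugin obtains embeddings through a single function that maps a list of strings to a list of float vectors; the names `Encoder`, `make_sbert_encoder`, and the `encoder` parameter are hypothetical, and `all-MiniLM-L6-v2` is just one example model from the sentence-transformers library.

```python
from typing import Callable, List, Sequence

# Hypothetical type: anything that turns a batch of texts into vectors.
Encoder = Callable[[Sequence[str]], List[List[float]]]


def make_sbert_encoder(model_name: str = "all-MiniLM-L6-v2") -> Encoder:
    """Build an encoder backed by a local sentence-transformers model."""
    # Imported lazily so this module loads even without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)

    def encode(texts: Sequence[str]) -> List[List[float]]:
        # encode() returns a NumPy array of shape (len(texts), dim) by default.
        return model.encode(list(texts)).tolist()

    return encode


def get_embeddings(texts: Sequence[str], encoder: Encoder) -> List[List[float]]:
    """Embed `texts` with whichever encoder is plugged in (local or remote)."""
    vectors = encoder(texts)
    if len(vectors) != len(texts):
        raise ValueError("encoder returned a different number of vectors than texts")
    return vectors


if __name__ == "__main__":
    # Downloads the model on first use, then runs fully locally.
    local = make_sbert_encoder()
    print(len(get_embeddings(["hello world"], local)[0]))
```

One thing to watch: the vector size changes with the model. `all-MiniLM-L6-v2` produces 384-dimensional vectors, whereas OpenAI's `text-embedding-ada-002` produces 1536, so any datastore index created for one would need to be recreated for the other.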

Someone made a similar change to get Korean sentence embeddings here, but it's not in a state that could be rolled into the project:

https://github.com/openai/chatgpt-retrieval-plugin/compare/main...SeaJungg:chatgpt-retrieval-plugin:ko-embedding

Does OpenAI have any problem with using a local model? Would such a pull request ever make it into the project, given that it would presumably deprive OpenAI of revenue?