Open jasonmhead opened 1 year ago
Given the amount of data we have, it would seem to make sense to use a local model for embedding vectors and information retrieval. This would also make us more comfortable about indexing PII data, as it would remain onsite and could be redacted/sanitised at the point of passing to an LLM.
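As a rough illustration of that redaction step (a hypothetical sketch; the `redact` helper and the regex patterns are my own assumptions, not anything that exists in this repo), something like this could scrub obvious PII from a chunk just before it leaves the site:

```python
import re

# Hypothetical PII-scrubbing pass applied to text just before it is
# sent to an external LLM. A real deployment would need far more
# robust detection (e.g. an NER model); these regexes are illustrative.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matches of each PII pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane.doe@example.com or +1 (555) 123-4567."))
# → Contact [EMAIL] or [PHONE].
```

The point being that this can happen entirely onsite, after local indexing/retrieval, so the raw PII never needs to reach the hosted model.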
I researched this further yesterday and came up with this blog post, which indicates there could be minimal or no degradation in performance when using some of the other local models.
Which led me onto here:
https://www.sbert.net/docs/pretrained_models.html
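To make the idea concrete, here's a minimal sketch of what local retrieval over such embeddings could look like. The toy 3-dimensional vectors stand in for real embeddings; in practice they would come from one of the SBERT models on that page (e.g. `all-MiniLM-L6-v2` via the `sentence-transformers` package, noted in the comment below), and everything here runs without calling the OpenAI embeddings API:

```python
import math

# In a real setup the vectors would come from a local model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vectors = model.encode(documents)
# Toy 3-dimensional vectors stand in for real embeddings below.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document vectors most similar to the query."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]

docs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0)]
print(top_k((1.0, 0.05, 0.0), docs, k=2))  # → [0, 1]
```

Swapping the embedding source is the only change; the retrieval side (cosine similarity over stored vectors) is identical whether the vectors came from OpenAI or a local model.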
Someone made a similar change to get Korean sentence embeddings here, but it isn't in a state suitable for rolling into the project:
Does OpenAI have any problem with using a local model? Would such a pull request ever make it into the project, as presumably it would deprive OpenAI of revenue?
What would it take to use this repo with, say, GPT-J, OPT, or other open-source models?
What customizations would be needed?