Open jasonmhead opened 1 year ago
Given the amount of data we have, it would seem to make sense to use a local model for embedding vectors and information retrieval. This would also make us more comfortable about indexing PII data, as it would remain onsite and could be redacted/sanitised at the point of passing to an LLM.
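As a rough illustration of that redaction step (a hypothetical sketch; the `redact` helper and the regex patterns are my own assumptions, not anything that exists in this repo), something like this could scrub obvious PII from a chunk just before it leaves the site:

```python
import re

# Hypothetical PII-scrubbing pass applied to text just before it is
# sent to an external LLM. A real deployment would need far more
# robust detection (e.g. an NER model); these regexes are illustrative.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matches of each PII pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane.doe@example.com or +1 (555) 123-4567."))
# → Contact [EMAIL] or [PHONE].
```

The point being that this can happen entirely onsite, after local indexing/retrieval, so the raw PII never needs to reach the hosted model.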
I researched this further yesterday and came up with this blog post, which indicates there could be minimal or no degradation in performance when using some of the other local models.
Which led me onto here:
https://www.sbert.net/docs/pretrained_models.html
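To make the idea concrete, here's a minimal sketch of what local retrieval over such embeddings could look like. The toy 3-dimensional vectors stand in for real embeddings; in practice they would come from one of the SBERT models on that page (e.g. `all-MiniLM-L6-v2` via the `sentence-transformers` package, noted in the comment below), and everything here runs without calling the OpenAI embeddings API:

```python
import math

# In a real setup the vectors would come from a local model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vectors = model.encode(documents)
# Toy 3-dimensional vectors stand in for real embeddings below.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document vectors most similar to the query."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]

docs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0)]
print(top_k((1.0, 0.05, 0.0), docs, k=2))  # → [0, 1]
```

Swapping the embedding source is the only change; the retrieval side (cosine similarity over stored vectors) is identical whether the vectors came from OpenAI or a local model.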
Someone made a similar change to get Korean sentence embeddings here, but it isn't in a state suitable for rolling into the project:
Does OpenAI have any problem with using a local model? Would such a pull request ever make it into the project, as presumably it would deprive OpenAI of revenue?
What would it take to use this repo with, say, GPT-J, OPT, or other open-source models?
What customizations would be needed?