openai / chatgpt-retrieval-plugin

The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
MIT License
21.02k stars, 3.69k forks

`services.openai.get_embeddings` does not expose the `dimensions` kwarg of `openai.Embedding.create` #426

Closed: caseyclements closed this issue 4 months ago

caseyclements commented 7 months ago

Although `EMBEDDING_DIMENSION` is described as a required variable in the README, the code never uses it. It only appears in the setup.md instructions of a few datastores, where it sets the dimension of their vector indexes.

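The README goes on to say "The plugin uses OpenAI's embeddings model (text-embedding-3-large 256 dimension embeddings by default)", but `len(get_embeddings(["Some input text"])[0]) == 3072`. Because `get_embeddings` never passes `dimensions` to the API, the model returns its full default size of 3072 rather than 256.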

This is urgent for me because I will soon submit a PR adding MongoDB Atlas as a new datastore. Atlas's imminent next release raises the supported vector dimension to 4096, but previous versions support only 2048, i.e. less than 3072. If I understand correctly, this means the following line in the README is incorrect, since there is currently no way to pass the `dimensions` parameter through `get_embeddings`: "For example, if your vector database supports up to 1024 dimensions, you can use text-embedding-3-large and set the dimensions API parameter to 1024."

The change needed to support this is small. If you agree, I would like to submit a PR to fix this issue, and keep our MongoDB datastore addition as a separate PR.
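A minimal sketch of the proposed change, assuming the `openai>=1.0` Python client (`OpenAI().embeddings.create`, which accepts a `dimensions` argument for text-embedding-3 models) rather than the legacy `openai.Embedding.create` named in the title, and assuming the size is read from the `EMBEDDING_DIMENSION` environment variable. The `create` parameter and the `embedding_request_kwargs` helper are hypothetical, added so a stub client can be injected; this is not the code merged in PR #428.

```python
import os
from typing import Any, Callable, Dict, List, Optional

DEFAULT_MODEL = "text-embedding-3-large"


def embedding_request_kwargs(
    texts: List[str],
    dimensions: Optional[int] = None,
    model: str = DEFAULT_MODEL,
) -> Dict[str, Any]:
    """Build keyword arguments for an embeddings request (hypothetical helper)."""
    if dimensions is None:
        # Assumption: fall back to the EMBEDDING_DIMENSION variable from the README.
        env_dim = os.environ.get("EMBEDDING_DIMENSION")
        dimensions = int(env_dim) if env_dim else None
    kwargs: Dict[str, Any] = {"input": texts, "model": model}
    # Only send `dimensions` when one is configured; otherwise the API
    # returns the model's full default size (3072 for text-embedding-3-large).
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    return kwargs


def get_embeddings(
    texts: List[str],
    dimensions: Optional[int] = None,
    create: Optional[Callable[..., Any]] = None,
) -> List[List[float]]:
    """Embed `texts`, forwarding an optional `dimensions` to the API."""
    if create is None:
        # Imported lazily so a stub `create` can be used without the openai package.
        from openai import OpenAI

        create = OpenAI().embeddings.create
    response = create(**embedding_request_kwargs(texts, dimensions))
    return [item.embedding for item in response.data]
```

Leaving `EMBEDDING_DIMENSION` unset keeps today's 3072-dimension vectors; setting it to 2048, for example, would produce vectors that fit the limit of earlier Atlas versions.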

caseyclements commented 4 months ago

This issue has been resolved in PR #428.