run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Change OpenAI default embedding model from "text-embedding-ada-002" to "text-embedding-3-small" #12994

Open dsanr opened 2 months ago

dsanr commented 2 months ago

Feature Description

The text-embedding-3-small model performs better and costs less than the text-embedding-ada-002 model, so it would be beneficial to make the former the default. https://openai.com/blog/new-embedding-models-and-api-updates https://openai.com/pricing

Reason

No response

Value of Feature

No response

logan-markewich commented 2 months ago

@dsanr this would be a giant breaking change. It should probably be

a) saved for a larger version bump
b) properly communicated to users ahead of time

dsanr commented 2 months ago

@logan-markewich They both have the same dimensionality of 1536. Are there any other reasons why this would be a giant breaking change?

justinzyw commented 2 months ago

I tried replacing ada with 3-small and found that they are not compatible even though the dimensionality is the same; i.e., all users who created indexes with the default ada model will find their queries behave quite differently with 3-small as the default.

logan-markewich commented 2 months ago

@justinzyw is correct. It's not the dimension that matters so much; the two models are trained on completely different data. Vectors created with ada are in a completely different vector space from those created with 3-small.
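
To illustrate the point about vector spaces, here is a toy sketch in plain Python. It does not call OpenAI or LlamaIndex: `make_toy_model`, `embed`, and `best_match` are hypothetical helpers, and the two "models" are just random word-vector tables with the same output dimension, standing in for two independently trained embedding models. The assumption being illustrated is only that equal dimensionality does not make vectors from different models comparable.

```python
import math
import random

DIM = 16  # both toy "models" emit vectors of the same size


def make_toy_model(seed, vocab):
    """Give every word a random DIM-dimensional vector (stand-in for a trained model)."""
    rng = random.Random(seed)
    return {word: [rng.gauss(0, 1) for _ in range(DIM)] for word in vocab}


def embed(model, text):
    """Embed text as the sum of its word vectors."""
    vec = [0.0] * DIM
    for word in text.split():
        for i, x in enumerate(model[word]):
            vec[i] += x
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vocab = ["cats", "purr", "dogs", "bark", "stocks", "fell", "sharply"]
model_a = make_toy_model(seed=1, vocab=vocab)  # plays the role of the old default
model_b = make_toy_model(seed=2, vocab=vocab)  # plays the role of the new default

docs = ["cats purr", "dogs bark", "stocks fell sharply"]
query = "cats purr"


def best_match(index_model, query_model):
    """Index docs with one model, embed the query with another, return top doc and all scores."""
    index = [embed(index_model, d) for d in docs]
    q = embed(query_model, query)
    scores = [cosine(q, v) for v in index]
    return docs[scores.index(max(scores))], scores


if __name__ == "__main__":
    # Index and query built with the same model: the identical text scores ~1.0.
    print("same model: ", best_match(model_a, model_a))
    # Index built with model A, query embedded with model B: scores are arbitrary.
    print("mixed models:", best_match(model_a, model_b))
```

In the mixed case the vectors still have matching shapes, so nothing raises an error; the similarity scores are simply meaningless. That is the situation an existing index would be in if the default model changed underneath it.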

dsanr commented 2 months ago

@justinzyw Thanks for trying it out. @logan-markewich Yeah, in that case this can only be taken up in the next major release.