openai[patch]: unskip test and relax tolerance in embeddings comparison

langchain-ai / langchain


From what I can tell, the response returned through the OpenAI SDK is not deterministic:

import numpy as np
import openai

documents = ["disallowed special token '<|endoftext|>'"]
model = "text-embedding-ada-002"

# Reference embedding from a first request.
direct_output_1 = (
    openai.OpenAI()
    .embeddings.create(input=documents, model=model)
    .data[0]
    .embedding
)

# Request the same embedding ten more times and compare each result
# against the reference using np.isclose with its default tolerances.
for i in range(10):
    direct_output_2 = (
        openai.OpenAI()
        .embeddings.create(input=documents, model=model)
        .data[0]
        .embedding
    )
    print(f"{i}: {np.isclose(direct_output_1, direct_output_2).all()}")

0: True
1: True
2: True
3: True
4: False
5: True
6: True
7: True
8: True
9: True

I found the same behavior with "text-embedding-3-small".
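
Since repeated requests can differ slightly, a comparison with a looser absolute tolerance is one way to keep such a test stable. Below is a minimal sketch of that idea, not the code from this PR: the helper name `embeddings_match`, the `atol=1e-3` value, and the synthetic vectors are all assumptions chosen for illustration, and no API call is made.

```python
import numpy as np


def embeddings_match(a, b, atol=1e-3):
    """Return (match, max_abs_diff) for two embedding vectors.

    Hypothetical helper; atol=1e-3 is an illustrative value, not the
    tolerance chosen in the PR.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    max_abs_diff = float(np.max(np.abs(a - b)))
    # rtol=0 so that only the absolute tolerance decides the outcome.
    return bool(np.allclose(a, b, rtol=0.0, atol=atol)), max_abs_diff


# Synthetic stand-ins for two API responses that differ by small jitter.
reference = np.array([0.0123, -0.0456, 0.0789])
jittered = reference + np.array([2e-5, -3e-5, 1e-5])

# np.isclose defaults (rtol=1e-5, atol=1e-8) reject this jitter.
strict = bool(np.isclose(reference, jittered).all())
relaxed, diff = embeddings_match(reference, jittered)
print(f"strict={strict} relaxed={relaxed} max_abs_diff={diff:.1e}")
```

Reporting the maximum absolute difference alongside the boolean makes it easier to pick a tolerance from observed values rather than guessing one.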

openai[patch]: unskip test and relax tolerance in embeddings comparison #28262