langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

openai[patch]: unskip test and relax tolerance in embeddings comparison #28262

Closed ccurme closed 23 hours ago

ccurme commented 1 day ago

From what I can tell, the response returned by the SDK is not deterministic:

import numpy as np
import openai

documents = ["disallowed special token '<|endoftext|>'"]
model = "text-embedding-ada-002"

direct_output_1 = (
    openai.OpenAI()
    .embeddings.create(input=documents, model=model)
    .data[0]
    .embedding
)

for i in range(10):
    direct_output_2 = (
        openai.OpenAI()
        .embeddings.create(input=documents, model=model)
        .data[0]
        .embedding
    )
    print(f"{i}: {np.isclose(direct_output_1, direct_output_2).all()}")

Output:

0: True
1: True
2: True
3: True
4: False
5: True
6: True
7: True
8: True
9: True

See related discussion here: https://community.openai.com/t/can-text-embedding-ada-002-be-made-deterministic/318054

Found the same result using "text-embedding-3-small".
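
For illustration, here is a minimal sketch of what a relaxed comparison could look like. It is not the code from this PR: the helper names `embeddings_close` and `cosine_similarity` are hypothetical, the `atol=1e-3` value is an assumed tolerance chosen only for the example, and the vectors are synthetic stand-ins rather than real API responses. `np.isclose` defaults to `rtol=1e-05, atol=1e-08`, which is strict for components whose magnitude is on the order of 1e-2.

```python
import numpy as np


def embeddings_close(a, b, atol=1e-3):
    """Return True if two embeddings agree element-wise within atol.

    atol=1e-3 is an assumed, illustrative tolerance, not the value
    used in the PR.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Vectors of different length can never match.
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=0.0, atol=atol))


def cosine_similarity(a, b):
    """Cosine similarity, an alternative check that ignores small jitter."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for two responses to the same input (assumption:
# unit-length vectors that differ by small per-component noise).
rng = np.random.default_rng(0)
reference = rng.normal(size=1536)
reference /= np.linalg.norm(reference)
noisy = reference + rng.uniform(-1e-4, 1e-4, size=reference.shape)

print("strict np.isclose:", np.isclose(reference, noisy).all())
print("relaxed atol:", embeddings_close(reference, noisy))
print("cosine similarity:", cosine_similarity(reference, noisy))
```

An absolute tolerance keeps the test sensitive to real regressions (a wrong model or wrong input still fails) while tolerating small run-to-run jitter. The right threshold depends on how large the observed differences actually are, which the output above does not show.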
