michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip
https://michaelfeil.github.io/infinity/
MIT License
1.26k stars · 86 forks

Does InfinityEmbeddings server work with OpenAIEmbeddings properly? #329

Closed vmarchenkoff closed 1 month ago

vmarchenkoff commented 1 month ago

System Info

michaelf34/infinity:latest

Information

Tasks

Reproduction

docker run -it --gpus all -v ~/llms/:/app/.cache -p 8000:8000 michaelf34/infinity:latest v2 --model-id ~/llms/multilingual-e5-large --port 8000

from openai import OpenAI

openai_api_key = "DUMMY"
openai_api_base = "http://<BASE_URL>:8000"
client_emb = OpenAI(api_key=openai_api_key, base_url=openai_api_base)

models = client_emb.models.list()
model = models.data[0].id

responses = client_emb.embeddings.create(input=['phrase1', 'phrase2'], 
                                         model=model)
for data in responses.data:
    print(data.embedding[:5])

works just fine.

from langchain_community.embeddings import InfinityEmbeddings

infinity_api_url = "http://<BASE_URL>:8000"

embeddings = InfinityEmbeddings(
    model=model, infinity_api_url=infinity_api_url
)

query = "phrase1"

documents = ["phrase1", "phrase2"]

try:
    query_result = embeddings.embed_query(query)
    docs_result = embeddings.embed_documents(documents)
    print("embeddings created successfully")
except Exception as ex:
    print(
        "Make sure the infinity instance is running. Verify by clicking on "
        f"{infinity_api_url.replace('v1','docs')} Exception: {ex}. "
    )

print(len(docs_result))
print(len(query_result))
print(query_result[:5])

works fine as well, but:

from langchain_openai import OpenAIEmbeddings

openai_api_key = "DUMMY"
openai_api_base = "http://<BASE_URL>:8000"

client_emb = OpenAIEmbeddings(api_key=openai_api_key,  base_url=openai_api_base)

client_emb.embed_query('Hello world')

fails with a timeout error. What did I miss here? The reason I would like to use OpenAIEmbeddings instead of InfinityEmbeddings is the possibility of passing an api-key, which is not supported by LangChain's InfinityEmbeddings.

Thank you in advance and thank you for this beautiful project!

Expected behavior

The same as for the OpenAI client and InfinityEmbeddings.

greptile-apps[bot] commented 1 month ago

The issue seems to be related to the timeout error when using OpenAIEmbeddings with the Infinity server. Ensure that the Infinity server is properly configured to handle requests from OpenAIEmbeddings. Specifically, check the following:

  1. Server Configuration: Verify that the Infinity server is running and accessible at the specified openai_api_base URL.

  2. Timeout Settings: Increase the timeout settings in the OpenAIEmbeddings client to allow more time for the server to respond. You can do this by setting the timeout parameter when initializing the OpenAIEmbeddings client.

from langchain_openai import OpenAIEmbeddings

openai_api_key = "DUMMY"
openai_api_base = "http://<BASE_URL>:8000"

client_emb = OpenAIEmbeddings(api_key=openai_api_key, base_url=openai_api_base, timeout=60)

client_emb.embed_query('Hello everyone')

  3. Server Logs: Check the server logs for any errors or warnings that might indicate why the request is timing out. This can provide more insight into whether the issue lies with the server configuration or the client request.

  4. Network Issues: Ensure there are no network issues or firewalls blocking the request from the client to the server.

By addressing these points, you should be able to resolve the timeout error and use OpenAIEmbeddings with the Infinity server.


vmarchenkoff commented 1 month ago

I guess it's because the OpenAI client is trying to download something that cannot be downloaded (possibly because of network settings):

~/.local/lib/python3.9/site-packages/tiktoken_ext/openai_public.py:72, in cl100k_base()
     71 def cl100k_base():
---> 72     mergeable_ranks = load_tiktoken_bpe(
     73         "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken",
     74         expected_hash="223921b76ee99bde995b7ff738513eef100fb51d18c93597a113bcffe865b2a7",
     75     )

Okay, I've realized that some params must be added:

tiktoken_enabled = False
model = model

but how should this be configured properly? The model path from the docker run command doesn't help.

michaelfeil commented 1 month ago

Search for related issues here before opening.

LangChain pre-tokenizes the text with the tiktoken tokenizer.
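If the download itself is the problem (e.g. a firewalled environment), tiktoken can be pointed at a local cache of the BPE files via its TIKTOKEN_CACHE_DIR environment variable. A minimal sketch, assuming the cache directory was pre-populated from a machine with internet access (the path below is hypothetical):

```python
import os

# Must be set before tiktoken loads an encoding; tiktoken then reads the
# cached cl100k_base.tiktoken file from here instead of downloading it
# from openaipublic.blob.core.windows.net.
os.environ["TIKTOKEN_CACHE_DIR"] = "/app/tiktoken_cache"  # hypothetical path
```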

vmarchenkoff commented 1 month ago

As I understand it, there is no way to use OpenAIEmbeddings with an Infinity instance because of the model-specific tokenization; InfinityEmbeddings was implemented to solve exactly this (https://github.com/michaelfeil/infinity/issues/36). But the api-key feature is not implemented yet in LangChain's InfinityEmbeddings.

Sorry for the inconvenience; I saw the issue above before opening this one, but read it as a different problem.

Thank you for your answer and for your work in general. This is a beautiful project, very important and useful.

michaelfeil commented 1 month ago

@vmarchenkoff Makes sense! "But the api-key feature is not implemented yet in LangChain's InfinityEmbeddings." - Correct. I would recommend using https://github.com/michaelfeil/infinity/tree/main/libs/client_infinity/infinity_client (pip install infinity_client) instead.
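If a dedicated client is not an option, the endpoint can also be called directly over HTTP with an Authorization header. A minimal sketch, assuming Infinity exposes an OpenAI-style POST /embeddings route at the server root and accepts a Bearer token when the server is started with an api-key (both assumptions; check your server's /docs page for the exact routes):

```python
import requests

API_URL = "http://<BASE_URL>:8000/embeddings"  # assumed route, verify via /docs
API_KEY = "DUMMY"  # should match the key the Infinity server was started with

def build_request(model: str, texts: list[str]) -> tuple[dict, dict]:
    """Build (headers, payload) for an OpenAI-style embeddings request."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    payload = {"model": model, "input": texts}
    return headers, payload

if __name__ == "__main__":
    headers, payload = build_request("multilingual-e5-large", ["phrase1", "phrase2"])
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    for item in resp.json()["data"]:
        print(item["embedding"][:5])
```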

vmarchenkoff commented 1 month ago

Thank you!

It looks much more customisable and robust; I guess I should use the native clients for the vector DB and embeddings in my RAG project instead of third-party integrations.