pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

How to Integrate Replicate API using LiteLLMEmbedder and LiteLLMChat #46

Closed: abdul756 closed this issue 1 month ago

abdul756 commented 2 months ago

I have integrated a Replicate model using LiteLLMEmbedder to get text embeddings, but I am getting this error:

/home/abdul/project_pathway/personal_assistant/app_1.py:37: RuntimeWarning: coroutine 'LiteLLMEmbedder.__wrapped__' was never awaited
  embedding_dimension: int = len(
Traceback (most recent call last):
  File "/home/abdul/project_pathway/personal_assistant/app_1.py", line 37, in <module>
    embedding_dimension: int = len(
TypeError: object of type 'coroutine' has no len()

My code:

from pathway.xpacks.llm import embedders

embedder = embedders.LiteLLMEmbedder(
    capacity=5,
    model="lucataco/nomic-embed-text-v1 ",
    api_base="https://api.replicate.com/v1/predictions",
    api_key="r8_2HleIfYKC6yE62t4m1Au7AHPv8LV0FN4SVHuq",
)

embedding_dimension: int = len(
    embedder.__wrapped__("."))

print("Embedding dimension:", embedding_dimension)

I would like to see an example of integrating a Replicate model through its API, and I would like to understand the cause of this error.

janchorowski commented 2 months ago

Thanks for the error report. We are investigating the async call issue and will send a correction shortly.

szymondudycz commented 1 month ago

The error is caused by some embedder wrappers being synchronous and others asynchronous. LiteLLMEmbedder is one of the asynchronous ones, so calling embedder.__wrapped__(".") directly returns a coroutine object instead of the embedding, and len() cannot be applied to a coroutine. To remove the need to adapt code to each wrapper, Pathway 0.11.0 introduced a new method on every embedder, get_embedding_dimension, which runs either the synchronous or the asynchronous code as needed. See the documentation here: https://pathway.com/developers/api-docs/pathway-xpacks-llm/embedders#pathway.xpacks.llm.embedders.LiteLLMEmbedder.get_embedding_dimension.
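To make the cause concrete, here is a minimal plain-Python reproduction that needs neither Pathway nor Replicate. FakeAsyncEmbedder is a hypothetical stand-in for an asynchronous wrapper such as LiteLLMEmbedder. It returns a fixed-size zero vector (the size 768 is an arbitrary assumption) instead of calling any API.

```python
import asyncio


class FakeAsyncEmbedder:
    """Hypothetical stand-in for an async embedder; makes no network calls."""

    async def __wrapped__(self, text: str) -> list[float]:
        # A real embedder would call the provider's API here.
        return [0.0] * 768


embedder = FakeAsyncEmbedder()

# Calling an async function only creates a coroutine object; its body has not run yet.
result = embedder.__wrapped__(".")
print(type(result).__name__)
try:
    len(result)
except TypeError as exc:
    print(exc)  # same message as in the traceback above
finally:
    result.close()  # discard the unawaited coroutine so Python does not warn about it

# Running the coroutine to completion yields the actual embedding list.
embedding = asyncio.run(embedder.__wrapped__("."))
print(len(embedding))
```

Awaiting the coroutine (here via asyncio.run) is what turns it into the list whose length was wanted.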

With the new method, your code becomes:

from pathway.xpacks.llm import embedders

embedder = embedders.LiteLLMEmbedder(
    capacity=5,
    model="lucataco/nomic-embed-text-v1 ",
    api_base="https://api.replicate.com/v1/predictions",
    api_key="r8_2HleIfYKC6yE62t4m1Au7AHPv8LV0FN4SVHuq",
)

# get_embedding_dimension runs the synchronous or asynchronous call as needed
embedding_dimension: int = embedder.get_embedding_dimension()
print("Embedding dimension:", embedding_dimension)
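The idea behind a method that serves both kinds of wrapper can be sketched as follows: embed a probe string, resolve the result if it is awaitable, then take its length. This is an illustration under assumptions, not Pathway's actual get_embedding_dimension source; probe_embedding_dimension, SyncEmbedder and AsyncEmbedder are hypothetical names.

```python
import asyncio
import inspect
from typing import Any, Callable


def probe_embedding_dimension(embed: Callable[[str], Any]) -> int:
    """Hypothetical helper (not Pathway's implementation).

    Works whether `embed` returns a list directly or returns an awaitable.
    Assumes no event loop is already running in the current thread.
    """
    result = embed(".")
    if inspect.isawaitable(result):
        # asyncio.run expects a coroutine, so wrap any awaitable in one.
        async def _resolve(awaitable: Any) -> Any:
            return await awaitable

        result = asyncio.run(_resolve(result))
    return len(result)


class SyncEmbedder:
    """Hypothetical synchronous embedder returning a fixed-size vector."""

    def __call__(self, text: str) -> list[float]:
        return [0.1] * 384


class AsyncEmbedder:
    """Hypothetical asynchronous embedder returning a fixed-size vector."""

    async def __call__(self, text: str) -> list[float]:
        return [0.1] * 768


print(probe_embedding_dimension(SyncEmbedder()))
print(probe_embedding_dimension(AsyncEmbedder()))
```

One design caveat: asyncio.run raises RuntimeError when called from inside a running event loop, so a helper like this cannot be used as-is from async code. That is one reason to rely on the library's own method rather than a hand-rolled version.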