Open fenggaobj opened 3 days ago
I'm not sure if I am misunderstanding what is going on here, but this works just fine:
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    base_url='http://localhost:11434',  # optional
    model='nomic-embed-text',
)
vector = embeddings.embed_query("hello")
print(vector[:3])
Of course, so long as you have your base_url correct and have ollama with the nomic-embed-text model pulled.
That's because Ollama's OpenAI-compatible /v1/embeddings endpoint does not support this input data structure yet.
https://github.com/ollama/ollama/blob/main/docs/openai.md#v1embeddings
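To make "the data structure" concrete, here is a minimal sketch of the two request-body shapes involved: plain text versus pre-tokenized input. The `model` and `input` fields follow the OpenAI embeddings API; `build_embeddings_payload` is a hypothetical helper and the token IDs are made-up placeholders, not real tokenizer output. As I read the comment above, the second shape is the one Ollama does not accept yet.

```python
import json
from typing import List, Union

# The input shapes the OpenAI embeddings API documents.
EmbeddingInput = Union[str, List[str], List[int], List[List[int]]]


def build_embeddings_payload(model: str, input_value: EmbeddingInput) -> str:
    """Serialize a /v1/embeddings request body (hypothetical helper)."""
    return json.dumps({"model": model, "input": input_value})


# Plain-text input: a list of strings.
text_payload = build_embeddings_payload("nomic-embed-text", ["hello"])

# Pre-tokenized input: a list of token-ID lists (IDs are placeholders).
token_payload = build_embeddings_payload("nomic-embed-text", [[101, 202, 303]])

print(text_payload)
print(token_payload)
```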
@ethanglide @wuyue92tree Thank you very much for your assistance. I am pleased to inform you that OllamaEmbeddings is functioning properly. However, I have encountered some issues with OpenAIEmbeddings.
The problem lies in the _get_len_safe_embeddings method in langchain_openai/embeddings/base.py. When this method calls the create method in openai/resources/embeddings.py, it passes an argument of type List[Union[List[int], str]]. The create method does not accept this type: its input parameter is annotated as Union[str, List[str], Iterable[int], Iterable[Iterable[int]]], which does not include List[Union[List[int], str]].
Here is the implementation code for the create method in openai/resources/embeddings.py:
class Embeddings(SyncAPIResource):
    def create(
        self,
        *,
        # Does not include List[Union[List[int], str]], which _tokenize returns in the langchain code
        input: Union[str, List[str], Iterable[int], Iterable[Iterable[int]]],
        model: Union[str, EmbeddingModel],
        #.....................................
    ) -> CreateEmbeddingResponse:
        ...
And here is the implementation code for the _get_len_safe_embeddings method in langchain_openai/embeddings/base.py:
def _get_len_safe_embeddings(
    self, texts: List[str], *, engine: str, chunk_size: Optional[int] = None
):
    _chunk_size = chunk_size or self.chunk_size
    _iter, tokens, indices = self._tokenize(texts, _chunk_size)
    batched_embeddings: List[List[float]] = []
    for i in _iter:
        response = self.client.create(
            input=tokens[i : i + _chunk_size], **self._invocation_params
        )
        #....................................

def _tokenize(
    self, texts: List[str], chunk_size: int
) -> Tuple[Iterable[int], List[Union[List[int], str]], List[int]]:
    #....................................
In this context, the tokens returned by self._tokenize have a type of List[Union[List[int], str]].
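One way to picture the mismatch is a narrowing step between _tokenize and create. The sketch below is only an illustration, assuming each batch is homogeneous at runtime (all strings or all token lists); `narrow_tokens` is a hypothetical helper and is not part of langchain_openai or openai.

```python
from typing import List, Union


def narrow_tokens(
    tokens: List[Union[List[int], str]],
) -> Union[List[str], List[List[int]]]:
    """Narrow a union-typed chunk list to one of the shapes create() accepts.

    Hypothetical helper: returns List[str] or List[List[int]] and raises
    TypeError if the batch mixes strings and token lists.
    """
    if all(isinstance(t, str) for t in tokens):
        return [t for t in tokens if isinstance(t, str)]
    if all(isinstance(t, list) for t in tokens):
        return [t for t in tokens if isinstance(t, list)]
    raise TypeError("batch mixes str chunks and token-list chunks")


# Token IDs below are placeholders, not real tokenizer output.
text_batch = narrow_tokens(["hello", "world"])
token_batch = narrow_tokens([[1, 2], [3]])
```

A batch such as ["hello", [1, 2]] raises TypeError, which is the only case where the union type would describe something create cannot take under this assumption.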
I hope this detailed explanation helps in addressing the issue. Thank you once again for your help.
Description
The Embeddings.create method provided by OpenAI supports input parameters of type Union[str, List[str], Iterable[int], Iterable[Iterable[int]]]. However, in the langchain OpenAIEmbeddings class, the _get_len_safe_embeddings method uses _tokenize which may return a type of List[Union[List[int], str]]. This is not a supported type for Embeddings.create.
I believe this to be a bug. Could you please advise on how to handle this issue?
System Info
from langchain_core import sys_info
sys_info.print_sys_info()