Open giovannialbero1992 opened 4 weeks ago
I'd like to share a short guide to reproduce what I'm observing.
Run a Docker container with Python 3.10.11
docker run -d -i -t python:3.10 bash
Enter the container, using the container ID shown by docker ps
docker exec -ti <CONTAINER ID> bash
Install vim
apt update && apt install vim
Create a Python file named embedder.py
and insert this code
from langchain_community.embeddings import FastEmbedEmbeddings
embedder = FastEmbedEmbeddings(model_name="intfloat/multilingual-e5-large")
text = "Hello world"
embedding = embedder.embed_query(text)
print(embedding)
Install dependencies
pip install langchain_core==0.1.22
pip install langchain==0.1.4
pip install fastembed==0.1.3
Run the script and get the result
python embedder.py
First part of the vector
[0.024819795042276382, -0.023618297651410103, -0.006692419294267893, -0.04708532989025116, 0.0343518927693367, -0.026183584704995155, -0.029025807976722717, 0.041693683713674545, 0.060204412788152695, -0.015606507658958435, 0.02012583799660206, 0.03693017736077309, ...
Upgrade the fastembed version
pip install fastembed==0.3.4
Run the script and get the result
python embedder.py
First part of the vector
[-0.005152239464223385, 0.005240725819021463, 0.008123699575662613, -0.039657339453697205, 0.009418696165084839, -0.035511959344148636, -0.04110070690512657, 0.03789035230875015, 0.05153501033782959, -0.024316389113664627, 0.037706244736909866, 0.019727017730474472, ...
Compare the result
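One way to put a number on the difference is cosine similarity. The snippet below is only a minimal sketch: cosine_similarity is a hypothetical helper name, and the two lists hold just the first four components printed above, not the full vectors, so the value it prints says nothing definitive about the full embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# First four components only, copied from the outputs above.
# These are truncated prefixes, not the complete embeddings.
old_prefix = [0.024819795042276382, -0.023618297651410103,
              -0.006692419294267893, -0.04708532989025116]
new_prefix = [-0.005152239464223385, 0.005240725819021463,
              0.008123699575662613, -0.039657339453697205]

print(cosine_similarity(old_prefix, new_prefix))
```

To compare the real outputs, save the full vector from each run (for example with json.dump) and pass both lists to the same function.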
I can reproduce this, but my last output is:
[-0.0010747660417109728, -0.0015742044197395444, 0.01378690730780363, -0.03357434272766113, 0.0050786384381353855 ...
The first output matches exactly.
Yes, times change: you are looking at a very early release.
Since release 0.2.0 the behavior has stayed as it is now. Please use a current version of fastembed.
Thanks @I8dNLo for the test. I don't know why your last output differs from mine, but you see a difference between versions as well.
I checked the code and observed that the previous version prepended query: to the text before embedding it.
Unfortunately, the update is disruptive for my RAG system because I now get different results. I will need to plan a migration.
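As a possible stopgap during a migration, the prefix could be added by the caller. This is a minimal sketch, not fastembed's actual code: the exact prefix string "query: " (with a trailing space) is an assumption based on the observation above, embed_query_with_prefix and fake_embed are hypothetical names, and there is no guarantee that this reproduces the 0.1.3 vectors exactly, since other things may also have changed between releases.

```python
from typing import Callable, List

# Assumption: the old release prepended exactly this string.
QUERY_PREFIX = "query: "


def embed_query_with_prefix(
    embed: Callable[[str], List[float]],
    text: str,
    prefix: str = QUERY_PREFIX,
) -> List[float]:
    """Prepend the prefix (unless already present) and embed the result."""
    if not text.startswith(prefix):
        text = prefix + text
    return embed(text)


# Stand-in embedder so the sketch runs without any model download.
# It records the text it receives and returns a dummy vector.
seen: List[str] = []


def fake_embed(text: str) -> List[float]:
    seen.append(text)
    return [float(len(text))]


print(embed_query_with_prefix(fake_embed, "Hello world"))
print(seen)
```

With the script above, the call would look like embed_query_with_prefix(embedder.embed_query, "Hello world"). Comparing its output against the stored 0.1.3 vectors before relying on it would be prudent.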
What happened?
I have two environments, one with fastembed 0.3.4 and another with 0.1.3, and they produce different embeddings for the same text. The embedder used is: https://huggingface.co/intfloat/multilingual-e5-large
What Python version are you on? e.g. python --version
python 3.10.11
Version
0.2.7 (Latest)
What os are you seeing the problem on?
Linux
Relevant stack traces and/or logs
No response