mostaphaRoudsari closed this issue 4 months ago.
This library, `transformers.js.py`, doesn't support `SentenceTransformer` directly. It just proxies function calls from Python to transformers.js, and `SentenceTransformer` is built on top of `transformers`, not transformers.js.
However, `SentenceTransformer` uses `transformers` internally, so you can recreate `encode()` with `transformers.js.py` by following the original implementation.
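For reference, the sentence-transformers configuration for all-MiniLM-L6-v2 runs the transformer, then mean-pools the token embeddings (weighted by the attention mask), then L2-normalizes the result. If the model loaded through `transformers.js.py` only returns token-level outputs (for example a `last_hidden_state` tensor) rather than a pooled `sentence_embedding`, those two steps could be reproduced in NumPy roughly as below. This is a sketch under assumptions: `mean_pooling` and `l2_normalize` are hypothetical helper names, and it assumes the token embeddings and attention mask have already been converted to NumPy arrays (e.g. with `.to_numpy()`).

```python
import numpy as np


def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings per sentence, ignoring padding positions.

    token_embeddings: shape (batch, seq_len, hidden)
    attention_mask:   shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Broadcast the mask over the hidden dimension: (batch, seq_len, 1)
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for a row that is all padding
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


# Toy input: 1 sentence, 3 token positions (the last is padding), hidden size 2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pooling(tokens, mask)  # [[2.0, 3.0]]; the padded token is ignored
unit = l2_normalize(pooled)          # same direction, length 1
```

After normalization, cosine similarity between two embeddings reduces to a plain dot product.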
Here is a sample I created based on https://github.com/UKPLab/sentence-transformers/blob/c0fc0e8238f7f48a1e92dc90f6f96c86f69f1e02/sentence_transformers/SentenceTransformer.py#L405, though it doesn't fully implement the original method.
from scipy.spatial.distance import cosine
from transformers_js_py import AutoModel, AutoTokenizer

# Re-implement encode() using `transformers.js.py`.
# Top-level `await` works in Pyodide and in Jupyter notebooks.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
options = {
    "quantized": False
}
tokenizer = await AutoTokenizer.from_pretrained(model_name, options)
model = await AutoModel.from_pretrained(model_name, options)

async def encode(sentences, output_value="sentence_embedding"):
    model_inputs = tokenizer(sentences)
    embeddings = await model(**model_inputs)
    output = embeddings[output_value].to_numpy()
    output = output[0]  # Drop the batch dimension to get a 1-D vector
    return output

# Modify its callers as well:
async def get_embedding(programs):
    return [await encode(p) for p in programs]

async def calculate_similarities(room_name, _programs_embedding):
    input_embedding = await encode(room_name)
    # Cosine similarity = 1 - cosine distance
    ranking = {
        count: 1 - cosine(input_embedding, program)
        for count, program in enumerate(_programs_embedding)
    }
    # Sort by similarity, highest first, and return the best (index, score) pair
    data = sorted(ranking.items(), key=lambda item: item[1], reverse=True)
    return data[0]

programs = ['I love Transformers', 'It was raining yesterday']
pe = await get_embedding(programs)
index, score = await calculate_similarities('What was the weather?', pe)
assert programs[index] == 'It was raining yesterday'
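Loading SciPy only for `cosine()` may add a sizeable download in Pyodide. As an alternative (a sketch, not part of the original sample), the ranking done in `calculate_similarities` could be computed with NumPy alone by normalizing the vectors and taking dot products. `rank_by_cosine` is a hypothetical helper name, and the sketch assumes no embedding is an all-zero vector.

```python
import numpy as np


def rank_by_cosine(query: np.ndarray, candidates: list[np.ndarray]) -> list[tuple[int, float]]:
    """Return (index, cosine similarity) pairs, most similar first."""
    matrix = np.stack(candidates)  # shape (n_candidates, dim)
    # Normalize rows and the query so that a dot product equals cosine similarity
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = matrix @ q            # shape (n_candidates,)
    order = np.argsort(-scores)    # candidate indices, highest score first
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real sentence embeddings
candidates = [np.array([1.0, 0.0]), np.array([0.6, 0.8])]
query = np.array([0.0, 1.0])
best_index, best_score = rank_by_cosine(query, candidates)[0]  # (1, 0.8)
```

With real data, `candidates` would be the list returned by `get_embedding` and `query` the output of `encode` for the input sentence.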
Thank you, @whitphx, for taking the time to provide a working example. This should do what I need, and I can take a closer look at the original implementation. I did a couple of quick tests, and it worked very well. Cheers.
Hi @whitphx, this is not really an issue! I didn't know where to post this question, so I'm creating an issue for it.
I'm trying to translate this code into `transformers.js.py` to be able to run it with Pyodide, but I'm struggling to map the code from the `sentence_transformers` library. Two questions:
Is `sentence_transformers` also accessible through this library?
Thanks.