Hi there 👋 Due to the differences in how JavaScript and Python handle optional positional and keyword arguments, we modified the API slightly to account for this. See here for example usage:
```javascript
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model = await AutoModelForSequenceClassification.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');

// Tokenize (query, passage) pairs: queries go in the first argument,
// passages go in the `text_pair` option.
const features = tokenizer(
  ['How many people live in Berlin?', 'How many people live in Berlin?'],
  {
    text_pair: [
      'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
      'New York City is famous for the Metropolitan Museum of Art.',
    ],
    padding: true,
    truncation: true,
  }
);

// The model returns an output object; the relevance scores are in `logits`.
const { logits } = await model(features);
console.log(logits.data);
// quantized:   [ 7.210887908935547, -11.559350967407227 ]
// unquantized: [ 7.235750675201416, -11.562294006347656 ]
```
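The scores above are raw logits, not probabilities. If a 0 to 1 relevance score is more convenient, one common option is to pass each logit through the logistic sigmoid. This is a sketch with a hypothetical `sigmoid` helper (not part of Transformers.js); since the sigmoid is monotonic, it does not change the ranking order.

```javascript
// Hypothetical helper (not a Transformers.js API): map a raw cross-encoder
// logit to a value between 0 and 1 with the logistic sigmoid.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Logits copied from the quantized output above (one per query/passage pair).
const logits = [7.210887908935547, -11.559350967407227];
const probabilities = logits.map(sigmoid);

// The Berlin passage ends up close to 1, the New York passage close to 0.
console.log(probabilities);
```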
This is what I was looking for, although it doesn't seem to work for this specific model, crossencoder-distilcamembert-mmarcoFR_ONNX (it does work for ms-marco-TinyBERT-L-2-v2 and all its MiniLM variants).
I will be using TinyBERT since it seems to be performing really well for my use case!
Just a quick question: is the example you gave me (or a similar one) available in the documentation? If not, I think it would be a great addition, since almost every cross-encoder needs to tokenize text pairs.
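For re-ranking, the query has to be repeated once per passage so that the first argument and `text_pair` line up element by element, as in the example above. A small helper can build both arrays. This is a sketch with a hypothetical `buildPairs` name, not a Transformers.js API; the tokenizer call in the comment is the one from the example.

```javascript
// Hypothetical helper: repeat one query once per passage so that
// texts[i] is paired with text_pair[i].
function buildPairs(query, passages) {
  return {
    texts: passages.map(() => query),
    text_pair: [...passages],
  };
}

const { texts, text_pair } = buildPairs('How many people live in Berlin?', [
  'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
  'New York City is famous for the Metropolitan Museum of Art.',
]);

// These arrays would then be passed to the tokenizer as in the example:
// tokenizer(texts, { text_pair, padding: true, truncation: true })
console.log(texts, text_pair);
```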
Thank you for this great library by the way :heart:
@Maxzurek I think I ran into the same issue as you when using https://huggingface.co/Oblix/crossencoder-distilcamembert-mmarcoFR_ONNX (the logits property is undefined).
Out of curiosity, do you know why? What could explain why this model cannot be used? Is it related to how the original model https://huggingface.co/antoinelouis/crossencoder-distilcamembert-mmarcoFR was built, or does Transformers.js not yet handle every model case?
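Whatever the root cause turns out to be, one way to narrow it down is to check which keys the model output actually contains instead of silently reading `undefined`. This is a sketch with a hypothetical `getLogits` helper, exercised on mock objects; in real code it would wrap `await model(features)`. Seeing different keys only shows what the exported model returns; it does not by itself establish why.

```javascript
// Hypothetical guard: throw a descriptive error listing the output's keys
// when there is no `logits` property.
function getLogits(output) {
  if (output == null || output.logits === undefined) {
    const keys = output == null ? [] : Object.keys(output);
    throw new Error(
      `Model output has no "logits" property; available keys: [${keys.join(', ')}]`
    );
  }
  return output.logits;
}

// Usage with mock outputs (a real call would be getLogits(await model(features))).
console.log(getLogits({ logits: [7.2, -11.6] }));
try {
  getLogits({ last_hidden_state: [] });
} catch (err) {
  console.log(err.message);
}
```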
By the way @xenova , thanks for porting ML stuff to the JS/TS environment, it's very valuable!
@sneko It might be because the model is not yet supported by Transformers.js, but @xenova might be able to answer better. I've had good results so far with xenova/ms-marco-TinyBERT-L-2-v2, although I don't think the model is multilingual (I use it mostly for English and French).
System Info
Transformers.js version: 2.14.0
Framework: React (18.2.0)
Browser: Chrome (120.0.6099.218)
Node.js version: 20.2.0
Environment/Platform
Description
I am trying to use the crossencoder-distilcamembert-mmarcoFR model as a re-ranker in React, following the model card and its inference code. The original Python code using the transformers library works as expected, but when I port it to React with Transformers.js, I get a TypeError related to the PreTrainedTokenizer constructor.
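For the re-ranking use case, the last step after scoring is ordering the passages by their logits. This is a minimal sketch with a hypothetical `rankPassages` helper, using the quantized ms-marco-TinyBERT-L-2-v2 scores reported earlier in the thread (given in reverse order to show the sort); a real pipeline would read the scores from the model's `logits`.

```javascript
// Hypothetical helper: attach each passage's cross-encoder score and
// sort from most to least relevant.
function rankPassages(passages, scores) {
  if (passages.length !== scores.length) {
    throw new Error('passages and scores must have the same length');
  }
  return passages
    .map((passage, index) => ({ passage, score: scores[index], index }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankPassages(
  [
    'New York City is famous for the Metropolitan Museum of Art.',
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
  ],
  [-11.559350967407227, 7.210887908935547]
);

// The Berlin passage comes first.
console.log(ranked.map((r) => r.passage));
```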
Model Information
Python code from the model card
Reproduction
Code Snippet
Error