huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

[Inference] Add `SentenceTransformers` support to `pipeline` for `feature-extraction` #583

Closed philschmid closed 1 month ago

philschmid commented 2 months ago

What does this PR do?

This PR adds a new, slightly modified `FeatureExtractionPipeline` from Transformers that allows us to use it with sentence-transformers models. When the `pipeline` object from Optimum is used, the library checks whether the model requested for feature-extraction is a sentence-transformers model, and if so, it returns the sentence embeddings instead of the first hidden state.

This is done by adding a new `is_sentence_transformer_model` check that determines whether the requested model is a transformers or a sentence-transformers model. If it is a sentence-transformers model, the pipeline uses `NeuronModelForSentenceTransformers`, and `FeatureExtractionPipeline` returns `model_outputs.sentence_embedding[0]` instead of `model_outputs[0]`.
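As a rough illustration of how such a check can work (a minimal sketch, not the PR's actual implementation: the helper name and the `modules.json` heuristic are assumptions based on the fact that sentence-transformers checkpoints ship a `modules.json` describing their pooling/normalization modules, while plain transformers checkpoints do not):

```python
import json
import os
import tempfile


def is_sentence_transformer_model(model_dir: str) -> bool:
    """Hypothetical sketch: treat a checkpoint as a sentence-transformers
    model if it contains the `modules.json` file that the
    sentence-transformers save format always writes."""
    return os.path.isfile(os.path.join(model_dir, "modules.json"))


# Usage sketch: a directory without modules.json is a plain transformers
# checkpoint; adding the file marks it as a sentence-transformers one.
with tempfile.TemporaryDirectory() as model_dir:
    print(is_sentence_transformer_model(model_dir))   # False
    with open(os.path.join(model_dir, "modules.json"), "w") as f:
        json.dump([], f)
    print(is_sentence_transformer_model(model_dir))   # True
```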

Example:

```python
from optimum.neuron import pipeline

input_shapes = {"batch_size": 1, "sequence_length": 64}
p = pipeline("feature-extraction", "sentence-transformers/all-MiniLM-L6-v2", export=True, input_shapes=input_shapes)
# Using Sentence Transformers compatible Feature extraction pipeline

p("test")
# [0.06765521317720413,
#  0.06349243223667145,
#  0.04871273413300514,
#  0.0793028473854065,
```

Validated with `torch.allclose`.
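The kind of parity check this implies can be sketched without the torch dependency (a minimal illustration, assuming the usual `torch.allclose` semantics of `|a - b| <= atol + rtol * |b|`; the reference values below are made up for demonstration, not taken from the PR):

```python
def allclose(a, b, rtol=1e-5, atol=1e-5):
    """Element-wise closeness check mirroring torch.allclose semantics:
    every pair must satisfy |x - y| <= atol + rtol * |y|."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


# Compare the first values of the Neuron pipeline output against a
# hypothetical reference embedding (e.g. from sentence-transformers).
neuron_out = [0.06765521317720413, 0.06349243223667145, 0.04871273413300514]
reference = [0.06765522, 0.06349243, 0.04871273]
assert allclose(neuron_out, reference)
```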

Implications:

Alternative options:

philschmid commented 2 months ago

@tomaarsen can you also do a review?

HuggingFaceDocBuilderDev commented 2 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.