Closed: jobergum closed this issue 3 weeks ago
@jobergum Thank you for the report. This behavior was changed for all models labeled as sentence-transformers models on the Hub (https://huggingface.co/models?library=sentence-transformers): the automatic library detection now picks `sentence_transformers` (you can see the logic here: https://github.com/huggingface/optimum/blob/e3fd2776a318a3a7b9d33315cc42c04c181f6d2f/optimum/exporters/tasks.py#L1690). When using the command line, can you try passing `--library-name transformers` to export with the previous `last_hidden_state` output?
Apologies for the somewhat breaking change.
--library-name {transformers,diffusers,timm,sentence_transformers}
The library of the model. If not provided, will attempt to infer the local checkpoint's library.
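For anyone hitting this from Python rather than the CLI, the same override should be possible through `optimum.exporters.onnx.main_export`, which accepts a `library_name` argument in recent optimum versions. A minimal sketch (the output directory name is illustrative):

```python
# Minimal sketch: programmatic equivalent of the CLI export, forcing the
# transformers export path so the graph keeps a single last_hidden_state output.
from optimum.exporters.onnx import main_export

main_export(
    "intfloat/multilingual-e5-small",
    output="e5-small-onnx",          # illustrative output directory
    task="feature-extraction",
    library_name="transformers",     # skip the sentence-transformers detection
)
```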
Perfect! Thank you so much for the swift reply!
optimum-cli export onnx --library-name transformers --task feature-extraction -m intfloat/multilingual-e5-small
does the trick. Will update our resources pointing to optimum-cli for exporting!
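A quick way to confirm which variant was exported is to inspect the graph's output names with onnxruntime. A sketch, assuming the export was written to `e5-small-onnx/model.onnx` (illustrative path):

```python
# Sketch: check the output names of the exported ONNX graph.
import onnxruntime as ort

session = ort.InferenceSession("e5-small-onnx/model.onnx")
print([output.name for output in session.get_outputs()])
# Expected with --library-name transformers: ['last_hidden_state']
# Without the flag: ['token_embeddings', 'sentence_embedding']
```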
System Info
At some point in time, the export command above changed from producing a single output named `last_hidden_state` to producing two outputs, `token_embeddings` and `sentence_embedding`; the latter presumably implements the pooling inside ONNX. How can one opt out of this sentence-transformers-specific magic? The @xenova ONNX models on the Hub do not use this output format. This change in behavior is causing issues for us at Vespa: several blog posts point to the optimum export utility used as above, but models exported this way fail when imported into Vespa for embedding inference, because Vespa by default expects a `last_hidden_state` output with the pooling implemented outside of ONNX.
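For reference, the pooling such pipelines apply outside the graph is typically masked mean pooling over `last_hidden_state` (E5 models use mean pooling). A minimal numpy sketch; the shapes are assumptions, not Vespa's actual implementation:

```python
# Sketch: mean pooling performed outside ONNX on a last_hidden_state output.
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """last_hidden_state: [batch, seq_len, hidden]; attention_mask: [batch, seq_len]."""
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)        # zero out padding tokens
    counts = np.clip(mask.sum(axis=1), a_min=1e-9, a_max=None)
    return summed / counts                                  # average over real tokens
```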
Expected behavior
Be able to restore the previous behaviour with the output `last_hidden_state`, where pooling can be implemented outside of ONNX.