huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Output names magic in recent optimum for onnx export #1842

Closed by jobergum 3 weeks ago

jobergum commented 3 weeks ago

System Info

Name: transformers
Version: 4.40.1

Name: optimum
Version: 1.19.1

At some point, the following export changed from producing a single output named last_hidden_state to producing two outputs, token_embeddings and sentence_embedding; the latter presumably implements the pooling inside ONNX?

optimum-cli export onnx --task feature-extraction -m intfloat/multilingual-e5-small model-dir

***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
    - use_cache -> False
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.

Validating ONNX model model-dir/model.onnx...
    -[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
    - Validating ONNX Model output "token_embeddings":
        -[✓] (2, 16, 384) matches (2, 16, 384)
        -[✓] all values close (atol: 1e-05)
    - Validating ONNX Model output "sentence_embedding":
        -[✓] (2, 384) matches (2, 384)
        -[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: model-dir
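
For reference, the two output names can be confirmed by inspecting the exported graph directly (a minimal sketch using the onnx package; model-dir is the export directory from the command above):

import onnx

exported = onnx.load("model-dir/model.onnx")
print([output.name for output in exported.graph.output])
# With the sentence_transformers export path this prints
# ['token_embeddings', 'sentence_embedding'], matching the validation log.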

How can one opt out of this sentence-transformers-specific magic? None of the @xenova onnx models on the hub use this output format. This change in behaviour is causing issues for us at Vespa, because some blog posts mention using the optimum export utility in the above way, but when these models are imported into Vespa for embedding inference they fail: the output is not what Vespa expects by default, which is last_hidden_state with the pooling implemented outside of ONNX.
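
For context, by pooling outside of ONNX I mean something like the following (a minimal sketch, not Vespa's actual implementation; it assumes a model exported with a single last_hidden_state output and uses mean pooling as an example):

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
session = ort.InferenceSession("model-dir/model.onnx")

encoded = tokenizer(["query: hello world"], return_tensors="np")
# Only feed the inputs the ONNX graph actually declares.
feed = {i.name: encoded[i.name] for i in session.get_inputs() if i.name in encoded}
last_hidden_state = session.run(["last_hidden_state"], feed)[0]

# Mean pooling over the sequence axis, masking out padding tokens.
mask = encoded["attention_mask"][..., None].astype(last_hidden_state.dtype)
embedding = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)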

Expected behavior

Be able to restore the previous behaviour, with a last_hidden_state output and pooling that can be implemented outside of ONNX.

fxmarty commented 3 weeks ago

@jobergum Thank you for the report. This behavior was changed for all models labeled as sentence_transformers models on the Hub (https://huggingface.co/models?library=sentence-transformers), where the automatic library detection now picks sentence_transformers (you can see the logic here: https://github.com/huggingface/optimum/blob/e3fd2776a318a3a7b9d33315cc42c04c181f6d2f/optimum/exporters/tasks.py#L1690). When using the command line, can you try passing --library-name transformers to export with the previous last_hidden_state output?

Apologies for the somewhat breaking change.

  --library-name {transformers,diffusers,timm,sentence_transformers}
                        The library on the model. If not provided, will attempt to infer the local checkpoint's library
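
If you are exporting from Python rather than the CLI, the equivalent should be (a sketch; it assumes the library_name argument of main_export is available in your optimum version):

from optimum.exporters.onnx import main_export

main_export(
    "intfloat/multilingual-e5-small",
    output="model-dir",
    task="feature-extraction",
    library_name="transformers",  # force the plain transformers export path
)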
jobergum commented 3 weeks ago

Perfect! Thank you so much for the swift reply!

optimum-cli export onnx --library-name transformers --task feature-extraction -m intfloat/multilingual-e5-small model-dir

does the trick. Will update our resources pointing to optimum-cli for exporting!
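
A quick sanity check of the re-exported model (a sketch assuming onnxruntime is installed; model-dir is the export output directory from the command above):

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model = ORTModelForFeatureExtraction.from_pretrained("model-dir")
tokenizer = AutoTokenizer.from_pretrained("model-dir")

outputs = model(**tokenizer("query: hello world", return_tensors="pt"))
print(outputs.last_hidden_state.shape)  # e.g. (1, seq_len, 384) for e5-small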