huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0
2.18k stars 220 forks

Inconsistent onnx-optimum output #467

Open carloronsi opened 8 months ago

carloronsi commented 8 months ago

Hi, I am currently working on an optimization pipeline for a SetFit model.

As I'm testing different approaches, I've tried the process shown in the notebook: https://github.com/huggingface/setfit/blob/58a3600d2764e9c815594bec57812c9c2408931a/notebooks/setfit-onnx-optimum.ipynb

In this section:


!optimum-cli export onnx \
  --model moshew/bge-small-en-v1.5_setfit-sst2-english \
  --task feature-extraction \
  --optimize O4 \
  --device cuda \
  bge_auto_opt_O4

It shows:

- Validating ONNX Model output "last_hidden_state":
        -[✓] (2, 16, 384) matches (2, 16, 384)
        -[x] values not close enough, max diff: 2.1155929565429688 (atol: 0.0001)

But if I try to replicate the exact same notebook, the model outputs shown are "token_embeddings" and "sentence_embedding". Does anyone have an explanation for this? Thanks
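(For what it's worth: the two-output shape is consistent with the export wrapping the full SentenceTransformer, including its pooling layer, rather than just the bare transformer, which would only expose "last_hidden_state". The sketch below, with made-up data, illustrates the mask-aware mean pooling that such a layer typically applies to turn token_embeddings into a sentence_embedding; it is an illustration, not SetFit's actual code.)

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions only.

    token_embeddings: (batch, seq, dim), attention_mask: (batch, seq) of 0/1.
    Returns (batch, dim).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

# Shapes matching the validation log: (2, 16, 384) tokens -> (2, 384) sentences.
token_embeddings = np.random.rand(2, 16, 384).astype(np.float32)
attention_mask = np.ones((2, 16), dtype=np.int64)
attention_mask[1, 8:] = 0  # second sequence padded after 8 tokens

sentence_embedding = mean_pool(token_embeddings, attention_mask)
print(sentence_embedding.shape)  # (2, 384)
```

Note that values at padded positions never reach the pooled output, which matters for interpreting the validation diffs later in this thread.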

geraldstanje commented 4 months ago

@carloronsi it seems there are currently multiple ways to export to ONNX.

Can all of these options be used on a GPU with CUDA?

geraldstanje commented 4 months ago

@carloronsi I trained a SetFit model with sentence-transformers/all-MiniLM-L6-v2, stored it in the model_to_deploy directory, and I get this output from optimum-cli export:

optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
        - use_cache -> False
2024-05-16 23:36:49.326953773 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-16 23:36:49.330704373 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-16 23:36:49.330723887 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
  warnings.warn(
Optimizing model...
2024-05-16 23:36:51.063306577 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-16 23:36:51.063330142 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...

Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-16 23:36:57.006898086 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-16 23:36:57.006919714 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
        -[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
        - Validating ONNX Model output "token_embeddings":
                -[✓] (2, 16, 384) matches (2, 16, 384)
                -[x] values not close enough, max diff: 2.168553113937378 (atol: 1e-05)
        - Validating ONNX Model output "sentence_embedding":
                -[✓] (2, 384) matches (2, 384)
                -[x] values not close enough, max diff: 0.000601448118686676 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.168553113937378
- sentence_embedding: max diff = 0.000601448118686676.
 The exported model was saved at: setfit_auto_opt_O4
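(A possible reading of these numbers: the O4 level converts the model to fp16, so exact agreement with the fp32 reference should not be expected. A sentence_embedding max diff around 6e-4 looks like ordinary half-precision noise, while the much larger token_embeddings diff plausibly sits at padded token positions, which the pooling layer masks out anyway. The sketch below, with made-up numbers rather than the actual optimum validation code, reproduces the kind of shape-then-atol check the exporter reports:)

```python
import numpy as np

np.random.seed(0)

# Hypothetical fp32 reference output and a "fp16-noisy" ONNX output,
# perturbed on the order of the sentence_embedding diff seen in the log.
ref = np.random.rand(2, 384).astype(np.float32)
onnx_out = ref + np.random.uniform(-6e-4, 6e-4, ref.shape).astype(np.float32)

# Step 1: shapes must match.
assert onnx_out.shape == ref.shape

# Step 2: elementwise comparison against the tolerance.
max_diff = np.abs(ref - onnx_out).max()
print(f"max diff: {max_diff} (atol: 1e-05)")
if np.allclose(ref, onnx_out, atol=1e-5):
    print("values close enough")
else:
    print("values not close enough")  # expected here, as in the export log
```

With a tolerance that tight, an fp16-optimized model will essentially always trip the warning; whether the export is usable depends on whether downstream accuracy (e.g. the SetFit classification head) degrades, not on the raw atol check alone.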