markus583 closed this pull request 2 months ago.
I had to remove the exact timings. ORT on GPU just doesn't work on any of my systems, even with extensive trial and error. If you're fine with it @bminixhofer, we can merge!
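In case it helps with debugging: a quick sanity check of whether the installed onnxruntime build can see the GPU at all (a minimal sketch; `model.onnx` is a placeholder path, not a file from this PR):

```python
import onnxruntime as ort

# Execution providers this onnxruntime build supports at all.
# If "CUDAExecutionProvider" is missing here, the GPU path can never work,
# no matter how the session is configured.
print(ort.get_available_providers())

# Request CUDA with a CPU fallback; "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# The providers the session actually ended up with after any fallback.
print(session.get_providers())
```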
The naming of the inputs in the ONNX model was swapped (the `attention_mask` input was named `input_ids` and vice versa). It still worked because the arguments in the call in `extract.py` were swapped again, but I removed both swaps now (one by changing the argument order in the export).
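For context on the fix: `torch.onnx.export` pairs `input_names` with the export args purely by position, so a mismatch on either side silently mislabels the graph inputs. A minimal sketch of the correct pairing (the dummy model, shapes, and the `logits` output name are placeholders, not the actual export code from this PR):

```python
import torch


class DummyModel(torch.nn.Module):
    """Stand-in for the real checkpoint; returns a per-token score."""

    def forward(self, input_ids, attention_mask):
        return (input_ids + attention_mask).unsqueeze(-1).float()


model = DummyModel()
input_ids = torch.zeros(1, 512, dtype=torch.long)
attention_mask = torch.ones(1, 512, dtype=torch.long)

torch.onnx.export(
    model,
    (input_ids, attention_mask),  # positional args fed to model.forward(...)
    "model.onnx",
    # Paired with the args tuple by position only: listing these in the
    # wrong order labels the attention_mask tensor as "input_ids".
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
)
```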
Also added timings for onnxruntime on GPU; it is indeed ~50% faster!
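For anyone who wants to reproduce the comparison, a rough timing sketch along these lines should do (not the actual benchmark code; the model path, batch size, and sequence length are placeholders):

```python
import time

import numpy as np
import onnxruntime as ort

# Placeholder model path and shapes; the real benchmark setup may differ.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_ids = np.random.randint(0, 1000, size=(8, 512), dtype=np.int64)
attention_mask = np.ones_like(input_ids)
feeds = {"input_ids": input_ids, "attention_mask": attention_mask}

# Warm-up run so CUDA context creation and graph optimization are not timed.
session.run(None, feeds)

start = time.perf_counter()
for _ in range(100):
    session.run(None, feeds)
elapsed = (time.perf_counter() - start) / 100
print(f"{elapsed * 1000:.2f} ms per batch on {session.get_providers()[0]}")
```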
LGTM now. @markus583, maybe take a final look, then we can merge and release.
This adds ONNX support for sat, sat-sm, and sat-lora models, and includes documentation and testing.
TODOs:
- `model_optimized.onnx`?
- `extract.py`? It works but it is quite weird.
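Example usage after merging, assuming this lands in the `SaT` class of wtpsplit and exposes the providers via an `ort_providers` argument (both names are assumptions here; the documentation added in this PR is authoritative):

```python
from wtpsplit import SaT

# Model id and argument name are assumptions for illustration only.
sat = SaT(
    "sat-3l-sm",
    ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

print(sat.split("This is a test This is another test."))
```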