markus583 closed this pull request 2 months ago.
I had to remove the exact timings. ORT on GPU just doesn't work on any of my systems, even with extensive trial and error. If you're fine with it @bminixhofer, we can merge!
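In case it helps with debugging: a quick sanity check of whether the installed onnxruntime build can see the GPU at all (a minimal sketch; `model.onnx` is a placeholder path, not a file from this PR):

```python
import onnxruntime as ort

# Execution providers this onnxruntime build supports at all.
# If "CUDAExecutionProvider" is missing here, the GPU path can never work,
# no matter how the session is configured.
print(ort.get_available_providers())

# Request CUDA with a CPU fallback; "model.onnx" is a placeholder path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# The providers the session actually ended up with after any fallback.
print(session.get_providers())
```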
The naming of the inputs in the ONNX model was swapped (the `attention_mask` input was named `input_ids` and vice versa). It still worked because the arguments in the call in `extract.py` were swapped again, but I removed both swaps now (one by changing the argument order in the export).
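For context on the fix: `torch.onnx.export` pairs `input_names` with the export args purely by position, so a mismatch on either side silently mislabels the graph inputs. A minimal sketch of the correct pairing (the dummy model, shapes, and the `logits` output name are placeholders, not the actual export code from this PR):

```python
import torch


class DummyModel(torch.nn.Module):
    """Stand-in for the real checkpoint; returns a per-token score."""

    def forward(self, input_ids, attention_mask):
        return (input_ids + attention_mask).unsqueeze(-1).float()


model = DummyModel()
input_ids = torch.zeros(1, 512, dtype=torch.long)
attention_mask = torch.ones(1, 512, dtype=torch.long)

torch.onnx.export(
    model,
    (input_ids, attention_mask),  # positional args fed to model.forward(...)
    "model.onnx",
    # Paired with the args tuple by position only: listing these in the
    # wrong order labels the attention_mask tensor as "input_ids".
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
)
```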
Also added timings for onnxruntime on GPU; it is indeed ~50% faster!
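For anyone who wants to reproduce the comparison, a rough timing sketch along these lines should do (not the actual benchmark code; the model path, batch size, and sequence length are placeholders):

```python
import time

import numpy as np
import onnxruntime as ort

# Placeholder model path and shapes; the real benchmark setup may differ.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_ids = np.random.randint(0, 1000, size=(8, 512), dtype=np.int64)
attention_mask = np.ones_like(input_ids)
feeds = {"input_ids": input_ids, "attention_mask": attention_mask}

# Warm-up run so CUDA context creation and graph optimization are not timed.
session.run(None, feeds)

start = time.perf_counter()
for _ in range(100):
    session.run(None, feeds)
elapsed = (time.perf_counter() - start) / 100
print(f"{elapsed * 1000:.2f} ms per batch on {session.get_providers()[0]}")
```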
LGTM now. @markus583, maybe take a final look, then we can merge and release.
This adds ONNX support for sat, sat-sm, and sat-lora models, and includes documentation and testing.
TODOs:
- `model_optimized.onnx`?
- `extract.py`? It works but it is quite weird.
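Example usage after merging, assuming this lands in the `SaT` class of wtpsplit and exposes the providers via an `ort_providers` argument (both names are assumptions here; the documentation added in this PR is authoritative):

```python
from wtpsplit import SaT

# Model id and argument name are assumptions for illustration only.
sat = SaT(
    "sat-3l-sm",
    ort_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

print(sat.split("This is a test This is another test."))
```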