Phuoc-Hoan-Le closed this 1 year ago
Interesting, I guess there is some variability across GPUs / setups there.
What exactly do you mean by "bounded to GPU". Can you send a code snippet or PR?
A code example can be found at https://onnxruntime.ai/docs/api/python/api_summary.html#data-on-device , which shows how to bind your inputs/outputs to the GPU.
And I get
Whereas if I replace the line
wtp = WtP("wtp-bert-mini", ort_providers=["CUDAExecutionProvider"])
with

I get
The PyTorch implementation is only slower on average because of the outlier from the first run; removing that initial warm-up outlier makes PyTorch on average faster than the ONNX run.
I see the inputs are not bound to the GPU in https://github.com/bminixhofer/wtpsplit/blob/main/wtpsplit/extract.py. Could you please try binding them to see if it is faster?