segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

For GPU, ONNX WtP model is around 2x slower than PyTorch. #106

Closed Phuoc-Hoan-Le closed 1 year ago

Phuoc-Hoan-Le commented 1 year ago
import time
from wtpsplit import WtP

wtp = WtP("wtp-bert-mini", ort_providers=["CUDAExecutionProvider"])

def make_sentence(seg):
  # Split the text into sentences and strip surrounding whitespace from each.
  sentences = wtp.split(seg, lang_code="en", style="ud", threshold=0.975)
  return [x.strip() for x in sentences]

# Define the input once, outside the timed region.
input_text = "The quick brown fox jumps over the lazy dog. El zorro marrón rápido salta sobre el perro perezoso. I went to see the p. t. barnum circus today!"
timelist_fox = []

for i in range(20):
  # Time only the segmentation call itself.
  start = time.time()
  sentences = make_sentence(input_text)
  end = time.time()
  print(sentences)
  print("Runtime for sentence segmentation", end - start)
  timelist_fox.append(end - start)

print()
# Get average runtime
print("Average runtime for sentence segmentation", sum(timelist_fox)/len(timelist_fox))

And I get

['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.09200406074523926
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.15698647499084473
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.07426166534423828
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.07866954803466797
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.09979438781738281
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.08975934982299805
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.09622359275817871
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.09634947776794434
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.07365036010742188
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.10837149620056152
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.0805506706237793
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.08892273902893066
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.04485893249511719
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.10665750503540039
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.05337262153625488
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.040402889251708984
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.03861117362976074
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.04022550582885742
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.03824734687805176
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.0443730354309082

Average runtime for sentence segmentation 0.07711464166641235

Whereas if I replace the line wtp = WtP("wtp-bert-mini", ort_providers=["CUDAExecutionProvider"]) with

wtp = WtP("wtp-bert-mini")
wtp.half().to("cuda")

I get

['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 3.6466240882873535
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.021060943603515625
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.014858007431030273
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.015185832977294922
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.021528959274291992
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.014949560165405273
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.014830350875854492
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.013895034790039062
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.013033628463745117
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.017659902572631836
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.018916606903076172
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.013854742050170898
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.025988340377807617
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.01674795150756836
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.015290498733520508
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.012728214263916016
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.016968250274658203
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.01860976219177246
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.021869421005249023
['The quick brown fox jumps over the lazy dog.', 'El zorro marrón rápido salta sobre el perro perezoso.', 'I went to see the p. t. barnum circus today!']
Runtime for sentence segmentation 0.01503300666809082

Average runtime for sentence segmentation 0.19848165512084961

Although the PyTorch implementation is slower on average because of the outlier in its first run, excluding that first run makes it faster on average than the ONNX run.
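To make the comparison less sensitive to that first-run outlier, the timing loop could discard one or more warm-up calls before measuring. The following is only a minimal sketch: `benchmark` and `summarize` are hypothetical helper names, not part of wtpsplit, and the choice of one warm-up call is an assumption.

```python
import statistics
import time


def benchmark(fn, runs=20, warmup=1):
    """Return per-run timings in seconds for fn, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as the slow first call
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, median) of a list of timings."""
    return statistics.mean(timings), statistics.median(timings)
```

With the script above, this could be called as `summarize(benchmark(lambda: make_sentence(input_text)))` once for each backend. Reporting the median alongside the mean also reduces the influence of any remaining slow runs.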

I see the inputs are not bound to the GPU in extract.py (https://github.com/bminixhofer/wtpsplit/blob/main/wtpsplit/extract.py). Could you please try binding them to see if it is faster?

bminixhofer commented 1 year ago

Interesting, I guess there is some variability across GPUs / setups there.

What exactly do you mean by binding the inputs to the GPU? Can you send a code snippet or PR?

Phuoc-Hoan-Le commented 1 year ago

A code snippet can be found at https://onnxruntime.ai/docs/api/python/api_summary.html#data-on-device, which gives examples of how to bind your inputs and outputs to the GPU.
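For reference, the pattern described on that page looks roughly like the sketch below. It is a generic illustration of ONNX Runtime's I/O binding, not wtpsplit's actual code: `run_with_io_binding` is a hypothetical helper, and `session` is assumed to be an `onnxruntime.InferenceSession` created with the CUDA execution provider. Inputs are passed as CPU NumPy arrays and copied to the device by ONNX Runtime, while outputs are allocated on the requested device.

```python
def run_with_io_binding(session, feeds, device_type="cuda", device_id=0):
    """Run an ONNX Runtime session through I/O binding.

    session: an onnxruntime.InferenceSession (assumed to use the CUDA provider)
    feeds:   dict mapping input names to NumPy arrays held in CPU memory
    """
    binding = session.io_binding()

    # Bind each CPU array; ONNX Runtime copies it to the device before running.
    for name, array in feeds.items():
        binding.bind_cpu_input(name, array)

    # Let ONNX Runtime allocate every output on the requested device.
    for output in session.get_outputs():
        binding.bind_output(output.name, device_type, device_id)

    session.run_with_iobinding(binding)

    # Copy the results back to host memory as a list of NumPy arrays.
    return binding.copy_outputs_to_cpu()
```

Whether this helps here depends on how much of the measured time is spent on host/device copies rather than on the model itself, so it would need to be benchmarked on the same setup.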