Closed: thegenerativegeneration closed this issue 5 months ago
Hi, thanks for catching this! There was a small issue with the tokenizer. We fixed it in wtpsplit==2.0.4; please upgrade.
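For anyone following along in a notebook, the upgrade is e.g.:

```
%pip install -U "wtpsplit>=2.0.4"
```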
With this, I get (both with the 1-layer models):
SaT:
```
%timeit sat.split(SENTENCE * 100)
801 ms ± 351 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
SaT + GPU:
```
%timeit sat.split(SENTENCE * 100)
65.5 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
WtP:
```
%timeit wtp.split(SENTENCE * 100)
6.08 s ± 1.49 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
WtP + GPU:
```
%timeit wtp.split(SENTENCE * 100)
370 ms ± 9.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
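The `sat`, `wtp`, and `SENTENCE` objects aren't defined in the thread; a minimal setup along these lines would reproduce the runs (the SaT model name and the test sentence are assumptions, and `.half().to("cuda")` is how the README moves models to GPU):

```python
from wtpsplit import SaT, WtP

SENTENCE = "This is a test sentence. "  # assumed test input, repeated 100x above

sat = SaT("sat-1l")                       # assumed 1-layer SaT model
wtp = WtP("wtp-canine-s-1l-no-adapters")  # 1-layer WtP model named in this thread

# For the "+ GPU" runs:
# sat.half().to("cuda")
# wtp.half().to("cuda")
```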
(Note: for very short sequences and small models, WtP may still be slightly faster. But you should not use WtP for short sequences regardless, since others have reported problematic inconsistencies with it there, and we also show its poor performance in our paper.)
Thank you very much! It works a lot faster now than WtP did, even for very short texts. And the segmentation is much more natural.
Hi, I tried using SaT as a drop-in replacement for WtP (wtp-canine-s-1l-no-adapters).
However, no matter which variant I try (1l or 3l), inference always takes nearly a second, versus 0.013 s (CPU) and 0.005 s (GPU) with WtP. There is also no difference between CPU and GPU in the SaT runtime for me.
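For reference, a sketch of the comparison being described (the timing harness and input text are assumptions; only the WtP model name is taken from this report):

```python
import time

from wtpsplit import SaT, WtP

text = "This is a short test sentence."  # assumed input

wtp = WtP("wtp-canine-s-1l-no-adapters")  # previous setup
sat = SaT("sat-1l")                       # drop-in replacement; same result with "sat-3l"

for name, model in [("WtP", wtp), ("SaT", sat)]:
    start = time.perf_counter()
    model.split(text)  # first call may include one-time warm-up
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```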
System:
- Docker: `FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04`
- Python 3.10.12
- Pip