segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

show progress #6

Status: Closed (MiniXC closed this issue 3 years ago)

MiniXC commented 4 years ago

I'm currently using nnsplit on a fairly big dataset. Is it possible to track progress on a long list of inputs?

bminixhofer commented 3 years ago

As of v0.5.2, this is supported via a verbose argument to .split(..), but only in the Python bindings. For example:

(py38) bminixhofer@pop-os:~/Documents$ ipython
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import nnsplit

In [2]: splitter = nnsplit.NNSplit.load("en")

In [3]: _ = splitter.split(["Hello!"] * 100_000, verbose=True)
100%|██████████| 100000/100000 [00:00<00:00, 221552.76it/s]

In [4]:
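
For the other bindings, or for a version older than v0.5.2, one possible workaround is to split the input in batches and report progress between batches. The following is only a minimal sketch, not part of nnsplit: split_in_batches, naive_split and print_progress are hypothetical names, and naive_split is a stand-in so the example runs without the library installed. It assumes the real split call accepts a list of texts and returns one result per input, as in the session above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_in_batches(
    split_fn: Callable[[List[str]], Sequence[T]],
    texts: Sequence[str],
    batch_size: int = 1000,
    report: Callable[[int, int], None] = lambda done, total: None,
) -> List[T]:
    """Call split_fn on successive batches and report progress after each one.

    split_fn is assumed to return exactly one result per input text.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    results: List[T] = []
    total = len(texts)
    for start in range(0, total, batch_size):
        batch = list(texts[start:start + batch_size])
        results.extend(split_fn(batch))
        # Number of inputs processed so far, out of the total.
        report(len(results), total)
    return results


def naive_split(batch: List[str]) -> List[List[str]]:
    # Hypothetical stand-in for a real splitter: splits on ". " only.
    return [text.split(". ") for text in batch]


def print_progress(done: int, total: int) -> None:
    # Overwrite the same terminal line with the current count.
    print(f"\r{done}/{total}", end="", flush=True)


if __name__ == "__main__":
    sentences = split_in_batches(
        naive_split, ["Hello. World."] * 2500, batch_size=1000, report=print_progress
    )
    print()  # finish the progress line
```

In practice one would pass the splitter's own split method in place of naive_split. Larger batches mean fewer progress updates but less per-call overhead, so the batch size is a trade-off to tune for the dataset at hand.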