segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

get_threshold does not work #91

Closed rggdmonk closed 1 year ago

rggdmonk commented 1 year ago

Hi! I'm trying to test this step from README.md:


from wtpsplit import WtP

wtp = WtP("wtp-canine-s-12l")

wtp.get_threshold("en", "ud")
AttributeError                            Traceback (most recent call last)

<ipython-input-41-b7dd80e9f417> in <cell line: 1>()
----> 1 wtp.get_threshold("en", "ud")

1 frames

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1612             if name in modules:
   1613                 return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616 

AttributeError: 'LACanineForTokenClassification' object has no attribute 'get_threshold'

Colab: torch 2.0.1+cu118 huggingface-hub-0.15.1 safetensors-0.3.1 skops-0.7.post0 tokenizers-0.13.3 transformers-4.30.2 wtpsplit-1.0.1
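A note on reading the traceback: the error names 'LACanineForTokenClassification' rather than WtP, which would be consistent with the user-facing object forwarding unknown attribute lookups to the model it wraps. That is only an inference from the traceback, not something confirmed in this thread. A minimal sketch of the pattern, using hypothetical stand-in classes (InnerModel and Wrapper are not wtpsplit names):

```python
class InnerModel:
    """Hypothetical stand-in for the wrapped model object."""

    def forward(self, text):
        return len(text)


class Wrapper:
    """Hypothetical stand-in for a user-facing class that forwards
    unknown attribute lookups to the model it wraps."""

    def __init__(self):
        self.model = InnerModel()

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup on Wrapper fails,
        # so any method missing from Wrapper is looked up on the inner model.
        return getattr(self.model, name)


def missing_attribute_message(obj, name):
    """Return the AttributeError text raised for obj.name, or None if it exists."""
    try:
        getattr(obj, name)
    except AttributeError as exc:
        return str(exc)
    return None


# The method is missing on Wrapper, so the lookup falls through to InnerModel
# and the resulting error message names the inner class.
message = missing_attribute_message(Wrapper(), "get_threshold")
print(message)
```

Under this assumption, a method that is missing from the outer class surfaces as an AttributeError raised by the inner object, which is what the traceback above shows.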

bminixhofer commented 1 year ago

Oops, sorry, you're running into some of the early-stage issues in the revamp.

This is fixed in version 1.1.0, and you can now also get the default threshold used in the punctuation adaptation via

wtp.get_threshold("en", "ud", return_punctuation_threshold=True)
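To check whether an environment already has the fix, the installed version can be compared against 1.1.0. The following is a rough sketch using only the standard library; the helper names version_tuple and has_get_threshold_fix are made up for illustration, and the version parsing is deliberately simple (it reads the first three numeric components only).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Per the reply above, get_threshold is fixed in wtpsplit 1.1.0.
FIXED_IN = (1, 1, 0)


def version_tuple(text):
    """Turn a version string such as '1.0.1' into a 3-tuple of ints (hypothetical helper)."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    # Pad short versions such as '1.1' with zeros so comparisons behave as expected.
    return tuple(parts + [0] * (3 - len(parts)))


def has_get_threshold_fix(installed):
    """Return True if the given version string is at least 1.1.0 (hypothetical helper)."""
    return version_tuple(installed) >= FIXED_IN


try:
    installed = version("wtpsplit")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("wtpsplit is not installed")
elif has_get_threshold_fix(installed):
    print(f"wtpsplit {installed}: get_threshold should be available")
else:
    print(f"wtpsplit {installed}: upgrade to 1.1.0 or later")
```

The report above was made on wtpsplit-1.0.1, which this check flags as needing an upgrade. After upgrading, the calls from the README and from the reply above should work on the WtP instance.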