rhasspy / gruut

A tokenizer, text cleaner, and phonemizer for many human languages.
MIT License

Bug when the word 'nan' is in the sentence #17

Closed by WeberJulian 2 years ago

WeberJulian commented 2 years ago

Hey,

I'm using Gruut 2.0.3 with coqui-TTS in a conda env with Python 3.8.10. I get weird behavior when the word 'nan' is in the sentence:

>>> list(gruut.sentences('nan', lang='en', espeak=False))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/julian/miniconda3/envs/TTS/lib/python3.8/site-packages/gruut/__init__.py", line 79, in sentences
    graph, root = text_processor(text, lang=lang, ssml=ssml, **process_args)
  File "/home/julian/miniconda3/envs/TTS/lib/python3.8/site-packages/gruut/text_processor.py", line 432, in __call__
    return self.process(*args, **kwargs)
  File "/home/julian/miniconda3/envs/TTS/lib/python3.8/site-packages/gruut/text_processor.py", line 884, in process
    if pipeline_transform(self._transform_number, graph, root):
  File "/home/julian/miniconda3/envs/TTS/lib/python3.8/site-packages/gruut/utils.py", line 305, in pipeline_transform
    if transform_func(graph, leaf_node):
  File "/home/julian/miniconda3/envs/TTS/lib/python3.8/site-packages/gruut/text_processor.py", line 1657, in _transform_number
    if (1000 < number < 3000) and (re.match(r"^\d+$", word.text) is not None):
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]
>>> list(gruut.sentences('hello', lang='en', espeak=False))
[Sentence(idx=0, text='hello', text_with_ws='hello', text_spoken='hello', par_idx=0, lang='en', voice='', words=[Word(idx=0, text='hello', text_with_ws='hello', leading_ws='', trailing_ws='', sent_idx=0, par_idx=0, lang='en', voice='', pos='UH', phonemes=['h', 'ɛ', 'l', 'ˈoʊ'], is_major_break=False, is_minor_break=False, is_punctuation=False, is_break=False, is_spoken=True, pause_before_ms=0, pause_after_ms=0, marks_before=None, marks_after=None)], pause_before_ms=0, pause_after_ms=0, marks_before=None, marks_after=None)]

It seems that it is detected as a number haha (or rather, Not a Number).
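For anyone who wants to see the failure without gruut installed, here is a minimal sketch using only the standard library. It assumes, as the traceback suggests, that the word ends up as a `decimal.Decimal` before the range check runs; `looks_like_year` is a hypothetical name for illustration, not a gruut function.

```python
from decimal import Decimal, InvalidOperation


def looks_like_year(number: Decimal) -> bool:
    # Same range check as the line shown in the traceback.
    return 1000 < number < 3000


# The string "nan" is accepted by Decimal and yields a NaN value.
nan_value = Decimal("nan")

try:
    looks_like_year(nan_value)
except InvalidOperation:
    # Ordering comparisons (<, <=, >, >=) against a Decimal NaN signal
    # InvalidOperation, which the default context turns into an exception.
    print("comparison with NaN raised InvalidOperation")
```

So parsing succeeds and the error only shows up later, at the comparison.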

synesthesiam commented 2 years ago

Ha, thanks for catching this! I've pushed gruut 2.0.4 with a fix. Numbers that are non-finite (inf/nan) are not parsed anymore.
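A rough sketch of what such a guard could look like, using only the standard library. This is not the actual gruut 2.0.4 code; `parse_finite_decimal` is a hypothetical helper written for illustration, and it assumes the number is parsed into a `decimal.Decimal`.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_finite_decimal(text: str) -> Optional[Decimal]:
    """Return text as a Decimal, or None if it is not a finite number."""
    try:
        number = Decimal(text)
    except InvalidOperation:
        # Not numeric at all, e.g. "hello".
        return None

    # "nan" and "inf" parse successfully, so reject them explicitly.
    if not number.is_finite():
        return None

    return number


for word in ["1984", "nan", "inf", "hello"]:
    print(word, parse_finite_decimal(word))
```

With a check like this, words such as 'nan' or 'inf' would be treated as ordinary words instead of reaching the numeric comparison.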

WeberJulian commented 2 years ago

Thanks for the quick fix!