Closed: mollerhoj closed this issue 4 years ago
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/segmenter.py", line 42, in segment
    segments = processor.process()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/processor.py", line 44, in process
    self.text = AbbreviationReplacer(self.text).replace()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 61, in replace
    self.text = self.search_for_abbreviations_in_string()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 96, in search_for_abbreviations_in_string
    self.text, match, ind, char_array
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 114, in scan_for_replacements
    txt = replace_period_of_abbr(txt, am)
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 36, in replace_period_of_abbr
    txt,
  File "/usr/lib/python3.5/re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.5/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.5/sre_parse.py", line 834, in parse
    raise source.error("unbalanced parenthesis")

  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 61, in replace
    self.text = self.search_for_abbreviations_in_string()
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 96, in search_for_abbreviations_in_string
    self.text, match, ind, char_array
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 114, in scan_for_replacements
    txt = replace_period_of_abbr(txt, am)
  File "/home/mollerhoj/.local/lib/python3.5/site-packages/pysbd/abbreviation_replacer.py", line 36, in replace_period_of_abbr
    txt,
  File "/usr/lib/python3.5/re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.5/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.5/sre_parse.py", line 722, in _parse
    source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 0
@mollerhoj If you can provide an example input, that would help me debug the issue. I most likely need to use re.escape in the replace_period_of_abbr function to handle these kinds of edge cases.
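To illustrate the suggested fix, here is a minimal sketch of why interpolating raw text into a pattern fails and how re.escape avoids it. The pattern below is a simplified, hypothetical stand-in and not pysbd's actual regex; `unsafe_pattern`, `safe_pattern`, and the `<DOT>` marker are names made up for this example.

```python
import re

# Hypothetical, simplified stand-in for a pattern built from an abbreviation.
# It is NOT the real pattern used by pysbd's replace_period_of_abbr.
def unsafe_pattern(abbr):
    # Raw interpolation: any regex metacharacter in abbr becomes syntax.
    return r"(?<=\s{})\.".format(abbr)

def safe_pattern(abbr):
    # re.escape turns metacharacters such as "(" and ")" into literals.
    return r"(?<=\s{})\.".format(re.escape(abbr))

# Collect the compile errors raised by unescaped "abbreviations".
unsafe_errors = {}
for abbr in ["(fig", "fig)"]:
    try:
        re.compile(unsafe_pattern(abbr))
    except re.error as exc:
        unsafe_errors[abbr] = str(exc)

# With escaping, the same input compiles and the period is replaced.
result = re.sub(safe_pattern("(fig"), "<DOT>", "see (fig. 2)")
print(unsafe_errors)
print(result)
```

An unmatched opening parenthesis leaves a group unterminated, while an unmatched closing one is reported as unbalanced, which is consistent with the two messages in the tracebacks above.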
Closing. Feel free to reopen with more info.
I'm getting errors because the regexp engine interprets parentheses in the text as regex syntax: "unterminated subpattern" and "unbalanced parenthesis".
I'm analysing very large amounts of text, so I'm not sure which inputs triggered these errors.
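One way to find the triggering inputs in a large corpus is to wrap the segmentation call and record every text that raises re.error. This is only a sketch: `find_failing_inputs` is a hypothetical helper, `segment` can be any callable (for example, a pysbd segmenter's `segment` method), and `fake_segment` is a stand-in that merely mimics the failure so the example runs without pysbd installed.

```python
import re

def find_failing_inputs(segment, texts):
    """Return (index, text, message) for every text on which segment raises re.error."""
    failures = []
    for index, text in enumerate(texts):
        try:
            segment(text)
        except re.error as exc:
            failures.append((index, text, str(exc)))
    return failures

def fake_segment(text):
    # Stand-in for a real segmenter: compiles each token as a regex,
    # so tokens with unmatched parentheses raise re.error.
    for token in text.split():
        re.compile(token.rstrip("."))
    return [text]

corpus = ["Hello world.", "see (fig. 2"]
failures = find_failing_inputs(fake_segment, corpus)
for index, text, message in failures:
    print(index, repr(text), message)
```

The recorded texts can then be trimmed down to a short reproducing example for the issue.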