the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

Consumer dies with exception when using special characters in correspondent detection #689

Open texel-sensei opened 3 years ago

texel-sensei commented 3 years ago

The paperless consumer died with an exception raised in the re module.

It happened because I had set up a Correspondent with an "any" match whose match string contained a + sign (the string was something like "foo + bar"). Removing the + resolved the error.

Traceback
Traceback (most recent call last):
  File "./manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/home/paperless/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/home/paperless/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/paperless/venv/lib/python3.8/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/paperless/venv/lib/python3.8/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/home/paperless/paperless/src/documents/management/commands/document_consumer.py", line 99, in handle
    self.loop(loop_time, mail_delta)
  File "/home/paperless/paperless/src/documents/management/commands/document_consumer.py", line 108, in loop
    self.loop_step(mail_delta, start_time)
  File "/home/paperless/paperless/src/documents/management/commands/document_consumer.py", line 122, in loop_step
    self.file_consumer.consume_new_files()
  File "/home/paperless/paperless/src/documents/consumer.py", line 117, in consume_new_files
    if not self.try_consume_file(file):
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/paperless/paperless/src/documents/consumer.py", line 178, in try_consume_file
    document_consumption_finished.send(
  File "/home/paperless/venv/lib/python3.8/site-packages/django/dispatch/dispatcher.py", line 176, in send
    return [
  File "/home/paperless/venv/lib/python3.8/site-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
    (receiver, receiver(signal=self, sender=sender, **named))
  File "/home/paperless/paperless/src/documents/signals/handlers.py", line 25, in set_correspondent
    potential_correspondents = list(Correspondent.match_all(document.content))
  File "/home/paperless/paperless/src/documents/models.py", line 86, in match_all
    if tag.matches(text):
  File "/home/paperless/paperless/src/documents/models.py", line 110, in matches
    if re.search(r"\b{}\b".format(word), text, **search_kwargs):
  File "/usr/lib/python3.8/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 671, in _parse
    raise source.error("multiple repeat",
re.error: multiple repeat at position 18
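
The last paperless frame in the traceback formats the match word straight into a regular expression (`r"\b{}\b".format(word)`), so any regex metacharacter in the match string is interpreted as syntax instead of literal text. A minimal sketch of one possible fix, assuming the words are meant to be matched literally: escape each word with `re.escape` before building the pattern. The helper `matches_any` and its whitespace splitting are hypothetical and only illustrate the idea; this is not the actual code in `models.py`.

```python
import re


def matches_any(match_string: str, text: str) -> bool:
    """Return True if any whitespace-separated word of match_string occurs in text.

    Hypothetical helper for illustration, not paperless's implementation.
    """
    for word in match_string.split():
        # re.escape turns metacharacters such as "+" into literals,
        # so the resulting pattern always compiles.
        pattern = r"\b{}\b".format(re.escape(word))
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


if __name__ == "__main__":
    # Without escaping, a bare "+" is a quantifier with nothing to repeat.
    try:
        re.search(r"\b{}\b".format("+"), "foo + bar")
    except re.error as exc:
        print("unescaped pattern failed:", exc)

    # With escaping, the same match string no longer raises.
    print(matches_any("foo + bar", "Invoice from Foo Ltd"))
```

One caveat with this approach: because the pattern keeps the `\b` anchors, a word made only of punctuation (such as `+`) will match only where it sits directly between word characters, so it may be preferable to skip such words or drop the anchors for them.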