stenskjaer / samewords

Automatically annotate potentially ambiguous words in critical text editions made with LaTeX and reledmac.
MIT License

macros with empty argument/braces #43

Open floriandk opened 5 years ago

floriandk commented 5 years ago

Macros with an empty argument (empty braces) that are not immediately followed by a space seem to confuse samewords. E.g.:

\beginnumbering
\pstart
{word\anymacro{}}
\pend
\endnumbering 

or

{word\anymacro{}}a

gives

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/samewords/tokenize.py", line 538, in _register_closing
    open_idx = self._stack_bracket[-1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/samewords", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/samewords/cli.py", line 116, in main
    print(samewords.core.process_document(filename, procedure))
  File "/usr/local/lib/python3.7/site-packages/samewords/core.py", line 26, in process_document
    return process_string(content, method=method)
  File "/usr/local/lib/python3.7/site-packages/samewords/core.py", line 38, in process_string
    for par in chunk_pars(chunk)])
  File "/usr/local/lib/python3.7/site-packages/samewords/core.py", line 38, in <listcomp>
    for par in chunk_pars(chunk)])
  File "/usr/local/lib/python3.7/site-packages/samewords/core.py", line 10, in run_annotation
    tokenization = Tokenizer(input_text)
  File "/usr/local/lib/python3.7/site-packages/samewords/tokenize.py", line 390, in __init__
    self.wordlist = self._wordlist()
  File "/usr/local/lib/python3.7/site-packages/samewords/tokenize.py", line 400, in _wordlist
    word, pos = self._tokenize(self.data, pos)
  File "/usr/local/lib/python3.7/site-packages/samewords/tokenize.py", line 513, in _tokenize
    self._register_closing(word)
  File "/usr/local/lib/python3.7/site-packages/samewords/tokenize.py", line 545, in _register_closing
    word.close_macro(0)
  File "/usr/local/lib/python3.7/site-packages/samewords/tokenize.py", line 191, in close_macro
    'The word "{}" does not have any open macros.'.format(self))
IndexError: The word "word" does not have any open macros.

But each of the following works fine:

{word\anymacro }

or

{word\anymacro{} }

or

word\anymacro{}
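For anyone who wants to re-run these cases against a fix, here is a minimal sketch of a check harness. It is not part of samewords: `CASES`, `wrap`, `crashes`, and `run_cases` are hypothetical names, and it assumes every snippet is tested inside the same `\beginnumbering ... \endnumbering` paragraph as the first example. The `process` argument is any callable that runs samewords on a string and is supplied by the caller.

```python
from typing import Callable, List, Tuple

# (snippet, reported_to_crash) pairs taken from the report above.
CASES: List[Tuple[str, bool]] = [
    (r"{word\anymacro{}}", True),
    (r"{word\anymacro{}}a", True),
    (r"{word\anymacro }", False),
    (r"{word\anymacro{} }", False),
    (r"word\anymacro{}", False),
]


def wrap(snippet: str) -> str:
    """Embed a snippet in the numbered paragraph from the first example.

    Assumption: all snippets were tested inside this environment.
    """
    lines = [r"\beginnumbering", r"\pstart", snippet, r"\pend", r"\endnumbering"]
    return "\n".join(lines) + "\n"


def crashes(process: Callable[[str], object], snippet: str) -> bool:
    """Return True if `process` raises IndexError on the wrapped snippet."""
    try:
        process(wrap(snippet))
    except IndexError:
        return True
    return False


def run_cases(process: Callable[[str], object]) -> List[Tuple[str, bool, bool]]:
    """Return (snippet, reported_to_crash, observed_crash) for every case."""
    return [(snippet, reported, crashes(process, snippet))
            for snippet, reported in CASES]
```

The traceback suggests `samewords.core.process_string` as the function to wrap, but its full argument list is not shown here, so the wrapper passed as `process` is left to the reader. Once the bug is fixed, the observed column should be `False` for all five cases.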

stenskjaer commented 4 years ago

I have a good idea of what is going on here. The fix might be trivial or it might be a bit more convoluted, but I'll look into this over Christmas.

stenskjaer commented 4 years ago

Can you check if you still run into the described problem with the version on branch issue-43-macros-with-empty-arguments?