stenskjaer / samewords

Automatically annotate potentially ambiguous words in critical text editions made with LaTeX and reledmac.
MIT License

strange behaviour on simple files #40

Open floriandk opened 5 years ago

floriandk commented 5 years ago

While trying to annotate a medium-sized file, I encountered "Error: [Errno 2] No such file or directory" on the web app. Minimizing the file, I arrived at

\documentclass{article}
\usepackage[series={A},nofamiliar,noeledsec,noledgroup]{reledmac}

\begin{document}

\beginnumbering
\pstart
word 
\edtext{and}{\Afootnote{C1–6.}}
word and
\pend
\endnumbering 

\end{document}

which should be totally unproblematic but still gives the error. Other minimal files made from scratch do work on the web service.

Several other things are weird about this file:

\documentclass{article}
\usepackage[series={A},nofamiliar,noeledsec,noledgroup]{reledmac}

\begin{document}

\beginnumbering
\pstart
word 
\edtext{\sameword[1]{and}}{\Afootnote{C1–6.}}
word \sameword{and}
\pend
\endnumbering 

\end{document}
\beginnumbering
\pstart
 Þor%
\edtext{og}{%
    \Afootnote{÷~C\textsuperscript{1–6}.}} %
\pend
\endnumbering 

It is also processed by the web service when copy-pasted, but not when the original file is uploaded. Version 0.5.0, however, complains either way:

Traceback (most recent call last):
  File "somewhere/samewords-master/samewords/tokenize.py", line 522, in _register_closing
    open_idx = self._stack_bracket[-1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/samewords", line 11, in <module>
    load_entry_point('samewords', 'console_scripts', 'samewords')()
  File "somewhere/samewords-master/samewords/cli.py", line 116, in main
    print(samewords.core.process_document(filename, procedure))
  File "somewhere/samewords-master/samewords/core.py", line 32, in process_document
    for par in chunk_pars(chunk)])
  File "somewhere/samewords-master/samewords/core.py", line 32, in <listcomp>
    for par in chunk_pars(chunk)])
  File "somewhere/samewords-master/samewords/core.py", line 10, in run_annotation
    tokenization = Tokenizer(input_text)
  File "somewhere/samewords-master/samewords/tokenize.py", line 378, in __init__
    self.wordlist = self._wordlist()
  File "somewhere/samewords-master/samewords/tokenize.py", line 388, in _wordlist
    word, pos = self._tokenize(self.data, pos)
  File "somewhere/samewords-master/samewords/tokenize.py", line 497, in _tokenize
    self._register_closing(word)
  File "somewhere/samewords-master/samewords/tokenize.py", line 529, in _register_closing
    word.close_macro(0)
  File "somewhere/samewords-master/samewords/tokenize.py", line 179, in close_macro
    raise IndexError('The word does not have any open macros.')
IndexError: The word does not have any open macros.

In 0.5.0 this can be remedied either by adding a space after "Þor" or by removing the \textsuperscript command. The web service will still reject the file.
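The traceback above starts in the tokenizer's brace bookkeeping: `_register_closing` reads `self._stack_bracket[-1]`, which raises `IndexError` when a closing brace arrives while no opening brace is on that stack. A quick pre-flight check can at least rule out genuinely unbalanced braces before suspecting the tokenizer. The function `find_brace_errors` below is a hypothetical helper, not part of samewords, and only approximates TeX: it skips escaped characters such as `\{` and `\%` and ignores `%` comments, but knows nothing about verbatim environments or catcode changes.

```python
from typing import List


def find_brace_errors(tex: str) -> List[str]:
    """Hypothetical pre-flight check: report unbalanced braces with line numbers."""
    stack = []  # line numbers of currently open braces
    errors = []
    line = 1
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "\n":
            line += 1
        elif ch == "\\":
            # Skip the escaped character (\{, \}, \%, \\) so it is not
            # treated as a group delimiter or a comment.
            if tex[i + 1:i + 2] == "\n":
                line += 1
            i += 2
            continue
        elif ch == "%":
            # A comment runs to the end of the line; jump to the newline
            # so the next iteration counts it.
            nl = tex.find("\n", i)
            if nl == -1:
                break
            i = nl
            continue
        elif ch == "{":
            stack.append(line)
        elif ch == "}":
            if stack:
                stack.pop()
            else:
                errors.append(f"line {line}: unmatched closing brace")
        i += 1
    # Anything still on the stack was never closed.
    errors.extend(f"line {ln}: unclosed opening brace" for ln in stack)
    return errors


fragment = " Þor%\n\\edtext{og}{%\n    \\Afootnote{÷~C\\textsuperscript{1–6}.}} %\n"
print(find_brace_errors(fragment))        # []
print(find_brace_errors("word}\n{and"))   # one unmatched '}', one unclosed '{'
```

Run on the `Þor%` fragment above, the check finds no problems, because the braces there do balance. That suggests the 0.5.0 tokenizer lost track of the structure (possibly around the `%` directly after "Þor", since adding a space there helps) rather than the file itself being malformed.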

So my guess is that you have already taken care of this error in the current version, but I include the description here in the hope that it gives you a clue to the main problem:

Could you point me to the kinds of properties in a file that can break samewords (including the web service), so I can avoid them? I attach the offending file here: VÓ-BSWtest.tex.zip

floriandk commented 5 years ago

The second after I posted this, it occurred to me: the "Ó" in the filename is the culprit. I'll leave the issue here anyway for reference, and perhaps for improvement when you have some spare time.

stenskjaer commented 5 years ago

Thank you very much. Looks like there is something to look into here.

But just so that I understand: Does this mean that you got it to work locally and on the webservice when you changed the filename, or only locally, or maybe not at all?

I hope to have a look at the finer details of this on Sunday.

floriandk commented 4 years ago

OK, I have now given this another try…

With a freshly installed samewords 0.5.3 on Python 3.7.5 (macOS), the file I sent you earlier works fine, including the "Ó" in its filename.

The same file will still produce

Processing... Error: [Errno 2] No such file or directory: '/tmp/76022ca8-4b8a-4a00-acba-4ab6d140cec7.tex'

on your web-service (on Chrome).

So there's probably really something about the filename and special chars.
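Whether the filename is really what breaks the web service has not been confirmed in this thread; the error shows the upload being looked up under a generated `/tmp/<uuid>.tex` path. One known pitfall with accented filenames is Unicode normalization: the same visible name can be encoded composed ("Ó" as U+00D3, NFC) or decomposed ("O" plus combining U+0301, NFD), and HFS+, the older macOS file system, stores names in decomposed form. The sketch below shows that difference and a defensive workaround of reducing a filename to plain ASCII before upload or storage. `ascii_safe_name` is a hypothetical helper, not part of samewords or its web service.

```python
import re
import unicodedata


def ascii_safe_name(filename: str) -> str:
    """Hypothetical helper: reduce a filename to a conservative ASCII form."""
    # NFKD splits accented letters into base letter + combining mark,
    # e.g. "Ó" (U+00D3) becomes "O" followed by U+0301.
    decomposed = unicodedata.normalize("NFKD", filename)
    # Dropping non-ASCII code points removes the combining marks; letters
    # with no decomposition, such as "Þ", are dropped entirely.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace anything outside letters, digits, dot, dash and underscore.
    return re.sub(r"[^A-Za-z0-9._-]", "_", ascii_only)


# The same visible name in composed (NFC) and decomposed (NFD) form.
nfc = unicodedata.normalize("NFC", "VÓ-BSWtest.tex")
nfd = unicodedata.normalize("NFD", "VÓ-BSWtest.tex")
print(nfc == nfd)                                 # False
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
print(ascii_safe_name(nfd))                       # VO-BSWtest.tex
```

Renaming the file by hand to something like `VO-BSWtest.tex` before uploading achieves the same thing without any code.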

My initial complete file, the one that started this whole enquiry, still fails both online and offline, with or without special characters. I will try to build a new minimal example for this part of the problem soon.

floriandk commented 4 years ago

For the examples mentioned here 0.5.3 works as expected.

I'm inclined to say that this puzzling behaviour had its roots in three separate problems: the web service's trouble with certain characters in filenames, a spacing problem in 0.5.0 that no longer occurs in 0.5.3, and the empty-braces issue. All three affected the same file across my various attempts, and I wasn't able to tell them apart until now, hence this confusing report.
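The thread does not say exactly which empty-brace constructs trigger the remaining problem, so a first step is simply locating them. The sketch below lists the line numbers of empty groups such as `{}` or `{ }` in a `.tex` source, so each can be tested in isolation. `empty_group_lines` is a hypothetical helper, not part of samewords; it does not skip `%` comments and may miss a group placed directly after a `\\` line break.

```python
import re
from typing import List

# An opening brace not preceded by a backslash, optional whitespace,
# then a closing brace: matches "{}" and "{ }" but not the literal "\{\}".
EMPTY_GROUP = re.compile(r"(?<!\\)\{\s*\}")


def empty_group_lines(tex: str) -> List[int]:
    """Return the 1-based line numbers where empty brace groups start."""
    return [tex.count("\n", 0, m.start()) + 1 for m in EMPTY_GROUP.finditer(tex)]


sample = "word\\LaTeX{} and\n\\edtext{ }{\\Afootnote{x}}\n\\{\\}\n"
print(empty_group_lines(sample))   # [1, 2]
```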

stenskjaer commented 4 years ago

Thank you for following up on this. I will have some time over Christmas to dig into these different issues, and clearly this isn't working exactly as expected, so I'll see if I can find out what's going on and get it working in the package as well as the online service.

stenskjaer commented 4 years ago

To sum up here, just so I'm sure I understand this correctly:

  1. There are problems with the online service at the moment, as it is not automatically updated when new versions of the package are released.
  2. Versions before 0.5.3 give some errors because of what looks like a bug in how spacing was handled earlier.
  3. The minimal example given above works correctly on version 0.5.3 (and above as I have tested it on the HEAD of master) without errors.
  4. There are still some problems with empty brackets, which are addressed in issue #43.

This means:

Let me know if I have any misunderstandings here.