stenskjaer / samewords

Automatically annotate potentially ambiguous words in critical text editions made with LaTeX and reledmac.
MIT License

overlapping structures with xxref #32

Open: floriandk opened this issue 6 years ago

floriandk commented 6 years ago

Overlapping structures with \xxref confuse the script:

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
One %
\edtext{and two \edtext{}{\xxref{and-and-start}{and-and-end}\lemma{and–and}\Afootnote{overlapping}}\edlabel{and-and-start}and %
\edtext{three}{%
    \Afootnote{tree}}
 and four and one and two and three %
\edtext{and}{%
    \Afootnote{or}}
 four}{
    \lemma{and–four}
    \Afootnote{del.}}
 and\edlabel{and-and-end} six.
\pend
\endnumbering

\end{document}

->

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
One %
\edtext{\sameword{and} two \edtext{\sameword[2]{}}{\xxref{and-and-start}{and-and-end}\lemma{\sameword{and}–\sameword{and}}\Afootnote{overlapping}}\edlabel{and-and-start}and %
\edtext{\sameword[2]{three}}{%
    \Afootnote{tree}}
 \sameword{and} four \sameword{and} one \sameword{and} two \sameword{and} \sameword{three} %
\edtext{\sameword[2]{and}}{%
    \Afootnote{or}}
 four}{
    \lemma{and–four}
    \Afootnote{del.}}
 and\edlabel{and-and-end} six.
\pend
\endnumbering

\end{document}

which compiles to

1 and–four] del. 1 and¹–and] overlapping 1 three¹ ] tree 1 and⁶ ] or

Or, using what I understand to be an alternative correct usage of \xxref:

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
One %
\edtext{and two \edtext{and}{\xxref{and-and-start}{and-and-end}\lemma{and–and}\Afootnote{overlapping}}\edlabel{and-and-start} %
\edtext{three}{%
    \Afootnote{tree}}
 and four and one and two and three %
\edtext{and}{%
    \Afootnote{or}}
 four}{
    \lemma{and–four}
    \Afootnote{del.}}
 and\edlabel{and-and-end} six.
\pend
\endnumbering

\end{document}

->

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
One %
\edtext{\sameword{and} two \edtext{\sameword[2]{and}}{\xxref{and-and-start}{and-and-end}\lemma{\sameword{and}–\sameword{and}}\Afootnote{overlapping}}\edlabel{and-and-start} %
\edtext{\sameword[2]{three}}{%
    \Afootnote{tree}}
 \sameword{and} four \sameword{and} one \sameword{and} two \sameword{and} \sameword{three} %
\edtext{\sameword[2]{and}}{%
    \Afootnote{or}}
 four}{
    \lemma{and–four}
    \Afootnote{del.}}
 and\edlabel{and-and-end} six.
\pend
\endnumbering

\end{document}

->

1 and–four] del. 1 and²–and] overlapping 1 three¹ ] tree 1 and⁷ ] or

As far as I understand reledmac's handling of \sameword, it isn't possible to mark up the overlapping structure so that it is numbered automatically (is it?), but at least the application of the regular \sameword tags shouldn't be broken.
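To make the superscripts in these outputs easier to follow, here is a simplified model of what they count: each word gets its occurrence number within the printed line, but only if it occurs there more than once. This is a hypothetical Python sketch (`sameword_superscripts` is not a samewords or reledmac function), assuming exact, case-sensitive word matching and that the whole paragraph fits on one printed line, as the line number 1 in every note above suggests. Under that model the expected lemmata are and² where the \xxref note starts and and⁷ for the "or" note, which matches the second variant and shows where the first one goes wrong.

```python
from __future__ import annotations

from collections import Counter


def sameword_superscripts(words: list[str]) -> list[int | None]:
    """For each word of one printed line, return its occurrence number
    (1, 2, ...) if that word appears more than once in the line, else None.
    Simplified model only; not reledmac's actual implementation."""
    totals = Counter(words)    # how often each word occurs in the whole line
    seen: Counter = Counter()  # running count while scanning left to right
    marks: list[int | None] = []
    for word in words:
        seen[word] += 1
        marks.append(seen[word] if totals[word] > 1 else None)
    return marks


# Running text of the example, assumed to fit on one printed line.
line = "One and two and three and four and one and two and three and four and six".split()
marks = sameword_superscripts(line)
print(marks[3], marks[4], marks[13])  # prints: 2 1 7
```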

floriandk commented 6 years ago

In combination with other \edtext commands nearby (I haven't found the exact trigger yet), the script even crashes. For example:

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
some text 
\edtext{}%
    {\xxref{start}{end}\lemma{and–text}%
    \Afootnote{xxrefnote}}%
        \edlabel{start}\edtext{and}{\Afootnote{or}}
        \edlabel{end}\edtext{text}{\Afootnote{letters}}
more text
\pend
\endnumbering

\end{document}
Traceback (most recent call last):
  File "/sameword-test/samewords-issue-29/samewords/tokenize.py", line 477, in _tokenize
    open_idx = self._stack_bracket[-1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/samewords", line 11, in <module>
    load_entry_point('samewords', 'console_scripts', 'samewords')()
  File "/sameword-test/samewords-issue-29/samewords/cli.py", line 107, in main
    print(samewords.core.process_document(filename, procedure))
  File "/sameword-test/samewords-issue-29/samewords/core.py", line 32, in process_document
    for par in chunk_pars(chunk)])
  File "/sameword-test/samewords-issue-29/samewords/core.py", line 32, in <listcomp>
    for par in chunk_pars(chunk)])
  File "/sameword-test/samewords-issue-29/samewords/core.py", line 10, in run_annotation
    tokenization = Tokenizer(input_text)
  File "/sameword-test/samewords-issue-29/samewords/tokenize.py", line 357, in __init__
    self.wordlist = self._wordlist()
  File "/sameword-test/samewords-issue-29/samewords/tokenize.py", line 367, in _wordlist
    word, pos = self._tokenize(self.data, pos)
  File "/sameword-test/samewords-issue-29/samewords/tokenize.py", line 484, in _tokenize
    word.close_macro(0)
  File "/sameword-test/samewords-issue-29/samewords/tokenize.py", line 174, in close_macro
    raise IndexError('The word does not have any open macros.')
IndexError: The word does not have any open macros.
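The first IndexError appears to come from reading the tokenizer's bracket stack (`_stack_bracket`) while it is empty, presumably while handling a closing brace. One quick way to rule out the input itself is a standalone brace-balance check. The sketch below is hypothetical (`first_unmatched_brace` is not part of samewords): it skips the character after each backslash, treats an unescaped `%` as a comment to the end of the line, and reports the first unmatched brace. Run on the snippet above it finds none, which suggests the problem lies in how the tokenizer reads the arguments rather than in the LaTeX.

```python
from __future__ import annotations


def first_unmatched_brace(tex: str) -> int | None:
    """Return the index of the first unmatched '{' or '}' in `tex`,
    or None if the braces are balanced.

    The character following a backslash is skipped, so escaped braces
    and \\% are ignored; an unescaped % starts a comment that runs to
    the end of the line."""
    stack: list[int] = []  # indices of currently open braces
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character after it
            continue
        if ch == "%":
            newline = tex.find("\n", i)
            i = len(tex) if newline == -1 else newline + 1
            continue
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            if not stack:
                return i  # a closing brace with nothing left to close
            stack.pop()
        i += 1
    return stack[0] if stack else None


snippet = r"""\edtext{}%
    {\xxref{start}{end}\lemma{and–text}%
    \Afootnote{xxrefnote}}%
        \edlabel{start}\edtext{and}{\Afootnote{or}}
        \edlabel{end}\edtext{text}{\Afootnote{letters}}"""
print(first_unmatched_brace(snippet))  # prints: None (the input is balanced)
```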
stenskjaer commented 6 years ago

I just want to clarify the first part you mention: there is no way at all to number samewords when apparatus notes overlap? Have I understood that correctly? That can't be good.

Anyway, I am working towards annotating the words with \edlabel{}s correctly and raising a warning for the user when an empty \edtext{}{} is given (which will then be left unannotated, unlike now). I also want to update the docs to warn against the first solution you give, with the empty \edtext{}{}, as it results in incorrect numbering of the samewords.
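The proposed warning for an empty \edtext{}{} could be sketched as a small pre-pass over the source. This is a hypothetical illustration (`find_empty_edtext` and the message text are assumptions, not samewords's actual code): a regular expression finds every \edtext whose first argument is empty or whitespace only, reports the source line, and issues a Python warning.

```python
from __future__ import annotations

import re
import warnings

# \edtext whose first argument (the lemma text) is empty or whitespace only.
EMPTY_EDTEXT = re.compile(r"\\edtext\{\s*\}")


def find_empty_edtext(tex: str) -> list[int]:
    """Warn about every empty \\edtext{}{} and return the 1-based source
    line numbers where one starts."""
    lines: list[int] = []
    for match in EMPTY_EDTEXT.finditer(tex):
        line_no = tex.count("\n", 0, match.start()) + 1
        lines.append(line_no)
        warnings.warn(
            f"line {line_no}: empty \\edtext{{}}{{}} left unannotated; "
            "\\sameword numbering around it may be wrong"
        )
    return lines


source = "some text\n\\edtext{}%\n    {\\xxref{start}{end}\\lemma{and–text}}"
print(find_empty_edtext(source))  # prints: [2]
```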

If the second solution you suggest is used (where \edtext{}{} does have content), the script can count the "and"s correctly for the other apparatus notes, but the overlapping note is still not numbered.

But before I push suggestions for documentation, a warning, and a partial solution, I want to make sure I understand this correctly: overlapping apparatus entries cannot be disambiguated?

floriandk commented 6 years ago

> I just want to clarify the first part you mention: there is no way at all to number samewords when apparatus notes overlap? Have I understood that correctly? That can't be good.

That is how I understand it, and I am not too happy about it either. But perhaps Maïeul could confirm that we are reading this correctly?

Actually, the whole \xxref mechanism is more cumbersome to use than I'd like anyway, but I can't see how overlaps could be encoded in TeX without some sort of pointers or labels.
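Because the arguments of \edtext are TeX groups, braces can only express spans that are nested or disjoint; a span that starts inside one note and ends outside it is exactly the case that needs \xxref with \edlabel pointers. A minimal sketch of that distinction, assuming spans are given as inclusive word indices (the `relation` function is hypothetical and not part of samewords or reledmac), applied to the example from the first comment:

```python
from __future__ import annotations


def relation(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Classify two apparatus spans given as inclusive (first, last) word
    indices: 'disjoint', 'nested' (one inside the other), or 'overlapping'."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0 or b1 < a0:
        return "disjoint"
    if (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1):
        return "nested"
    return "overlapping"


# Word indices in "One and two and three and four and one and two and three
# and four and six": the outer note "and–four" covers words 1..14, and the
# \xxref note runs from the "and" at index 3 to the "and" at index 15.
print(relation((1, 14), (4, 4)))    # nested: "three" inside "and–four"
print(relation((1, 14), (3, 15)))   # overlapping: needs \xxref and labels
print(relation((1, 14), (16, 16)))  # disjoint
```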

The good thing is that even a huge apparatus (at least in my experience) will usually have very few overlaps: the variants most often end up neatly nested, even when the editor isn't constrained by the structure of TeX. (As a side note: I would be interested to find out whether something inherent in textual variance causes this, or whether it is the process of identifying and structuring variants, guided by the tradition of textual editing. I'd lean towards the latter, but this is just a guess.)

So there will usually be only a few occurrences of \xxref to track down once everything else is in place, adding superscript numbers to the content of \lemma manually where necessary.
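The manual step could be supported by a helper that computes the numbers for the two ends of an \xxref span, given the word positions where the \edlabel commands sit. This is a hypothetical sketch: `occurrence` and `xxref_lemma` are illustrative names, positions are assumed to be known, matching is exact and case-sensitive, and plain `\textsuperscript` is only an assumption about how the number might be typeset in the lemma. For the first example it yields the "and⁸" that the compiled "and²–and" lacks.

```python
from __future__ import annotations


def occurrence(words: list[str], pos: int) -> int | None:
    """1-based occurrence number of words[pos] within the line,
    or None if the word occurs only once (no superscript needed)."""
    target = words[pos]
    if words.count(target) < 2:
        return None
    return words[: pos + 1].count(target)


def xxref_lemma(words: list[str], start: int, end: int) -> str:
    """Text for the \\lemma{} of an \\xxref note from words[start] to
    words[end], with the superscripts filled in by hand."""
    def mark(pos: int) -> str:
        n = occurrence(words, pos)
        return words[pos] if n is None else f"{words[pos]}\\textsuperscript{{{n}}}"
    return f"{mark(start)}–{mark(end)}"


line = "One and two and three and four and one and two and three and four and six".split()
print(xxref_lemma(line, 3, 15))  # prints: and\textsuperscript{2}–and\textsuperscript{8}
```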

> Anyway, I am working towards annotating the words with \edlabel{}s correctly and raising a warning for the user when an empty \edtext{}{} is given (which will then be left unannotated, unlike now). I also want to update the docs to warn against the first solution you give, with the empty \edtext{}{}, as it results in incorrect numbering of the samewords.

This sounds sensible to me, though I don't really understand why the first solution has to give incorrect numbering. But I have to confess that I have difficulty wrapping my head around this problem.

> If the second solution you suggest is used (where \edtext{}{} does have content), the script can count the "and"s correctly for the other apparatus notes, but the overlapping note is still not numbered.

Perhaps the script could list the labels used by \xxref together with the warning, for us lazy users. If one uses the second solution, only the second of the two labels actually needs manual work. Do I understand this correctly?
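The suggestion to list the \xxref labels could look roughly like this. It is a hypothetical sketch (`xxref_labels` is an illustrative name, and nothing in this thread says samewords provides such a feature), assuming label names contain no braces: it collects every (start, end) label pair so the user knows which lemmata to check by hand.

```python
from __future__ import annotations

import re

# \xxref{start-label}{end-label}; labels are assumed to contain no braces.
XXREF = re.compile(r"\\xxref\{([^{}]*)\}\{([^{}]*)\}")


def xxref_labels(tex: str) -> list[tuple[str, str]]:
    """Return the (start, end) label pairs of every \\xxref in the source."""
    return XXREF.findall(tex)


source = r"\edtext{and}{\xxref{and-and-start}{and-and-end}\lemma{and–and}\Afootnote{overlapping}}"
for start, end in xxref_labels(source):
    print(f"check \\lemma by hand: {start} -> {end}")
```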

> But before I push suggestions for documentation, a warning, and a partial solution, I want to make sure I understand this correctly: overlapping apparatus entries cannot be disambiguated?

Would you mind helping us out here, @maieul?

maieul commented 6 years ago

Sorry, I don't quite understand which problem I should answer.

maieul commented 6 years ago

@floriandk, please open an issue about overlapping \edtext in the reledmac repository and explain the problem to me there.

stenskjaer commented 6 years ago

@maieul: My question is simply: is it true that there is no way to automatically number ambiguous terms with \sameword{} when an apparatus entry is made with \xxref{} and \edlabel{}?

maieul commented 6 years ago

The answer is: not yet. Maybe in the future, but it is complex. See

https://github.com/maieul/ledmac/issues/768

stenskjaer commented 6 years ago

Good. I will keep this issue open for now and see how those ideas develop.