latex3 / tagpdf

Tagging support code for LaTeX

Tagging endnotes with postnotes #68

Closed gusbrs closed 1 year ago

gusbrs commented 1 year ago

Hi @u-fischer, following our discussion at TeX.SX, I promised to send you some examples of postnotes documents so that we can discuss the possibility of providing tagging support for it.

As you mentioned in the chat, the basic problem is to be able to connect the mark where the note is placed in the text with where the note is actually printed. For that, a very simple document would be enough to provide the context in which the problem occurs.

\documentclass{book}

\usepackage{postnotes}
\usepackage{lipsum}
\usepackage{hyperref}

\begin{document}

\chapter{Chapter 1}

\lipsum[1]\postnote{\lipsum[2]}

\lipsum[3]\postnote{\lipsum[4]}

\printpostnotes

\end{document}

The split between the mark and the typeset note at the end presents some challenges of its own, and in a number of cases it is desirable to typeset the note with at least part of the "context" of the mark in place. For this reason, postnotes has a couple of hooks: postnotes/store/note is called when the note is stored, while postnotes/print/eachnote is called when the note is printed. I think they could be used for transferring the relevant information for tagging too. (Note that I've just renamed postnotes/store/note to postnotes/note/store but have not yet released the change, so the name will change; what you see here is the old name.) If they are not sufficient, I can make the necessary adjustments. Currently, at postnotes/store/note the current note ID can be retrieved from \l__postnotes_note_id_tl, while at postnotes/print/eachnote it can be retrieved from \l__postnotes_print_note_id_tl. As you'd expect, the ID is a unique identifier, so it is possible to establish the connection there.

Regarding the placement of those hooks, they were designed for the purpose of passing variables along. I don't think the placement of postnotes/store/note is relevant, but postnotes/print/eachnote comes before the "text mark" is typeset, which may be undesirable for the purpose of tagging. For the time being this is just for you to be aware of, and I can adjust things as needed in this regard.

One thing to think about is how an endnote is to be represented in the tagged structure (sorry I don't know the jargon, I hope you get what I meant by that). I haven't the faintest idea of which structural elements a text-to-speech software is able to handle, and if the distinction between a footnote and an endnote is relevant in this context. But I presume an endnote should look somewhat similar to a footnote and, provided it is possible to get some connection working, the rest is polishing things.

Now, beyond the main point, endnotes in general, and postnotes in particular, also mobilize a few other structural elements "on the page":

  1. The printed endnotes typically include sectioning commands.
  2. The printed endnotes also may make use of running page headers (postnotes does so by default).
  3. The printed endnotes may be typeset in some list environment.

An example that includes these elements:

\documentclass{book}

\usepackage{postnotes}
\counterwithin*{postnote}{chapter}
\AddToHook{cmd/chapter/before}{%
  \postnotesection{\section*{Notes to chapter \pnthechapternextnote}}}
\usepackage{lipsum}
\usepackage{hyperref}

\ExplSyntaxOn
\NewDocumentCommand \testchapter {}
  { \prg_replicate:nn { 10 } { \lipsum[1-2]\postnote{\lipsum[3]}\par } }
\ExplSyntaxOff

\begin{document}

\chapter{Chapter 1}

\testchapter

\chapter{Chapter 2}

\testchapter

\printpostnotes

\end{document}

I'd assume running page headers in general are disregarded for tagging. Either way, the headers for endnotes should receive the general treatment. What to do with the sectioning commands of the printed notes, I'm not sure. Perhaps they should just be abstracted away like the page headers, in which case they must be actively ignored for the purposes of tagging. Finally, I'm not sure the list environments are of any particular significance here (probably not), but I mention them because they exist.

I think this is a good set to start. But, of course, I am at your disposal to provide further info, examples, to try things out, and to adjust things on postnotes side if needed.

u-fischer commented 1 year ago

Can you add code to the following example that does the following:

The mark in the text should also print the id of the mark. So e.g. (id1:1) and (id2:1). And the postnote should print the two ids:

1. (from id1,id2) a note.

(side question, how do you handle backlinks if there are two marks like here?).

\documentclass{article}

\usepackage{postnotes}
\usepackage{lipsum}
\usepackage{hyperref}

\begin{document}

abc\postnote{\label{note:1}a note}

abc\postnoteref{note:1}

\printpostnotes

\end{document}

gusbrs commented 1 year ago

Can you add code to the following example that does the following:

The mark in the text should also print the id of the mark. So e.g. (id1:1) and (id2:1). And the postnote should print the two ids:

1. (from id1,id2) a note.

I'm not sure I understand what you want here. Each note has only one ID, and it's the same on both "sides", it's just stored in different variables in each context for practical reasons. For example:

\documentclass{book}

\usepackage{postnotes}
\usepackage{hyperref}

\begin{document}

\chapter{Chapter 1}

% note that the ID is independent from the printed representation
\setcounter{postnote}{7}

\ExplSyntaxOn
\postnote{id:~\l__postnotes_print_note_id_tl}~id:~\l__postnotes_note_id_tl
\par
\postnote{id:~\l__postnotes_print_note_id_tl}~id:~\l__postnotes_note_id_tl
\ExplSyntaxOff

\printpostnotes

\end{document}

The idea is that, with the unique id and the hooks, you can pass the information from one point to the other, for example:

\documentclass{book}

\usepackage{postnotes}
\usepackage{hyperref}
\usepackage{lipsum}

\newcommand{\foo}{}
\newcommand{\baz}{}
\ExplSyntaxOn
\AddToHook{postnotes/store/note}{
  \tl_gset:cx { g__store_context_ \l__postnotes_note_id_tl _tl }{\foo}
}
\AddToHook{postnotes/print/eachnote}{
  \tl_set:Nv \baz { g__store_context_ \l__postnotes_print_note_id_tl _tl }
}
\ExplSyntaxOff

\begin{document}

\chapter{Chapter 1}

\renewcommand{\foo}{one}

\lipsum[1]\postnote{value of foo: \baz\ \lipsum[2]}

\renewcommand{\foo}{two}

\lipsum[3]\postnote{value of foo: \baz\ \lipsum[4]}

\printpostnotes

\end{document}

For the package, I create a property list for each note, csnamed on the ID, and a property for each piece of information I want to pass around, including the content of the note itself.
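A minimal sketch of that "one property list per note, named after the ID" idea could look as follows. The names \demo_note_store:nnn, \demo_note_item:nn, \DemoStore and \DemoItem are hypothetical and chosen for illustration only; they are not the ones used inside postnotes.

```latex
\documentclass{article}

\ExplSyntaxOn
% Hypothetical helper: store value #3 under property #2 for the note with ID #1.
\cs_new_protected:Npn \demo_note_store:nnn #1#2#3
  {
    % Create the per-note property list on first use.
    \prop_if_exist:cF { g_demo_note_ #1 _prop }
      { \prop_new:c { g_demo_note_ #1 _prop } }
    \prop_gput:cnn { g_demo_note_ #1 _prop } {#2} {#3}
  }
% Hypothetical helper: expandably retrieve property #2 of the note with ID #1.
\cs_new:Npn \demo_note_item:nn #1#2
  { \prop_item:cn { g_demo_note_ #1 _prop } {#2} }
\NewDocumentCommand \DemoStore { m m m }
  { \demo_note_store:nnn {#1} {#2} {#3} }
\NewExpandableDocumentCommand \DemoItem { m m }
  { \demo_note_item:nn {#1} {#2} }
\ExplSyntaxOff

\begin{document}

% "1" stands in for the unique note ID.
\DemoStore{1}{content}{first note}
\DemoStore{1}{context}{one}

Note 1 (context: \DemoItem{1}{context}): \DemoItem{1}{content}

\end{document}
```

Since everything is keyed on the ID, any extra piece of information (for tagging or otherwise) would just be one more property in the same list.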

But, perhaps for the case of tagging, things could be even simpler. In the case of the anchors (see below), all we need is the ID number; there's no need to pass "the value of a variable in the place where the mark was". But I'm not really sure what tagging requires.

(side question, how do you handle backlinks if there are two marks like here?).

The backlinks rely on the unique ID as well. I set \MakeLinkTarget* { postnote. \l_@@_note_id_tl .mark } on the mark, and \MakeLinkTarget* { postnote. \l_@@_print_note_id_tl .text } on the printed note. From there, it is trivial to build hyperlinks on each side, given I always have the ID. I recall having pestered you somewhere in the past about it.
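Stripped of the package internals, the pattern is roughly the following sketch. It assumes a hyperref/LaTeX release recent enough to provide \MakeLinkTarget, and the literal "42" is only a stand-in for the unique note ID; the target names follow the pattern quoted above.

```latex
\documentclass{article}
\usepackage{hyperref}

\begin{document}

% At the mark: set a target named after the ID, and link forward to the note text.
Some text\MakeLinkTarget*{postnote.42.mark}%
\hyperlink{postnote.42.text}{\textsuperscript{1}}

\newpage

% At the printed note: set the matching target, and link back to the mark.
\leavevmode
\MakeLinkTarget*{postnote.42.text}%
\hyperlink{postnote.42.mark}{1.} The text of the note.

\end{document}
```

Both link names can be built from the ID alone, which is why nothing else needs to be passed between the two places.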

The \postnoteref does not make "two marks" though, it will itself point to wherever the label is set (to the printed note as you set it, and to the mark if you use the label argument), it is a cross-reference, after all. But the backlink from the note will always point to \postnote, not to \postnoteref.

u-fischer commented 1 year ago

I'm not sure I understand what you want here

I'm trying to understand your data structure.

When tagging I will have to create in my example three structures:

<struct 15> postnote mark</struct 15>

<struct 23> postnoteref mark</struct 23>

<struct 301> postnote text</struct 301>

and then I have to add cross-references (this can be done later, e.g. at end document):

<struct 15  Ref to 301 > postnote mark</struct 15>

<struct 23 Ref to 301> postnoteref mark</struct 23>

<struct 301 Ref to {15,23}> postnote text</struct 301>

That means I need to figure out which textmark(s) structure number belongs to which note.

gusbrs commented 1 year ago

I'm trying to understand your data structure.

When tagging I will have to create in my example three structures:

<struct 15> postnote mark</struct 15>

<struct 23> postnoteref mark</struct 23>

<struct 301> postnote text</struct 301>

and then I have to add cross-references (this can be done later, e.g. at end document):

<struct 15  Ref to 301 > postnote mark</struct 15>

<struct 23 Ref to 301> postnoteref mark</struct 23>

<struct 301 Ref to {15,23}> postnote text</struct 301>

That means I need to figure out which textmark(s) structure number belongs to which note.

As far as I can see, the connection between "postnote mark" and "postnote text" can be established by storing the value of whatever defines the number of the "struct" at the point the mark is placed (we'd store the 15 with the note) and then, when the note is being printed we know the "current struct counter" (301) and can retrieve the 15 and make the cross reference.
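A rough sketch of this idea, using the two hooks and ID variables mentioned earlier in the thread, might look like the following. It is an assumption-laden illustration rather than working tagging support: the \g_demo_... property lists are hypothetical names, the hook and variable names are package internals that may change, and whether \tag_get:n{struct_num} at these points really returns the structure of the mark and of the note text depends on tagging markup that does not exist yet.

```latex
\DocumentMetadata{testphase=phase-II}
\documentclass{article}

\usepackage{postnotes}
\usepackage{hyperref}

\ExplSyntaxOn
% Hypothetical storage: note ID => struct number at the mark,
% and mark struct number => struct number of the printed note.
\prop_new:N \g_demo_id_to_mark_struct_prop
\prop_new:N \g_demo_mark_to_text_struct_prop
% Harmless if these variants already exist in the kernel.
\cs_generate_variant:Nn \prop_gput:Nnn { NVe , Nee }

\AddToHook{postnotes/store/note}
  {
    % At the mark: remember the current struct number under the note ID.
    \prop_gput:NVe \g_demo_id_to_mark_struct_prop
      \l__postnotes_note_id_tl
      { \tag_get:n { struct_num } }
  }
\AddToHook{postnotes/print/eachnote}
  {
    % At the printed note: look up the stored number and pair it
    % with the struct number current at this point.
    \prop_gput:Nee \g_demo_mark_to_text_struct_prop
      {
        \exp_args:NNV \prop_item:Nn \g_demo_id_to_mark_struct_prop
          \l__postnotes_print_note_id_tl
      }
      { \tag_get:n { struct_num } }
  }
% Write the collected pairs to the log for inspection.
\AddToHook{enddocument}
  { \prop_log:N \g_demo_mark_to_text_struct_prop }
\ExplSyntaxOff

\begin{document}

abc\postnote{a note}

\printpostnotes

\end{document}
```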

\postnoteref is more complicated. We don't have access to the ID inside it. As I said, it is just a standard cross-reference, referencing to a standard label, only formatted as a mark. We'd need to store more information (I presume the "current struct counter") with it. I can see ways to do it for \zlabel and even for the label argument, but it'd be harder for the standard \label issued by the user in the note, I have no control over that. But, even if we did store the counter, \printpostnotes has no knowledge of \postnoteref, it is just unaware of it. The only way I can see would be to keep track of all calls to \postnoteref and post-process this list at the end of the document. Do you see any other way?
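The "keep track of all calls and post-process at the end" approach could be sketched as below. \DemoTrackedRef, \g_demo_postnoteref_seq and \demo_process_refs: are hypothetical names; a real implementation would presumably record a structure number next to the label and resolve the label to a note there, rather than just writing to the log.

```latex
\documentclass{article}

\usepackage{postnotes}
\usepackage{hyperref}

\ExplSyntaxOn
% Hypothetical list of every label referenced through the wrapper.
\seq_new:N \g_demo_postnoteref_seq
% Hypothetical wrapper: record the label, then typeset the usual reference.
\NewDocumentCommand \DemoTrackedRef { m }
  {
    \seq_gput_right:Nn \g_demo_postnoteref_seq {#1}
    \postnoteref {#1}
  }
% Post-processing step; here it only logs each recorded label.
\cs_new_protected:Npn \demo_process_refs:
  {
    \seq_map_inline:Nn \g_demo_postnoteref_seq
      { \iow_log:n { postnoteref~to~label:~##1 } }
  }
\AddToHook{enddocument}{ \demo_process_refs: }
\ExplSyntaxOff

\begin{document}

abc\postnote{\label{note:1}a note}

abc\DemoTrackedRef{note:1}

\printpostnotes

\end{document}
```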

gusbrs commented 1 year ago

Hi @u-fischer , I've been working on this, and I think I have a structure for providing the required cross-reference data. I've set up a branch to test things (https://github.com/gusbrs/postnotes/tree/tagging) with some preliminary results. What I did there was separate the problems of doing the actual tagging (tagpdf side) from processing the data (postnotes side). For this purpose I just added a counter stepping at para/before to emulate the struct_num, and when the actual tagging markup is in place, we can just grab \tag_get:n{struct_num} instead. A fiction that suffices for its purpose as long as we keep each note in its own paragraph, of course.
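The stand-in counter described above amounts to something like the following sketch. The counter name demostruct is hypothetical, and the values it takes depend on how many paragraphs the document starts, so it only mimics a structure number while each note sits in its own paragraph.

```latex
\documentclass{article}

% Hypothetical stand-in for the structure number: stepped before every paragraph.
\newcounter{demostruct}
\AddToHook{para/before}{\stepcounter{demostruct}}

\begin{document}

First paragraph, emulated struct number: \thedemostruct

Second paragraph, emulated struct number: \thedemostruct

\end{document}
```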

Using the branch, the following document:

\DocumentMetadata{testphase=phase-II}
\documentclass{article}

\usepackage{postnotes}
\usepackage{zref-user}
\usepackage{zref-hyperref}
\usepackage{hyperref}

\begin{document}

\section{Section 1}

\postnote{1}

\postnote{2}

\postnote[label=en:mark:1,zlabel=en:mark:1]{3\label{en:text:1}\zlabel{en:text:1}}

\postnote{4}

\postnote[label=en:mark:2,zlabel=en:mark:2]{5\label{en:text:2}\zlabel{en:text:2}}

\postnote{6}

nomark\postnote[nomark]{7\label{en:text:3}}

\postnoteref{en:text:1}

\postnoteref{en:mark:1}

\postnotezref{en:text:1}

\postnotezref{en:mark:1}

\postnoteref{en:text:2}

\postnoteref{en:mark:2}

\postnotezref{en:text:2}

\postnotezref{en:mark:2}

\postnoteref{en:text:3}

\printpostnotes

\end{document}

Outputs the following in the log:

The property list \g__postnotes_tag_postnoteID_to_structnum_prop contains the
pairs (without outer braces):
>  {1}  =>  {2}
>  {2}  =>  {3}
>  {3}  =>  {4}
>  {4}  =>  {5}
>  {5}  =>  {6}
>  {6}  =>  {7}
>  {7}  =>  {8}.
<recently read> }

l.47 \end{document}

The property list \g__postnotes_tag_printID_to_structnum_prop contains the
pairs (without outer braces):
>  {1}  =>  {19}
>  {2}  =>  {20}
>  {3}  =>  {21}
>  {4}  =>  {22}
>  {5}  =>  {23}
>  {6}  =>  {24}
>  {7}  =>  {25}.
<recently read> }

l.47 \end{document}

The property list \g__postnotes_tag_postnote_crossrefs_prop contains the pairs
(without outer braces):
>  {2}  =>  {19}
>  {3}  =>  {20}
>  {4}  =>  {21}
>  {5}  =>  {22}
>  {6}  =>  {23}
>  {7}  =>  {24}.
<recently read> }

l.47 \end{document}

The property list \g__postnotes_tag_postnoteref_crossrefs_prop contains the
pairs (without outer braces):
>  {9}  =>  {21}
>  {10}  =>  {21}
>  {13}  =>  {23}
>  {14}  =>  {23}
>  {17}  =>  {25}.
<recently read> }

l.47 \end{document}

The property list \g__postnotes_tag_postnotezref_crossrefs_prop contains the
pairs (without outer braces):
>  {11}  =>  {21}
>  {12}  =>  {21}
>  {15}  =>  {23}
>  {16}  =>  {23}.
<recently read> }

l.47 \end{document}

I think this is the information you need, hopefully also in the form you need. Is this indeed the case?

If it is, I'd say there are two pieces missing: i) adding the actual tagging markup; ii) passing the cross-reference data to the required place. How would you like to proceed in that regard? Should I try to emulate what is done in latex-lab-footnotes.dtx under your supervision?

Regarding the repository and the branch, in case you'd like to play with it: I expect l3build install to work for it, but I personally don't use it, so let me know if something is amiss. Also, I've found a couple of issues (independent of this discussion) for which I should probably get a fix and make a release sooner rather than later, so watch the repo for forced pushes from my side. Done, should be quiet now.

u-fischer commented 1 year ago

Sorry, I was a bit busy, but I hope I can look at it at the weekend.

gusbrs commented 1 year ago

No rush, of course, take your time. And I hope not to spoil your carnival! ;-)

gusbrs commented 1 year ago

Hi @u-fischer , with the release behind, any news on this one?

u-fischer commented 1 year ago

Well, with the release you can actually more or less do what the bib-code does too, at least as long as the notes are printed as a list. The following temporarily changes an internal command, as we haven't had the time yet to discuss an interface. You only need to ensure that the ref/label use some sensible, unique name (and then test and find all the errors ;-))

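% Assumption: tagging must be active for this example; phase-III is assumed
% here because the list (block) code patched below belongs to it.
\DocumentMetadata{testphase=phase-III}
\documentclass{article}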

\usepackage{postnotes}
\usepackage{hyperref}

\begin{document}

\section{Section 1}

\ExplSyntaxOn
\leavevmode %paragraph has started ...
 \tag_mc_end_push:
 \tagstructbegin{tag=Lbl,ref=postnote.some-unique-id} %<========
 \tagmcbegin{}
    \postnote{1}   %here is the note
 \tagmcend
 \tagstructend
 \tag_mc_begin_pop:n{}
\ExplSyntaxOff

\ExplSyntaxOn
\cs_set:Npn \__block_list_item_begin: %internal, interface will come
  {
    \tag_struct_begin:n
      {
        tag=\LItag,
        label=postnote.some-unique-id % <========
      }
  }
\ExplSyntaxOff

\printpostnotes

\end{document}

A second remark. This here errors if tagging is activated:

\printpostnotes blub

I haven't really tracked down where exactly it fails, but basically it means that the \@doendpe call is missing somewhere (see e.g. https://github.com/plk/biblatex/issues/1279 for a similar discussion). A simple workaround is to always add a \par after the notes.

gusbrs commented 1 year ago

Well, with the release you can actually more or less do what the bib-code does too, at least as long as the notes are printed as a list. The following temporarily changes an internal command, as we haven't had the time yet to discuss an interface. You only need to ensure that the ref/label use some sensible, unique name (and then test and find all the errors ;-))

I'll see if I can move forward with these hints. Thank you!

A second remark. This here errors if tagging is activated:

\printpostnotes blub

I haven't really tracked down where exactly it fails, but basically it means that the \@doendpe call is missing somewhere (see e.g. plk/biblatex#1279 for a similar discussion). A simple workaround is to always add a \par after the notes.

Mhm, I do close the environment with a standard \end{<environment>}:

https://github.com/gusbrs/postnotes/blob/79df62eb998722e39e4bd6d23ceb1eab4d1f3f4b/postnotes.dtx#L1417-L1419

So it is probably not the same cause as in biblatex. On the other hand, I am aware there is probably something amiss in my handling of the list environments, since I need to manually issue a \mode_leave_vertical: for each item to avoid a "perhaps a missing \item" error for empty notes, which, as far as I can see, shouldn't be needed. I tried to track down why it happens, but I couldn't figure it out, so I'm living with this workaround for the time being.

But I'll look into it and see if I can understand this. Thanks!

gusbrs commented 1 year ago

Hi @u-fischer , it seems that, despite my best efforts, my current knowledge about tagging and PDF structure is not enough to see this through. Unfortunately, at the moment I also don't have more time than what I already invested in this to acquire it. So I'm backing down from this attempt and, for the time being, giving up on providing tagging support for postnotes.

Sorry for the fruitless fuss in your issue tracker. And thank you again for your help.