projectLEMDO / lemdoIssues

Repository for LEMDO issue tracking and related documents.
MIT License

Collation markers are lost when located in spans also linked to annotations #181

Closed: martindholmes closed this issue 7 months ago

martindholmes commented 1 year ago

Several cases in Selimus show that where a collation app is linked to a span between two anchors, but that span overlaps with or is contiguous with a point or span which is simultaneously annotated, no collation marker is created in the output. It's no surprise that a situation this complicated gives rise to a problem, and it will be hard to figure out exactly how we can resolve it given that we can't really nest one link inside another, but we must deal with it somehow. This collation entry from emdSel_M is a good illustration of the problem:

<app from="doc:emdSel_M#emdSel_M_anc_854" to="doc:emdSel_M#emdSel_M_anc_2282">
    <lem source="#emdSel_M_collation_thisEd">[Aside] Advise thee, Acomat</lem>
    <rdg wit="#emdSel_M_collation_Q1">Aduise thee <hi rendition="rnd:italic">Acomat</hi></rdg>
    <rdg wit="#emdSel_M_collation_Vitkus">Advise thee, Acomat</rdg>
    <rdg wit="#emdSel_M_collation_Hopkinson">Advise thee, Acomat</rdg>
    <rdg wit="#emdSel_M_collation_2Grosart">Advise thee, Acomat</rdg>
    <rdg wit="#emdSel_M_collation_1Grosart">Aduise thee <hi rendition="rnd:italic">Acomat</hi></rdg>
</app>

The context it points at looks like this (with breaks added to make it more readable):

<l><stage type="business">
<supplied>
<anchor xml:id="emdSel_M_anc_854"/>
Aside
<anchor xml:id="emdSel_M_anc_855"/>
</supplied>
</stage> 
Advise thee, Acomat
<anchor xml:id="emdSel_M_anc_2282"/>, what’s best to do.</l>

and there is a complicating annotation pointing at part of the same range:

<note type="annotation" target="doc:emdSel_M#emdSel_M_anc_854" targetEnd="doc:emdSel_M#emdSel_M_anc_855">
<note type="label">Aside</note>
<note type="gloss">Though the 1594 quarto does not indicate an aside here, Acomat’s <quote>Advise thee, Acomat</quote> suggests a self-directed speech.</note>
</note>

All three anchors do make it into the output, but the required collation symbol does not, so the collation information is not apparent or accessible in the text (although it's there in the appendix). The issue, then, is to discover why there is no collation marker image. That image serves both as a clickable item to pop up the apparatus entry and as the target for the link from the collation entry back to the text.

martindholmes commented 1 year ago

I've narrowed this down a bit: in the initial phase of xhtml5_templates_overlap_module.xsl, a map is constructed of all the anchors and/or generated ids to which annotations and apparatus entries are linked. These collation items do make it into the map linked to the ids of text spans, but their terminal anchors do not seem to appear in the map, which I think is the problem. If I can figure out why they're not being included in the map, that might solve the problem.

martindholmes commented 1 year ago

A little more progress: the anchor is present in the tokenized HTML document, but it does not make it into the notesMap variable, so the bug is presumably in the makeNoteMap() function.

martindholmes commented 1 year ago

It does not seem to be the case that an overlapping note causes the problem; even if the note in the context of app_76 is removed the problem still shows up. All cases of this in Selimus are cases where the code determines that the collation marker needs to be mapped to one or more blocks (typically lines). If collation app elements are excluded from that block-mapping process, then they work perfectly, but in that case the overlapping note link is lost, so that is not a solution. It rather seems that when collation apps are mapped to one or more blocks, their markers are never inserted.

martindholmes commented 1 year ago

I've committed an initial fix in rev 14319, but it may be partial; I think it covers at least five of the six cases in emdSel_M. There may also be some fallout from it.

martindholmes commented 1 year ago

As expected, we're down to one problem app marker in Selimus, app_189. That will be the focus of the remaining work. I know what the problem is here, but I don't yet see a straightforward solution.

martindholmes commented 1 year ago

I've added a detailed comment in the overlap file laying out what needs to be done. Just need time to implement and test it.

martindholmes commented 1 year ago

Final slightly hacky fix committed in rev 14383. The overlap module could do with a rewrite, probably using maps, but that will have to wait. Hopefully this fixes the existing issues for the moment.

martindholmes commented 1 year ago

This fix apparently generates some duplicate ids, so a further solution will be needed. Working on that now.
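
One way to catch this kind of fallout early is to scan a generated XHTML page for repeated id values. The stylesheet below is a minimal sketch and not part of the LEMDO build: it assumes the output page is well-formed XHTML and that ids are plain `id` attributes (`xml:id` is not checked). It prints one line per duplicated id, listing the local names of the elements that carry it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

    <!-- Plain-text report: one line per duplicated id. -->
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Group every element that has an id attribute by the id value. -->
        <xsl:for-each-group select="//*[@id]" group-by="xs:string(@id)">
            <!-- Only groups with more than one member are duplicates. -->
            <xsl:if test="count(current-group()) gt 1">
                <xsl:value-of select="current-grouping-key()"/>
                <xsl:text>: </xsl:text>
                <xsl:value-of select="current-group() ! local-name()" separator=", "/>
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:template>

</xsl:stylesheet>
```

Run against a single output page, an empty result means no id occurs more than once on that page.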

martindholmes commented 1 year ago

The one remaining issue in emdSel_M is apparently not fixable by re-encoding. The problem is that the appMarker item is correctly inserted, but it gets its id from the annotation which is also pointing at it, instead of from the apparatus entry. I can't yet figure out why that's happening.

martindholmes commented 1 year ago

Rev 14491 fixed the last remaining problem in QME. There is still one in MoMS TEMP3, though, so that's next on my list.

martindholmes commented 1 year ago

This lib will need a complete rewrite to make it robust. I think one way to do it might be:

  1. Create a map whose keys are all the element ids in the text, where each value consists of a sequence of zero or more ids of annotations/collations for which the key is the INITIAL marker.
  2. Create a similar map, this time for each id to zero or more ids of annotations/collations for which this element is the FINAL marker (for some cases, the initial may be the same as the final, of course).
  3. Iterate through the text building an xsl:accumulator which, at any point in the text, holds the ids of all annotations and collations that encompass that point. Every time you encounter an element with an id, an accumulator-rule in the start phase adds any ids (looked up through the first map) for which this element is the initial marker, and an accumulator-rule in the end phase removes any ids (looked up through the second map) for which it is the final marker. Immediately after handling the element, output the asterisk or collation marker for any items for which this IS the final element.
  4. Whenever you encounter a text node, use the accumulator to determine whether any annotation spans or collations are active, and if there are any, wrap the text in a span with the appropriate attributes configured.

I think this may be able to produce the output we're currently trying to produce, but with a single pass through the text (other than the map construction passes), and with less likelihood of error. The additional issue of display for multi-line annotated spans could or should be (or perhaps is already) handled by JavaScript at load time.

martindholmes commented 1 year ago

Very simple proof of concept (runs on itself):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    exclude-result-prefixes="#all"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.w3.org/1999/xhtml"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    version="3.0">
    <xd:doc scope="stylesheet">
        <xd:desc>
            <xd:p><xd:b>Created on:</xd:b> Aug 9, 2023</xd:p>
            <xd:p><xd:b>Author:</xd:b> mholmes</xd:p>
            <xd:p>This is just a test file for working out approaches to various things.</xd:p>
        </xd:desc>
    </xd:doc>

    <xsl:mode on-no-match="shallow-copy" use-accumulators="#all"/>

    <xsl:output method="xml" encoding="UTF-8" normalization-form="NFC" exclude-result-prefixes="#all"/>

    <xsl:variable name="blockNames" as="xs:string*" select="('body', 'div', 'h1', 'p')"/>

    <xsl:variable name="sourceHtml" as="element(body)">
        <body>
            <div>
                <h1>Test data</h1>
                <p id="para1">This paragraph <span class="anc" id="anc1"/>contains <emph id="emph1">several</emph> 
                    bits<span class="anc" id="anc2"/> of <span class="anc" id="anc3"/>text, <span class="anc" id="anc4"/>some of which<span class="anc" id="anc5"/> are tagged<span class="anc" id="anc6"/> in an <emph id="emph2">overlapping manner</emph>.</p>
                <p id="para2">
                    This paragraph has virtually nothing in it.
                </p>
            </div>
        </body>

    </xsl:variable>

    <xsl:variable name="pointers" as="element(div)">
        <div>
            <ul>
                <li id="ann1" data-from="#para1" data-to="#para1">Annotation 1</li>
                <li id="ann2" data-from="#anc1" data-to="#anc2">Annotation 2</li>
                <li id="ann3" data-from="#emph1" data-to="#anc2">Annotation 3</li>
                <li id="ann4" data-from="#anc3" data-to="#anc5">Annotation 4</li>
                <li id="ann5" data-from="#anc4" data-to="#anc6">Annotation 5</li>
                <li id="ann6" data-from="#emph2" data-to="#emph2">Annotation 6</li>
            </ul>
        </div>
    </xsl:variable>

    <xsl:variable name="mapStarters" as="map(xs:string, xs:string*)">
        <xsl:map>
            <xsl:for-each select="$sourceHtml/descendant::*[@id]">
                <xsl:variable name="elId" as="xs:string" select="xs:string(@id)"/>
                <xsl:map-entry key="$elId" select="distinct-values((for $p in $pointers//li[substring-after(@data-from, '#') = $elId] return xs:string($p/@id)))"/>
            </xsl:for-each>
        </xsl:map>
    </xsl:variable> 

    <xsl:variable name="mapEnders" as="map(xs:string, xs:string*)">
        <xsl:map>
            <xsl:for-each select="$sourceHtml/descendant::*[@id]">
                <xsl:variable name="elId" as="xs:string" select="xs:string(@id)"/>
                <xsl:map-entry key="$elId" select="distinct-values((for $p in $pointers//li[substring-after(@data-to, '#') = $elId] return xs:string($p/@id)))"/>
            </xsl:for-each>
        </xsl:map>
    </xsl:variable>

    <!-- Running list of the annotation/collation ids whose span covers the current point in the text. -->
    <xsl:accumulator name="activePointers" as="xs:string*" initial-value="()">
        <!-- On entering an element with an id, add every pointer that starts there. -->
        <xsl:accumulator-rule phase="start" match="*[@id]">
            <xsl:variable name="newIds" as="xs:string*" select="map:get($mapStarters, xs:string(@id))"/>
            <xsl:sequence select="distinct-values(($value, $newIds))"/>
        </xsl:accumulator-rule>
        <!-- On leaving it, drop every pointer that ends there. -->
        <xsl:accumulator-rule phase="end" match="*[@id]">
            <xsl:variable name="idsToRemove" as="xs:string*" select="map:get($mapEnders, xs:string(@id))"/>
            <xsl:sequence select="$value[not(. = $idsToRemove)]"/>
        </xsl:accumulator-rule>
    </xsl:accumulator>

    <xsl:template match="/">
        <xsl:apply-templates select="$sourceHtml"/>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:variable name="activeIds" as="xs:string*" select="accumulator-after('activePointers')"/>
        <xsl:choose>
            <xsl:when test="count($activeIds) gt 0">
                <span data-active="{string-join($activeIds, ' ')}"><xsl:value-of select="."/></span>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template match="*[@id]">
        <xsl:variable name="enders" as="xs:string*" select="map:get($mapEnders, xs:string(@id))"/>
        <xsl:choose>
            <xsl:when test="count($enders) lt 1">
                <xsl:next-match/>
            </xsl:when>
            <xsl:when test="local-name() = $blockNames">
                <xsl:copy>
                    <xsl:apply-templates select="@*|node()"/>
                    <xsl:for-each select="$enders">
                        <button><xsl:sequence select="."/></button>
                    </xsl:for-each>
                </xsl:copy>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy>
                    <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
                <xsl:for-each select="$enders">
                    <button><xsl:sequence select="."/></button>
                </xsl:for-each>
            </xsl:otherwise>
        </xsl:choose>

    </xsl:template>

</xsl:stylesheet>

JanelleJenstad commented 1 year ago

Do you need any input on this one?

martindholmes commented 1 year ago

We just had the crucial discussion in Teams; the outcome I think is this:

The plan is to have three kinds of output signal for annotations/collations:

  1. Collations get a collation symbol right at the end of the span they apply to.
  2. Annotations pointing at ASSp ids get an annotation symbol right at the beginning of the ASSp block they point to.
  3. Spanning annotations get an annotation symbol right at the end of the span they apply to.

The intended behaviour is that mousing over or focusing the symbols will highlight the relevant text, and clicking on them will pop up the annotation and maintain the highlighting (unhighlighting any previously-highlighted span).

This will provide a keyboard-navigable interface that requires less pre-processing on page load than the old approach.

An icon like this:

https://thenounproject.com/icon/comment-215258/

could be used for annotations.
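
The three placement rules above can be sketched in the same style as the earlier proof of concept. This is an illustration under stated assumptions, not the code committed to lemdo-dev: the `data-type` attribute on the pointer items, the `annMarker` class, the `data-for` attribute and the `marker` mode are all hypothetical names, and "block" is approximated by a list of element names, so any annotation that starts on a block element is treated as a rule 2 case.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="#all"
    version="3.0">

    <xsl:mode on-no-match="shallow-copy"/>

    <!-- Hypothetical pointer list; data-type is an assumption made for this sketch. -->
    <xsl:variable name="pointers" as="element()*">
        <li id="app1" data-type="collation" data-from="#anc1" data-to="#anc2"/>
        <li id="ann1" data-type="annotation" data-from="#para1" data-to="#para1"/>
        <li id="ann2" data-type="annotation" data-from="#anc1" data-to="#anc2"/>
    </xsl:variable>

    <!-- Element names treated as blocks (a stand-in for the real block test). -->
    <xsl:variable name="blockNames" as="xs:string*" select="('div', 'p')"/>

    <xsl:template match="*[@id]">
        <xsl:variable name="ref" as="xs:string" select="'#' || @id"/>
        <xsl:variable name="isBlock" as="xs:boolean" select="local-name() = $blockNames"/>
        <!-- Rule 2: annotations anchored on a block get their marker at the start of the block. -->
        <xsl:variable name="atStart" as="element()*"
            select="if ($isBlock) then $pointers[@data-type = 'annotation'][@data-from = $ref] else ()"/>
        <!-- Rules 1 and 3: every other pointer ending here gets its marker at the end. -->
        <xsl:variable name="atEnd" as="element()*" select="$pointers[@data-to = $ref] except $atStart"/>
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="$atStart" mode="marker"/>
            <xsl:apply-templates select="node()"/>
            <!-- For a block, the end marker stays inside the block. -->
            <xsl:if test="$isBlock">
                <xsl:apply-templates select="$atEnd" mode="marker"/>
            </xsl:if>
        </xsl:copy>
        <!-- For an inline element or anchor, the end marker follows it. -->
        <xsl:if test="not($isBlock)">
            <xsl:apply-templates select="$atEnd" mode="marker"/>
        </xsl:if>
    </xsl:template>

    <!-- A button is focusable by default, which suits the keyboard-navigable interface. -->
    <xsl:template match="*" mode="marker">
        <button class="{if (@data-type = 'collation') then 'appMarker' else 'annMarker'}" data-for="{@id}"/>
    </xsl:template>

</xsl:stylesheet>
```

Applied to an XHTML page containing elements with the ids `para1`, `anc1` and `anc2`, this would place the annotation marker for `ann1` at the start of the paragraph and the markers for `app1` and `ann2` immediately after `anc2`. The highlighting and pop-up behaviour would still need script on the page.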

martindholmes commented 7 months ago

The complete implementation for lemdo-dev was committed in rev 17413. There will be fallout for the anthologies, as well as possible build breaks for lemdo-dev.

martindholmes commented 7 months ago

This seems to be working as expected, so I'll close this ticket now. Yay!