Closed by martindholmes 7 months ago
I've narrowed this down a bit: in the initial phase of xhtml5_templates_overlap_module.xsl, a map is constructed of all the anchors and/or generated ids to which annotations and apparatus entries are linked. These collation items do make it into the map linked to the ids of text spans, but their terminal anchors do not seem to appear in the map, which I think is the problem. If I can figure out why they're not being included in the map, that might solve the problem.
A little more progress: the anchor is present in the tokenized HTML document, but it does not make it into the notesMap variable, so the bug is presumably in the makeNoteMap() function.
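For anyone trying to reproduce this, here is a minimal sketch of the kind of check involved. It is not the real makeNoteMap() function. It assumes pointers carry `data-from` and `data-to` attributes, as in the proof of concept further down this thread, and the ids and the `targetMap` variable name are made up for illustration. It builds one map entry per target id from both ends of each pointer, then reports whether each pointer's end anchor is a key in the map.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical pointer list; ids and attribute names are illustrative only. -->
  <xsl:variable name="pointers" as="element(ul)">
    <ul>
      <li id="app_1" data-from="#anc1" data-to="#anc2"/>
      <li id="ann_1" data-from="#anc2" data-to="#anc3"/>
    </ul>
  </xsl:variable>

  <!-- One entry per target id, whether the target starts or ends a span. -->
  <xsl:variable name="targetMap" as="map(xs:string, xs:string*)">
    <xsl:map>
      <xsl:for-each-group select="$pointers/li/(@data-from | @data-to)"
        group-by="substring-after(., '#')">
        <xsl:map-entry key="current-grouping-key()"
          select="distinct-values(current-group()/../@id ! string(.))"/>
      </xsl:for-each-group>
    </xsl:map>
  </xsl:variable>

  <!-- Report, for each pointer, whether its end anchor is a key in the map. -->
  <xsl:template name="xsl:initial-template" match="/">
    <xsl:for-each select="$pointers/li">
      <xsl:variable name="endId" as="xs:string" select="substring-after(@data-to, '#')"/>
      <xsl:value-of select="@id || ': end anchor ' || $endId
        || (if (map:contains($targetMap, $endId)) then ' is mapped' else ' is MISSING')
        || '&#x0a;'"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The same `map:contains()` test could be pointed at the real map variable to list exactly which terminal anchors are being dropped.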
An overlapping note does not seem to be the cause: even if the note in the context of app_76 is removed, the problem still shows up. All cases of this in Selimus are cases where the code determines that the collation marker needs to be mapped to one or more blocks (typically lines). If collation app elements are excluded from that block-mapping process, they work perfectly, but the overlapping note link is then lost, so that is not a solution. It seems, rather, that when collation apps are mapped to one or more blocks, their markers are never inserted.
I've committed an initial fix in rev 14319, but it may be partial; I think it covers at least five of the six cases in emdSel_M. There may also be some fallout from it.
As expected, we're down to one problem app marker in Selimus, app_189. That will be the focus of the remaining work. I know what the problem is here, but I don't yet see a straightforward solution.
I've added a detailed comment in the overlap file laying out what needs to be done. Just need time to implement and test it.
Final slightly hacky fix committed in rev 14383. The overlap module could do with a rewrite, probably using maps, but that will have to wait. Hopefully this fixes the existing issues for the moment.
This fix apparently generates some duplicate ids, so a further solution will be needed. Working on that now.
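A quick way to see which ids are duplicated is to run a small reporting stylesheet over a built page. The following is only a sketch, not part of the build: it assumes it is run against a single XHTML output file, and it lists every `@id` value that occurs on more than one element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Group all elements by their @id value and report groups with more than one member. -->
  <xsl:template match="/">
    <xsl:for-each-group select="//*[@id]" group-by="string(@id)">
      <xsl:if test="count(current-group()) gt 1">
        <xsl:value-of select="'Duplicate id ' || current-grouping-key()
          || ' on: ' || string-join(current-group() ! name(), ', ')
          || '&#x0a;'"/>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Because the element names are included in the report, it should be fairly easy to tell whether a duplicate comes from a marker, an anchor, or a wrapped text span.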
The one remaining issue in emdSel_M is apparently not fixable by re-encoding. The problem is that the appMarker item is correctly inserted, but it gets its id from the annotation which is also pointing at it, instead of from the apparatus entry. I can't yet figure out why that's happening.
A fix in rev 14491 fixed the last remaining problem in QME. There is still one in MoMS TEMP3, though, so that's next on my list.
This lib will need a complete rewrite to make it robust. I think one way to do it might be:
I think this may be able to produce the output we're currently trying to produce, but with a single pass through the text (other than the map construction passes), and with less likelihood of error. The additional issue of display for multi-line annotated spans could or should be (or perhaps is already) handled by JavaScript at load time.
Very simple proof of concept (runs on itself):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math"
  xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
  exclude-result-prefixes="#all"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  version="3.0">
  <xd:doc scope="stylesheet">
    <xd:desc>
      <xd:p><xd:b>Created on:</xd:b> Aug 9, 2023</xd:p>
      <xd:p><xd:b>Author:</xd:b> mholmes</xd:p>
      <xd:p>This is just a test file for working out approaches to various things.</xd:p>
    </xd:desc>
  </xd:doc>

  <xsl:mode on-no-match="shallow-copy" use-accumulators="#all"/>

  <xsl:output method="xml" encoding="UTF-8" normalization-form="NFC"/>

  <!-- Elements treated as blocks: end markers go inside them rather than after them. -->
  <xsl:variable name="blockNames" as="xs:string*" select="('body', 'div', 'h1', 'p')"/>

  <!-- Test text, with empty span anchors marking span boundaries. -->
  <xsl:variable name="sourceHtml" as="element(body)">
    <body>
      <div>
        <h1>Test data</h1>
        <p id="para1">This paragraph <span class="anc" id="anc1"/>contains <emph id="emph1">several</emph>
          bits<span class="anc" id="anc2"/> of <span class="anc" id="anc3"/>text, <span class="anc" id="anc4"/>some of which<span class="anc" id="anc5"/> are tagged<span class="anc" id="anc6"/> in an <emph id="emph2">overlapping manner</emph>.</p>
        <p id="para2">
          This paragraph has virtually nothing in it.
        </p>
      </div>
    </body>
  </xsl:variable>

  <!-- Annotations, each pointing at a start id and an end id in the test text. -->
  <xsl:variable name="pointers" as="element(div)">
    <div>
      <ul>
        <li id="ann1" data-from="#para1" data-to="#para1">Annotation 1</li>
        <li id="ann2" data-from="#anc1" data-to="#anc2">Annotation 2</li>
        <li id="ann3" data-from="#emph1" data-to="#anc2">Annotation 3</li>
        <li id="ann4" data-from="#anc3" data-to="#anc5">Annotation 4</li>
        <li id="ann5" data-from="#anc4" data-to="#anc6">Annotation 5</li>
        <li id="ann6" data-from="#emph2" data-to="#emph2">Annotation 6</li>
      </ul>
    </div>
  </xsl:variable>

  <!-- Map from each element id to the ids of the annotations that start there. -->
  <xsl:variable name="mapStarters" as="map(xs:string, xs:string*)">
    <xsl:map>
      <xsl:for-each select="$sourceHtml/descendant::*[@id]">
        <xsl:variable name="elId" as="xs:string" select="xs:string(@id)"/>
        <xsl:map-entry key="$elId" select="distinct-values((for $p in $pointers//li[substring-after(@data-from, '#') = $elId] return xs:string($p/@id)))"/>
      </xsl:for-each>
    </xsl:map>
  </xsl:variable>

  <!-- Map from each element id to the ids of the annotations that end there. -->
  <xsl:variable name="mapEnders" as="map(xs:string, xs:string*)">
    <xsl:map>
      <xsl:for-each select="$sourceHtml/descendant::*[@id]">
        <xsl:variable name="elId" as="xs:string" select="xs:string(@id)"/>
        <xsl:map-entry key="$elId" select="distinct-values((for $p in $pointers//li[substring-after(@data-to, '#') = $elId] return xs:string($p/@id)))"/>
      </xsl:for-each>
    </xsl:map>
  </xsl:variable>

  <!-- Tracks which annotations are active at any point in the tree:
       ids are added when a start element opens and removed when an end element closes. -->
  <xsl:accumulator name="activePointers" as="xs:string*" initial-value="()">
    <xsl:accumulator-rule phase="start" match="*[@id]">
      <xsl:variable name="newIds" as="xs:string*" select="map:get($mapStarters, xs:string(@id))"/>
      <xsl:sequence select="distinct-values(($value, $newIds))"/>
    </xsl:accumulator-rule>
    <xsl:accumulator-rule phase="end" match="*[@id]">
      <xsl:variable name="idsToRemove" as="xs:string*" select="map:get($mapEnders, xs:string(@id))"/>
      <xsl:sequence select="$value[not(. = $idsToRemove)]"/>
    </xsl:accumulator-rule>
  </xsl:accumulator>

  <!-- Whatever the input document is, process the embedded test data. -->
  <xsl:template match="/">
    <xsl:apply-templates select="$sourceHtml"/>
  </xsl:template>

  <!-- Wrap each text node in a span listing the annotations active at that point. -->
  <xsl:template match="text()">
    <xsl:variable name="activeIds" as="xs:string*" select="accumulator-after('activePointers')"/>
    <xsl:choose>
      <xsl:when test="count($activeIds) gt 0">
        <span data-active="{string-join($activeIds, ' ')}"><xsl:value-of select="."/></span>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Emit one button per annotation ending at this element:
       at the end of the content for blocks, immediately after the element otherwise. -->
  <xsl:template match="*[@id]">
    <xsl:variable name="enders" as="xs:string*" select="map:get($mapEnders, xs:string(@id))"/>
    <xsl:choose>
      <xsl:when test="count($enders) lt 1">
        <xsl:next-match/>
      </xsl:when>
      <xsl:when test="local-name() = $blockNames">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
          <xsl:for-each select="$enders">
            <button><xsl:sequence select="."/></button>
          </xsl:for-each>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
        <xsl:for-each select="$enders">
          <button><xsl:sequence select="."/></button>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Do you need any input on this one?
We just had the crucial discussion in Teams; the outcome I think is this:
The plan is to have three kinds of output signal for annotations/collations:
1. Collations get a collation symbol right at the end of the span they apply to.
2. Annotations pointing at ASSp ids get an annotation symbol right at the beginning
of the ASSp block they point to.
3. Spanning annotations get an annotation symbol right at the end of the span they
apply to.
The intended behaviour is that mousing over or focusing the symbols will highlight
the relevant text, and clicking on them will pop up the annotation and maintain
the highlighting (unhighlighting any previously-highlighted span).
This will provide a keyboard-navigable interface that requires less pre-processing
on page load than the old approach.
An icon like this:
https://thenounproject.com/icon/comment-215258/
could be used for annotations.
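The three placement rules in the plan above could be sketched roughly as follows. This is not the code committed for lemdo-dev, just an illustration under assumptions: blocks are identified here by a made-up `class="block"`, pointers use the `data-from`/`data-to` convention from the earlier proof of concept, and the `annMarker` class, `data-note` attribute, and all ids are hypothetical. Buttons are used so that the markers can take keyboard focus.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Hypothetical test block with three anchors. -->
  <xsl:variable name="source" as="element(div)">
    <div id="sp1" class="block">Some <span class="anc" id="a1"/>annotated and <span class="anc" id="a2"/>collated text<span class="anc" id="a3"/>.</div>
  </xsl:variable>

  <!-- Hypothetical pointers: one collation, one block annotation, one spanning annotation. -->
  <xsl:variable name="pointers" as="element(ul)">
    <ul>
      <li id="app1" class="collation" data-from="#a2" data-to="#a3"/>
      <li id="ann1" class="annotation" data-from="#sp1" data-to="#sp1"/>
      <li id="ann2" class="annotation" data-from="#a1" data-to="#a3"/>
    </ul>
  </xsl:variable>

  <xsl:template match="/" name="xsl:initial-template">
    <xsl:apply-templates select="$source"/>
  </xsl:template>

  <!-- Rule 2: an annotation pointing at a whole block gets its marker at the start of the block. -->
  <xsl:template match="*[@class = 'block'][@id]">
    <xsl:variable name="target" as="xs:string" select="'#' || @id"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="$pointers/li[@class = 'annotation'][@data-from = $target and @data-to = $target]">
        <button class="annMarker" data-note="{@id}"/>
      </xsl:for-each>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rules 1 and 3: collations and spanning annotations get their marker right after the end anchor. -->
  <xsl:template match="span[@class = 'anc']">
    <xsl:variable name="target" as="xs:string" select="'#' || @id"/>
    <xsl:copy-of select="."/>
    <xsl:for-each select="$pointers/li[@data-to = $target]">
      <button class="{if (@class = 'collation') then 'appMarker' else 'annMarker'}" data-note="{@id}"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Since each marker carries the id of its note or apparatus entry, the highlighting and popup behaviour can be left entirely to script at interaction time, without any pre-processing on page load.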
The complete implementation for lemdo-dev was committed in rev 17413. There will be fallout for the anthologies, as well as possible build breaks for lemdo-dev.
This seems to be working as expected, so I'll close this ticket now. Yay!
Several cases in Selimus show that where a collation app is linked to a span between two anchors, but that span overlaps with or is contiguous with a point or span which is simultaneously annotated, no collation marker is created in the output. It's no surprise that a situation this complicated gives rise to a problem, and it will be hard to figure out exactly how we can resolve it given that we can't really nest one link inside another, but we must deal with it somehow. This collation entry from emdSel_M is a good illustration of the problem:
The context it points at looks like this (with breaks added to make it more readable):
and there is a complicating annotation pointing at part of the same range:
All three anchors do make it into the output, but the required collation symbol does not, so the collation information is not apparent or accessible (although it's there in the appendix). So the issue is to discover why there is no collation marker image (which serves both as clickable item to pop up the apparatus entry, and as target for the link from the collation entry back to the text).