lombardpress / lbp-print-xslt

Handling of empty lem-element #1

Open stenskjaer opened 8 years ago

stenskjaer commented 8 years ago

An enduring problem has been how to process the apparatus rendering when the lemma is empty. I think it might help to raise it as an issue to present some alternative possibilities in this context.

I have myself used a simple approach based on tokenizing the preceding text and then grabbing the last item in that list (https://github.com/stenskjaer/thesis-xslt/blob/master/to-tex.xsl#L173), but that won't work in a context where only XSLT 1.0 is available (libxml).

That might be solved by this template:

<xsl:template name="substring-after-last">
  <!-- Based on XSLT Cookbook p. 28-31 -->
  <xsl:param name="input"/>
  <xsl:param name="substr"/>

  <!-- Get string that follows first occurrence -->
  <xsl:variable name="temp" select="substring-after($input, $substr)"/>

  <xsl:choose>
    <!-- If it still contains the search string, continue recursively -->
    <xsl:when test="$substr and contains($temp, $substr)">
      <xsl:call-template name="substring-after-last">
        <xsl:with-param name="input" select="$temp"/>
        <xsl:with-param name="substr" select="$substr"/>
      </xsl:call-template>
    </xsl:when>
    <!-- Else, return the temporary string, as it comes after last instance of
            the string we were looking for -->
    <xsl:otherwise>
      <xsl:value-of select="$temp"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

We can then use it to process a short substring preceding a given app-element, like so:

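<!-- string-join() is XPath 2.0 only, so concatenate the preceding text nodes in a variable -->
<xsl:variable name="preceding-text">
  <xsl:for-each select="preceding::text()"><xsl:value-of select="."/></xsl:for-each>
</xsl:variable>
<xsl:variable name="normalized-text" select="normalize-space($preceding-text)"/>
<xsl:variable name="preceding-word">
  <xsl:call-template name="substring-after-last">
    <xsl:with-param name="input" select="substring($normalized-text, string-length($normalized-text) - 25)"/>
    <xsl:with-param name="substr" select="' '"/>
  </xsl:call-template>
</xsl:variable>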

This is pretty good at getting the word that precedes the app element.

But we are still left with a problem: What do we do in the case of sibling app-elements? That is, a situation like this:

Praeterea, sicut oculus
<app>
  <lem/>
  <rdg wit="#B">nicticoracis</rdg>
</app>
<app>
  <lem><supplied>se habet</supplied></lem>
  <rdg wit="#B"><space extent="8" unit="chars" reason="rasura" /></rdg>
</app>
ad lumen solis 

The above template would here return oculus when run on the first apparatus and then nicticoracis when run on the second, but nicticoracis is not printed in the text. Actually, in this case that might still be what you want, if you want the apparatus to print something like this:

10 nicticoracis post oculus B 10 post nicticoracis vac. 8 litt. B

But that is entirely accidental, because it might as well have been:

Praeterea, sicut oculus
<app>
  <lem/>
  <rdg wit="#B">nicticoracis</rdg>
  <rdg wit="#A">semper</rdg>
</app>
<app>
  <lem><supplied>se habet</supplied></lem>
  <rdg wit="#B"><gap extent="8" unit="chars" reason="rasura" /></rdg>
</app>
ad lumen solis 

And in that case it would have returned semper in the processing of the second app-element, which clearly does not work.

But as I see it, this second problem of sibling app-elements is a question of

  1. how you want to present this particular case, and
  2. whether this is a rare enough edge case that it should get its own exceptional handling.
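
If the sibling case is treated as an exception, the processor first has to recognize it. The following is only a sketch of such a check, not a tested part of lbp-print-xslt. The template name `follows-sibling-app` is hypothetical, and it assumes the elements are in the TEI namespace bound to the prefix `tei`.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Hypothetical named template: emits 'true' when the context app
       directly follows another app, otherwise 'false'. -->
  <xsl:template name="follows-sibling-app">
    <!-- Nearest preceding sibling that is not a whitespace-only text node -->
    <xsl:variable name="previous"
        select="preceding-sibling::node()[not(self::text() and normalize-space(.) = '')][1]"/>
    <xsl:choose>
      <xsl:when test="$previous[self::tei:app]">true</xsl:when>
      <xsl:otherwise>false</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Called with an app element as the context node, this would let the stylesheet fall back to a special rendering (or emit a warning) only where two app elements are adjacent. Comments and processing instructions between the two elements would count as intervening nodes in this sketch.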

stenskjaer commented 8 years ago

Some more thoughts on this. The problem with sibling app elements is that the proposed method of finding the previous word only finds the previous text node, no matter what the parent of that node might be. So if it is part of a rdg or note or any other element that you don't want rendered but that still contains text nodes, it still takes that as the previous word. That is a problem.
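
To make the problem concrete, here is a minimal sketch of a filter that skips text nodes inside elements that are not printed in the main text. It is an assumption that rdg and note are the only elements to skip, that the elements are in the TEI namespace bound to `tei`, and the template name `preceding-printed-text` is hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Hypothetical named template: joins the preceding text nodes that
       are not inside a rdg or note, then normalizes the whitespace. -->
  <xsl:template name="preceding-printed-text">
    <xsl:variable name="joined">
      <!-- for-each visits the nodes in document order -->
      <xsl:for-each select="preceding::text()[not(ancestor::tei:rdg or ancestor::tei:note)]">
        <xsl:value-of select="."/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:value-of select="normalize-space($joined)"/>
  </xsl:template>

</xsl:stylesheet>
```

Text inside a lem is kept, since that is what gets printed. This only addresses which word precedes in the established text; it does nothing to reconstruct what a given witness reads at that point.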

It could be mitigated by creating a sophisticated algorithm that identifies the parent of the previous word (or the sibling of the current app element) and determines whether it is part of the text. Furthermore, one might be able to create a template that could reconstruct the preceding text of any witness used in the apparatus and on that basis determine what the preceding word in that witness had been. But as it is now, I think these things will

  1. become too complex to be worth the effort and strain, and
  2. be too fragile to implement while the standard is still developing as it is.

Instead, I wonder whether you could use the relatively simple procedure sketched above and then, in case there is any problem, indicate what the preceding word is at the rdg-level.

So an unproblematic reading would just look like this:

Praeterea, sicut oculus
<app>
  <lem/>
  <rdg wit="#B">nicticoracis</rdg>
</app>
ad lumen solis

which would make the apparatus entry this easy to render:

10 nicticoracis post oculus B

In the case of an ambiguous or problematic preceding word, it could be indicated by the @prev attribute, like so:

Praeterea, sicut oculus
<app>
  <lem/>
  <rdg xml:id="B-1" wit="#B">nicticoracis</rdg>
  <rdg wit="#A">semper</rdg>
</app>
<app>
  <lem><supplied>se habet</supplied></lem>
  <rdg wit="#B" prev="#B-1"><gap extent="8" unit="chars" reason="rasura" /></rdg>
</app>
ad lumen solis 

This way, it would be made clear to the processor what the content of the preceding reading is.
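
A lookup of this kind might be sketched as follows in XSLT 1.0. This is an illustration under assumptions, not an agreed design: the key name `by-xml-id`, the template name `preceding-word-for-rdg`, and the `fallback` parameter are all hypothetical, and @prev is assumed to hold a single `#id` pointer.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every element carrying an xml:id so @prev pointers can be resolved -->
  <xsl:key name="by-xml-id" match="*[@xml:id]" use="@xml:id"/>

  <!-- Hypothetical named template, called with a rdg as the context node.
       Returns the text of the element @prev points to, or the fallback
       (e.g. the word found by substring-after-last) when @prev is absent. -->
  <xsl:template name="preceding-word-for-rdg">
    <xsl:param name="fallback"/>
    <xsl:choose>
      <xsl:when test="@prev">
        <xsl:value-of select="normalize-space(key('by-xml-id', substring-after(@prev, '#')))"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$fallback"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

For the reading with prev="#B-1" in the example above, the key lookup resolves to the rdg containing nicticoracis. Using xsl:key instead of id() avoids depending on whether the processor treats xml:id as an ID without a DTD.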

The procedure during processing could then simply be:

Note: I realize now that these suggestions have implications for the schema.