lombardpress / lombardpress-schema


Creating reference targets of text segments #26

Open stenskjaer opened 8 years ago

stenskjaer commented 8 years ago

Have we (and do we need) a common way of creating targets used for cross references?

I ran into this problem when wanting to automatically create potential targets for cross references in my processor (for latex in this case, so I wanted it to create \label{} commands).

When using the @xml:id attribute to identify and point to targets of cross references, you risk confusing a processor, because not every @xml:id should be turned into a \label{}, even if you restrict it to the <seg> element.

What I did was to mark targets for cross references in the following way:

Ita ex superius positis palam est anime cognitionem et 
<seg type="target" xml:id="31sent-2">scientiam esse appetendam a
nobis, tum propter ipsius saltem humane anime</seg> incorruptibilitatem 
et perpetuitatem sive propter eius diligibilitatem, tum propter eius

By using the <seg> element with a @type="target" attribute, I can mark the exact passage that I want to refer to, and at the same time prevent every <seg> element with an @xml:id attribute from being converted into a \label{}.

The conversion to latex was then easy with the following XSLT template:

<xsl:template name="createLabelFromId">
  <xsl:param name="labelSuffix" />
  <xsl:if test="@xml:id and @type='target'">
    <xsl:text>\label{</xsl:text><xsl:value-of select="@xml:id"/><xsl:value-of select="$labelSuffix"/><xsl:text>}</xsl:text>
  </xsl:if>
</xsl:template>

And then calling it on <seg> elements (as well as all kinds of other potential targets) like so:

<xsl:template match="seg">
  <xsl:call-template name="createLabelFromId">
    <xsl:with-param name="labelSuffix">beg</xsl:with-param>
  </xsl:call-template>
  <xsl:apply-templates/>
  <xsl:call-template name="createLabelFromId">
    <xsl:with-param name="labelSuffix">end</xsl:with-param>
  </xsl:call-template>
</xsl:template>
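
For readers who do not use XSLT, the same rule can be sketched in Python with the standard library. This is only an illustration of the idea, not the processor described above: the function name `to_latex` and the sample paragraph are made up for the example, and the sample has no TEI namespace (in a real TEI file the tag would be `{http://www.tei-c.org/ns/1.0}seg`).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def to_latex(elem):
    """Render an element as text, wrapping only <seg type="target"> in labels."""
    is_target = (
        elem.tag == "seg"  # assumption: un-namespaced sample input
        and elem.get("type") == "target"
        and elem.get(XML_ID) is not None
    )
    parts = []
    if is_target:
        parts.append("\\label{%sbeg}" % elem.get(XML_ID))
    parts.append(elem.text or "")
    for child in elem:
        parts.append(to_latex(child))
        parts.append(child.tail or "")  # text following the child element
    if is_target:
        parts.append("\\label{%send}" % elem.get(XML_ID))
    return "".join(parts)


# Hypothetical sample: one marked target and one <seg> that only carries an id.
sample = (
    '<p>cognitionem et <seg type="target" xml:id="31sent-2">scientiam esse '
    'appetendam</seg> a nobis <seg xml:id="other">tum propter</seg> eius</p>'
)
latex = to_latex(ET.fromstring(sample))
```

The second `<seg>` has an @xml:id but no @type="target", so it passes through without a label, which is the behaviour the `test` expression in the template is meant to guarantee.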

I don't know how much flexibility you want to give editors with this sort of thing, but this approach (or any other method of identifying targets of cross references) could go into the schema definition.

In this case I wanted to create targets that I point to from another latex document, so I didn't think much about internal cross references. But internal cross referencing could easily be done with <ptr> and <ref> elements. This solution is based on the TEI Lite schema (http://www.tei-c.org/release/doc/tei-p5-exemplars/html/tei_lite.doc.html#U5-ptrs), which I think it would make fine sense to adopt.
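
As a rough sketch of what the internal case could look like on the LaTeX side, a processor might turn the @target of a local `<ptr>` or `<ref>` into a reference to the label created at the start of the segment. The function name `ref_to_latex`, the choice of `\pageref`, and the reuse of the `beg` suffix are all assumptions made for this example; an edition that needs line numbers would use whatever referencing command its typesetting package provides.

```python
def ref_to_latex(target, suffix="beg"):
    """Map a local TEI pointer such as "#31sent-2" to a LaTeX page reference.

    Returns None for targets that are not local fragment pointers, so the
    caller can decide how to handle external URLs.
    """
    if not target.startswith("#"):
        return None
    return "\\pageref{%s%s}" % (target[1:], suffix)


local = ref_to_latex("#31sent-2")
external = ref_to_latex("http://scta.info/resource/pll1d10c1-d1e3477")
```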

jeffreycwitt commented 8 years ago

Ok, my answer to this is going to be involved.

First, I want to develop a generic answer for "references". A "cross-reference" is just like any other reference; the only difference is that the target of the reference happens to be in another part of what is considered the same text.

Second, I'm wary of requiring an extra encoding of the target of a cross-reference. This seems to be a recipe for all kinds of cross-nesting problems. What if an author refers to passages that overlap?

I think references should only point at a specific point in the document hierarchy, i.e. most often the paragraph level. I understand this means we can't be as specific about the target of cross-references, but it is a sacrifice I'm willing to make. (Incidentally, it is also a reason to prefer small paragraph divisions over large divisions.)

Another problem with requiring that the target be encoded is that it assumes that an editor has control over other texts where the target of the cross reference may occur. But this may not always be the case. For example, Chapter 10 might make a reference to Chapter 1, but I might only be the editor of Chapter 10 and not have access or the ability to change Chapter 1.

Here's my recommendation for encoding targets of both quotations and references:

   <cit>
         <quote type="commentary" source="http://scta.info/resource/pll1d10c1-d1e3477">Nunc post Filii aeternitatem.</quote>
         <bibl><ref type="commentary" target="http://scta.info/resource/pll1d10c1-d1e3477">Lombard, Sent., I, d. 10, c. 1, n. 1</ref></bibl>
   </cit> 

  <cit>
         <ref type="commentary" target="http://scta.info/resource/pll1d10c1-d1e3477">Magister in libro 1, distinctione 10</ref>
         <bibl><ref type="commentary" target="http://scta.info/resource/pll1d10c1-d1e3477">Lombard, Sent., I, d. 10, c. 1, n. 1</ref></bibl>
   </cit> 

We can change the type="commentary" to type="scta" if we want. This was a way of telling a processor that this is a target that has an entry in the SCTA Database.

These references point to the LinkedData URL id for the target paragraph. We can use this id to create passive links as well. So, when the database gets built, each of these references or quotes to Lombard gets turned into a passive relation as well (referencedBy or quotedBy), which a person sees when viewing the Lombard text.
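
To make the inversion concrete, here is a minimal sketch of how active references could be turned into passive relations. It is not the SCTA build code: the function name `passive_relations`, the tuple layout, and the `example.org` source ids are invented for the illustration; only the target URL and the relation names referencedBy and quotedBy come from the discussion above.

```python
from collections import defaultdict

# Map the kind of active link to the passive relation it produces.
PASSIVE = {"ref": "referencedBy", "quote": "quotedBy"}


def passive_relations(active):
    """Invert (source, kind, target) triples into target -> relation -> sources."""
    index = defaultdict(lambda: defaultdict(list))
    for source, kind, target in active:
        index[target][PASSIVE[kind]].append(source)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {target: dict(rels) for target, rels in index.items()}


LOMBARD = "http://scta.info/resource/pll1d10c1-d1e3477"
active = [
    ("http://example.org/resource/para-a", "quote", LOMBARD),  # hypothetical source
    ("http://example.org/resource/para-b", "ref", LOMBARD),    # hypothetical source
]
relations = passive_relations(active)
```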

Now, how to answer the LaTeX pieces of this. I was planning to do the following.

Prior to building the LaTeX text from TEI, we could send a query to the public database and ask for a list of referencedBy and quotedBy resources within the text currently being typeset.

Now, when we run the conversion from TEI to LaTeX, each time we hit a new resource, we check the results of that query and see if the paragraph or division in question has been referenced. If it has, we can drop in a \label{} for that resource. We can do the same for any active references: that is, check to see if there is a corresponding target within the text in question, and if so drop in the appropriate LaTeX reference.
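
The labelling half of that plan might look roughly like the following. This is a sketch under several assumptions: the set `referenced` stands in for the result of the database query (no query is actually made here), the function names are hypothetical, and deriving the label key from the last path segment of the resource URL is just one possible convention.

```python
def label_key(resource_url):
    """Hypothetical convention: use the last path segment of the URL as the key."""
    return resource_url.rstrip("/").rsplit("/", 1)[-1]


def typeset_paragraphs(paragraphs, referenced):
    """Prefix a \\label{} to each paragraph whose resource URL has been referenced.

    paragraphs: list of (resource_url, text) pairs in document order.
    referenced: set of resource URLs reported as referencedBy/quotedBy targets.
    """
    lines = []
    for url, text in paragraphs:
        prefix = "\\label{%s}" % label_key(url) if url in referenced else ""
        lines.append(prefix + text)
    return lines


# Stand-in data; in practice `referenced` would come from the public database.
paragraphs = [
    ("http://scta.info/resource/pll1d10c1-d1e3477", "Nunc post Filii aeternitatem."),
    ("http://example.org/resource/unreferenced", "Alia littera."),
]
referenced = {"http://scta.info/resource/pll1d10c1-d1e3477"}
output = typeset_paragraphs(paragraphs, referenced)
```

Because the label is keyed on the paragraph's resource id, nothing has to be added to the TEI source of the referenced text.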

Again, this is not going to be as precise as the method of marking a reference with a <seg> as you have done. LaTeX will only really be able to give the line numbers of the referenced paragraph and not of the precise phrase.

However, for the reasons mentioned above, hard-encoding targeted passages in TEI on the scale we are planning for seems impossible.

stenskjaer commented 8 years ago

I understand that view, especially when it comes to references between works or inside a work.

My intended use for this was to have a specific segment of text to refer to in my dissertation, and I would be very sorry to hardcode line references in the text, as it is going to change many times before I finish the text.

Of course I see the problem that if you encode a text that is going to be more broadly used, it cannot have such vestiges from a particular analysis. But I wonder how I should then handle it if I really want to make line references. A local branch with such particularia, maybe.

A side note: What are your thoughts about the generation of LaTeX files? What do you consider to be the context of that? Just out of curiosity.