kshawkin / Best-Practices-for-TEI-in-Libraries

Best Practices for TEI in Libraries: A guide for mass digitization, automated workflows, and promotion of interoperability with XML using the TEI
http://purl.oclc.org/NET/teiinlibraries

specify how to encode sigla #36

Closed kshawkin closed 7 years ago

kshawkin commented 8 years ago

TEI practice varies on encoding sigla representing a note's point of attachment in a text. Sometimes the text of the siglum (such as a superscript number) is encoded as content of <ref> and other times as the value of @n on either <ptr/> or <note>. The BPTL similarly allows various practices.
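
For illustration, the three practices just described might look roughly like this. It is only a sketch: the sample sentence, the note text, and the xml:id values are made up, and namespace declarations are omitted.

```xml
<!-- Practice 1: siglum as the content of <ref> -->
<p>A sentence with a note.<ref target="#fn-a">1</ref></p>
<note xml:id="fn-a">Text of the note.</note>

<!-- Practice 2: siglum as the value of @n on <ptr/> -->
<p>A sentence with a note.<ptr target="#fn-b" n="1"/></p>
<note xml:id="fn-b">Text of the note.</note>

<!-- Practice 3: siglum as the value of @n on a <note> encoded at the point of attachment -->
<p>A sentence with a note.<note n="1">Text of the note.</note></p>
```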

Footnotes and endnotes repeat the superscript figure at the beginning of the note (so they can be matched with the point of reference). If you encode the note at the point of attachment, you might use @n to represent both sigla, but P5 never addresses how this siglum would be encoded if you are not encoding the note at the point of attachment. While you could also use @n, what happens if there's a mistake in the source document with regard to the numbering and you want to encode this? An attribute value will not suffice. Should the BPTL try to account for this?

See discussion of this question on TEI-L.

kshawkin commented 8 years ago

During the BPTL workgroup's call today, @PFSchaffner agreed to summarize the options for us to decide how to proceed.

kshawkin commented 8 years ago

During today's BPTL workgroup call, @PFSchaffner said he would post a link here to a Google Docs file containing his summary of the options.

kshawkin commented 8 years ago

Here are Paul's notes, which he sent to teilib-l.

kshawkin commented 8 years ago

During today's BPTL call, Martin Mueller agreed to write up a recommendation for us to consider.

kshawkin commented 8 years ago

Martin Mueller emailed Kevin on 2016-11-03 to say:

I would recommend strong deprecation of the “conventional” practice of removing the siglum from the text. It is part of the text, such as it was. The safer choice is to wrap the siglum in a <ref> element and make the <ref> and <note> elements point to each other. If the <ref> element has a stable location in the text, the <note> element can be moved ad libitum. Marginal notes very often have no clearly defined point of reference in the text. In that case you make a judgment call about where to put the <ref> element.
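
A minimal sketch of the mutual linking described in this recommendation might look like the following. It is not part of Martin's email: the sample text and xml:id values are hypothetical, and it assumes that @target on <note> is used for the pointer back to the <ref>.

```xml
<p>A sentence with a note.<ref xml:id="ref1" target="#note1">1</ref></p>
<!-- The note can be placed anywhere in the file; its @target points back to the <ref> -->
<note xml:id="note1" target="#ref1">Text of the note.</note>
```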

kshawkin commented 7 years ago

During yesterday's BPTL call, I agreed to find an example of a footnote that we can refer to and then propose one or more encodings of this example to help us reach consensus.

kshawkin commented 7 years ago

Okay, pretend you have a source document with the following:

http://i.stack.imgur.com/VGqNT.png

Based on the discussion at the Dec. 12 BPTL call and other discussion up till now, I suggest that we rewrite the Level-3 recommendation for notes to recommend the following:

a) Use <ref>, not <ptr/>, for sigla, with the text of the siglum itself as content of the <ref>. This is so that you don't have to put the siglum itself in an attribute value, causing problems for non-Unicode characters and for marking errors in the source document. If there's no siglum, use <ptr/> to mark the surmised point of attachment.

b) Encode the notes themselves where they occur in the layout on the page, or if desired, move them so that they occur directly after the <ref> or <ptr/> element marking the point of attachment. (We can then remove the recommendation about putting marginal notes at the beginning of the paragraph to which they refer since this recommendation will become unnecessary.)

c) Use <label> for any superscript text, symbol, or other marker within the note.

Here's the example encoded accordingly (but leaving off any encoding of subscript rendering for simplicity):

<p>The three little pigs built their houses out of straw,<ref target="#n1">1</ref> sticks<ref target="#n2">2</ref> and bricks.<ref target="#n3">3</ref></p>
[. . .]
<note place="bottom" anchored="true" xml:id="n1"><label>1</label>not to be confused with hay</note>
<note place="bottom" anchored="true" xml:id="n2"><label>2</label>or lumber according to some sources</note>
<note place="bottom" anchored="true" xml:id="n3"><label>3</label>probably fired clay bricks</note>

As mentioned in (b), an alternative way to do this would be:

<p>The three little pigs built their houses out of 
straw,<ref target="#n1">1</ref><note place="bottom" anchored="true" xml:id="n1"><label>1</label>not 
to be confused with hay</note> 
sticks<ref target="#n2">2</ref><note place="bottom" anchored="true" xml:id="n2"><label>2</label>or 
lumber according to some sources</note> and 
bricks.<ref target="#n3">3</ref><note place="bottom" anchored="true" xml:id="n3"><label>3</label>probably 
fired clay bricks</note></p>
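
The two special cases mentioned in (a), a missing siglum and an error in the source's numbering, are not covered by the example. A sketch of how they might be encoded follows. The sample text and xml:id values are hypothetical, and the use of <choice>, <sic>, and <corr> for the misnumbered siglum is an assumption for illustration, not something the workgroup has agreed on.

```xml
<!-- No siglum in the source: an empty <ptr/> marks the surmised point of attachment -->
<p>A sentence with an unmarked marginal note.<ptr target="#n4"/></p>
<note place="margin" anchored="false" xml:id="n4">Text of the marginal note.</note>

<!-- Misnumbered siglum: the source prints "2" where "5" is expected -->
<p>A sentence with a misnumbered note.<ref target="#n5"><choice><sic>2</sic><corr>5</corr></choice></ref></p>
<note place="bottom" anchored="true" xml:id="n5"><label>5</label>Text of the footnote.</note>
```

Because the siglum is element content rather than an attribute value, the error and its correction can both be recorded, which is the motivation given in (a).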

I would need to modify the Alger Hiss example (and any others) accordingly.

There would be no additional instructions at Level 4.


Now, you may be wondering about a few things we discussed but which didn't work their way into my proposal:

1) I believe that Elli and Syd suggested that those keyboarding rather than doing OCR generally don't bother capturing sigla but instead place the note at the point of attachment. Indeed, the BPTL says that levels 3 and 4 could be created through keyboarding. However, it's not a lot of extra work to capture the siglum, and you can handle the things I mentioned in (a) above if you do capture the siglum. It seems worth recommending in all cases.

2) I believe Elli and Syd suggested that we encode notes where they occur in the layout of the page at Level 3 but then move to the point of attachment for Level 4. I believe this is because we had in mind OCR for Level 3 and keyboarding for Level 4. However, on reflection, I don't think it's fair for us to make that assumption. Indeed, as the BPTL is currently written, both levels 3 and 4 could be created through either process.

3) Various people have suggested that the point of attachment and the note should link to each other. My example links only from the point of attachment to the note. I recommend this because it requires fewer keystrokes, avoids errors in which elements fail to link to each other as intended, and still allows a user interface to construct links back to the point of attachment from a one-way link in the encoding.

kshawkin commented 7 years ago

During today's BPTL call, Elli agreed to review my proposed text above.

emylonas commented 7 years ago

This seems reasonable to me and the example makes sense. You use the word "subscript" where you mean "superscript" in discussing it - doesn't matter unless it creeps into any of the BP prose!

Questions:

  1. is the use of <label> required?
  2. @kshawkin says, on Dec. 13, "Encode the notes themselves where they occur in the layout on the page, or if desired, move them so that they occur directly after the <ref> or <ptr/> element marking the point of attachment." Are those the only options, or could the user put them in the back matter, for example as discussed in the section on Notes in Level 3 (4.2.4.6.4)?
  3. I'm not sure exactly what you are recommending in the case of un-anchored margin notes.

kshawkin commented 7 years ago

Yes, thanks for catching my accidental use of "subscript".

As for your questions:

  1. I suggest making use of <label> required at Level 3 since I think this falls in scope of the "purpose" of Level 3 given in the BPTL.

  2. I had forgotten that we offered people the option to collate notes elsewhere. However, in line with my recommendation (b), I think we should no longer give this option since, upon reflection, it's not really useful to anybody. If you have a born-digital document, you would just put the notes at the point of attachment.

  3. Please ignore what I wrote above in (b) about removing the sentence about marginal notes. I realize now that this is needed for un-anchored notes.

emylonas commented 7 years ago

ok, all makes sense. your response to point 3 above could have an edit to indicate "unanchored marginal notes".

kshawkin commented 7 years ago

During the BPTL call on 2017-03-06, we agreed that I would implement the final consensus.

kshawkin commented 7 years ago

Implemented at https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/commit/70c53f50ac45f7e333fc851a4027fbe623ab5e41 and https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/commit/70183bc05f2e62e261b9b20327dd88b99ebf18d9 .

Regarding item 3 above, I actually decided to remove that sentence anyway: I think that the encoder should always determine the most likely implied point of attachment.