nathangibson closed this issue 3 years ago.
@nathangibson @dlschwartz is probably the best person to answer this one. It makes sense to me, as does keeping the factoids in the same record as the text, but Dan may have other ideas, as he has done more thinking on the factoid model than I have.
This is a good question and a bit of a sticky one. I've thought about this some because of texts like the letters of Severus and the Lives of John of Ephesus. In neither case do we (yet) use our URN system. Not only do we not have text in the Corpus to point to, but those texts aren't broken down into canonical divisions. In many cases the Letters, and in rare cases the Lives, are really short. In these cases, pointing to the whole thing works just fine, just as pointing to the shorter annals of the Chronicles is fine. With the longer ones, however, things get a bit more complicated. For that matter, references to the first account of the flooding of Edessa are a bit awkward: http://wwwb.library.vanderbilt.edu/exist/apps/srophe/spear/factoid.html?id=http://syriaca.org/spear/8559-12 .
But first things first. A lot of people aren't crazy about the CTS/DTS URN system, mainly because a browser can't resolve it. Correct me if I'm wrong, Winona, but Srophe handles these URNs by converting the Corpus URI-like portion of the URN into the actual URI/URL for the text and then combining that with the final number, so that the whole thing works like a link to an anchor in the Corpus text. The connection between the URN and its constituent URIs (both Syriaca and Corpus) makes this system somewhat respectable, but it's not really ideal. Especially since URNs aren't resolvable URIs, I think looking for other options is a fine idea.
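A minimal sketch of the URN-to-link conversion described above, assuming a hypothetical URN of the form urn:cts:syriacLit:WORK:PASSAGE and a hypothetical lookup table from the work portion to a Corpus URL. Neither the identifier nor the URL is taken from Srophe or the Syriac Corpus; the point is only that the part before the last colon becomes a page and the final number becomes an anchor.

```python
# Hypothetical mapping from the URI-like work portion of a URN to a Corpus URL.
# The identifier and URL are placeholders, not real Syriac Corpus records.
WORK_URLS = {
    "urn:cts:syriacLit:nhsl8501.syriacCorpus63.Kiraz": "https://example.org/corpus/63",
}


def urn_to_url(urn: str) -> str:
    """Turn a CTS-style URN into a resolvable URL with a fragment anchor.

    Everything before the last colon is looked up in WORK_URLS;
    the final passage number becomes the #anchor.
    """
    work, sep, passage = urn.rpartition(":")
    if not sep or not passage:
        raise ValueError(f"URN has no passage component: {urn!r}")
    try:
        base = WORK_URLS[work]
    except KeyError:
        raise ValueError(f"unknown work in URN: {work!r}") from None
    return f"{base}#{passage}"


print(urn_to_url("urn:cts:syriacLit:nhsl8501.syriacCorpus63.Kiraz:12"))
# https://example.org/corpus/63#12
```

The lookup table stands in for whatever Srophe actually does to map a URN to its Corpus record; only the split-and-append step is the idea being illustrated.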
The idea you suggest here will frequently work very well. You get the benefit of being able to point to very specific pieces of text. The downside, however, is that you will likely run into problems with the containing structure of the XML, especially when dealing with events. Off the top of my head, I can think of two tricky scenarios when dealing with complicated events. First, I would guess that at some point you'll run into a situation where the source text for event B starts in the middle of the source text for event A and extends beyond it. Second, you might want to source an event to the last sentence of one paragraph and the first sentence of the next. You couldn't actually do that. You would have to wrap both of the paragraphs in
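To make the containment problem concrete: XML elements can only nest or sit side by side, so two rs spans can both be marked up only if one lies entirely inside the other or they don't overlap at all. A minimal sketch, treating each sourced passage as a hypothetical half-open (start, end) character range in the plain text:

```python
from typing import Tuple

Span = Tuple[int, int]  # half-open (start, end) character offsets


def relation(a: Span, b: Span) -> str:
    """Classify how two spans relate, from the point of view of XML markup."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"    # sibling elements: fine
    if a_start <= b_start and b_end <= a_end:
        return "b inside a"  # nested elements: fine
    if b_start <= a_start and a_end <= b_end:
        return "a inside b"  # nested elements: fine
    return "overlap"         # cannot be expressed as nested XML elements


def can_nest(a: Span, b: Span) -> bool:
    """True if both spans can be wrapped in elements without breaking well-formedness."""
    return relation(a, b) != "overlap"


event_a = (100, 200)
event_b = (150, 260)  # starts in the middle of A and runs past its end
print(relation(event_a, event_b))  # overlap
```

The "overlap" case is exactly the event-B-inside-and-beyond-event-A scenario; it can only be encoded by splitting one span into pieces (for example chained with @next/@prev) or by moving the pointers out of the text.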
An alternative would be to use the paragraph numbers to point to the text. This isn't as precise, of course. Also, it solves the former containment problem but leaves you in the same boat regarding the latter (you would still have to point to two whole paragraphs). I've been thinking along these lines for when we actually get the Letters and the Lives into the Corpus. This would essentially turn the paragraph breaks in the printed editions into a canonical division of the text and establish URNs based on those numbers.
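Pointing at paragraph numbers trades precision for robustness. A sketch of that coarsening, assuming hypothetical paragraph boundaries given as character offsets: any span, including one that crosses a paragraph break, maps to the list of whole paragraphs that cover it.

```python
from bisect import bisect_right
from typing import List


def covering_paragraphs(start: int, end: int, para_starts: List[int]) -> List[int]:
    """Return 1-based paragraph numbers covering the half-open span [start, end).

    para_starts holds the sorted character offsets where each paragraph
    begins; para_starts[0] is expected to be 0.
    """
    if not start < end:
        raise ValueError("span must be non-empty")
    first = bisect_right(para_starts, start)   # paragraph containing the first char
    last = bisect_right(para_starts, end - 1)  # paragraph containing the last char
    return list(range(first, last + 1))


# Three hypothetical paragraphs starting at offsets 0, 120, and 300.
starts = [0, 120, 300]
print(covering_paragraphs(90, 140, starts))  # [1, 2]
```

A span that runs from the last sentence of paragraph 1 into the first sentence of paragraph 2 comes back as [1, 2]: no containment conflict, but two whole paragraphs cited.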
Technically, you can store factoids either in the same file or in a separate file. You could use @xml:id attributes to source factoids either way. I obviously don't have this option since factoids aren't part of the Corpus data model. Any other use of the text you produce would just ignore or strip out the factoids. We'll have to talk about this in more detail, but I've been thinking quite a bit about factoids lately and how they actually work. SPEAR TEI models factoids, not persons or sources. The factoid is, in the lingo of the TEI, "stand-off markup" (as is all Syriaca.org data actually). This suggests that perhaps storing them separately from the text is appropriate. That said, I kind of want you to store them in the same file. It might be good for the article we are planning to be able to show different use cases and different workflows. We'll have to discuss this further.
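Because the factoid points at an @xml:id rather than depending on where it sits, the same lookup works whether the factoid lives in the text file or in a separate one. A sketch using Python's standard library. The file name, the p12 identifier, and the assumption that the factoid's bibl carries a pointer such as "#p12" or "text.xml#p12" are all hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical text file: one paragraph carrying an xml:id.
TEXT_XML = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p xml:id="p12">He was born in Damascus.</p>
</body></text></TEI>"""


def split_pointer(target: str):
    """Split 'file.xml#p12' into ('file.xml', 'p12'); '#p12' gives ('', 'p12')."""
    doc, sep, frag = target.partition("#")
    if not sep or not frag:
        raise ValueError(f"pointer has no fragment: {target!r}")
    return doc, frag


def resolve(target: str, documents: dict, current: str) -> str:
    """Return the text of the element a factoid's pointer refers to.

    documents maps file names to parsed roots. An empty file part means the
    pointer is local to the current document (factoid stored in the text).
    """
    doc, frag = split_pointer(target)
    root = documents[doc or current]
    for el in root.iter():
        if el.get(XML_ID) == frag:
            return "".join(el.itertext())
    raise KeyError(frag)


docs = {"text.xml": ET.fromstring(TEXT_XML)}
print(resolve("#p12", docs, current="text.xml"))              # factoid in the text file
print(resolve("text.xml#p12", docs, current="factoids.xml"))  # factoid in a separate file
```

Only the pointer's file part changes between the two storage options, which is why the choice can be made on workflow grounds.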
Storing the factoids in the text would complicate multilingual display of sources on the factoid page. We've now got a TEI encoding of the Chronicle even though it's not available anywhere yet. When it is, I hope to be able to pull both English and Syriac onto the factoid page. If you wanted similar functionality while storing factoids in the text, you would need to duplicate your
Well, I didn't expect to write such a long email. I hope this is helpful. Mull this over, Nathan, and perhaps we can chat about these or other issues in person.
By the way, Georg Vogeler strongly recommended that I add @type="factoid" to my div elements, which I notice you've done, Nathan. It's clear to us internally what these things are, but we need to be more explicit for users. I'll be doing this for SPEAR shortly. I'm still trying to think through whether I should add @subtype attributes such as "nameVariant", "birthDate", "gender", "event", and "relation". Perhaps you and I can discuss this, Nathan.
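Adding the attributes is mechanical. A sketch with ElementTree that emits a factoid div carrying @type="factoid" and, optionally, one of the @subtype values floated above. Restricting @subtype to exactly that list is an assumption, since the vocabulary was still under discussion.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

# Candidate subtypes mentioned in the discussion; not a settled vocabulary.
FACTOID_SUBTYPES = {"nameVariant", "birthDate", "gender", "event", "relation"}


def factoid_div(subtype=None) -> ET.Element:
    """Create an empty TEI <div type="factoid">, with @subtype if given."""
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "factoid"})
    if subtype is not None:
        if subtype not in FACTOID_SUBTYPES:
            raise ValueError(f"unexpected factoid subtype: {subtype!r}")
        div.set("subtype", subtype)
    return div


print(ET.tostring(factoid_div("event"), encoding="unicode"))
```

Validating against a fixed set catches typos such as "namevariant" early; if the vocabulary grows, only the set needs updating.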
@dlschwartz Thanks so much for these thoughts. I wonder if we should link to paragraphs as sources rather than to rs elements. Although it's not as granular, the workload may be more realistic. We need to align English and Arabic paragraphs anyway. The paragraph divisions are our own arbitrary choice rather than being in any sense canonical. So although we could assign them CTS/DTS URNs, I'm not sure I would really see any point in that.
For aligning English and Arabic, we have to decide whether to simply use anchors or to use the more elaborate TAN system. @wsalesky finds having a div per paragraph overly verbose (see https://github.com/usaybia/srophe-eXist-app/issues/1#issuecomment-488687378). If we were using TAN, creating the CTS/DTS URNs would not be difficult. But I would still wonder, given our limited resources, why not simply give each paragraph (div or p) an xml:id and link to that? See https://github.com/usaybia/usaybia-data/blob/master/data/texts/tei/iu-sample-kopf-en-tan.xml#L96 as an example.
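Giving every paragraph an identifier can be scripted once the paragraphs exist. A minimal sketch; the p1, p2, p3 naming scheme is hypothetical, and paragraphs that already have an xml:id are left alone so existing links keep working.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def number_paragraphs(root: ET.Element, prefix: str = "p") -> list:
    """Give every TEI <p> lacking an xml:id a sequential one; return all ids in order."""
    taken = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    ids, n = [], 0
    for p in root.iter(f"{TEI}p"):
        if p.get(XML_ID) is None:
            n += 1
            while f"{prefix}{n}" in taken:  # skip numbers already used elsewhere
                n += 1
            p.set(XML_ID, f"{prefix}{n}")
            taken.add(f"{prefix}{n}")
        ids.append(p.get(XML_ID))
    return ids


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<p>First.</p><p xml:id="p2">Second.</p><p>Third.</p>'
    "</body></text></TEI>"
)
print(number_paragraphs(doc))  # ['p1', 'p2', 'p3']
```

Running it a second time changes nothing, since every paragraph then already has an id.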
In any case, attaching the factoid to the paragraph rather than to a smaller chunk of text would make it easy to display in either language, since the paragraphs will be aligned. It would also be unlikely that an event would span more than one paragraph, but when one does we could link to more than one paragraph (rather than having to use multiple rs elements linked together with @previous and @next). I would think we would want to tag names, etc. in both English and Arabic (so that we can make them into links and grab spellings), but maintain factoids only in the Arabic text. What do you think, @wsalesky?
Placing the factoid immediately in/after the relevant paragraph might be an easier workflow than maintaining it in a separate doc. But I could see doing either one.
One thing I'm envisioning is that we could pre-populate factoids if we adequately tag the text. In the example paragraph I gave, a script could take the persName and placeName elements and create factoids at the end of that paragraph for name variants and events. We would just have to fill in the missing info instead of adding the entire factoid div.
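A rough sketch of such a pre-population script. It reads one paragraph, collects its persName and placeName elements, and inserts skeleton factoid divs after the paragraph, each pointing back to the paragraph's xml:id. The mapping of persName to a "nameVariant" factoid and placeName to an "event" factoid, the bibl/ptr pointer, and the sample text are assumptions; an editor would still fill in the remaining details.

```python
import xml.etree.ElementTree as ET

NS = "http://www.tei-c.org/ns/1.0"
TEI = f"{{{NS}}}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def skeleton_factoids(p: ET.Element) -> list:
    """Build skeleton factoid divs from the names tagged in one paragraph.

    persName -> 'nameVariant' factoid; placeName -> 'event' factoid.
    Each skeleton cites the paragraph through bibl/ptr/@target.
    """
    para_id = p.get(XML_ID)
    if para_id is None:
        raise ValueError("paragraph needs an xml:id to be cited")
    factoids = []
    for tag, subtype in (("persName", "nameVariant"), ("placeName", "event")):
        for name in p.iter(f"{TEI}{tag}"):
            div = ET.Element(f"{TEI}div", {"type": "factoid", "subtype": subtype})
            copy = ET.SubElement(div, f"{TEI}{tag}", dict(name.attrib))
            copy.text = "".join(name.itertext())  # name text without its tail
            bibl = ET.SubElement(div, f"{TEI}bibl")
            ET.SubElement(bibl, f"{TEI}ptr", {"target": f"#{para_id}"})
            factoids.append(div)
    return factoids


body = ET.fromstring(
    f'<body xmlns="{NS}"><p xml:id="p1">'
    '<persName ref="#person1">Person A</persName> travelled to '
    "<placeName>Place B</placeName>.</p></body>"
)
p = body.find(f"{TEI}p")
index = list(body).index(p)
for offset, factoid in enumerate(skeleton_factoids(p), start=1):
    body.insert(index + offset, factoid)  # place skeletons right after the paragraph

print([d.get("subtype") for d in body.findall(f"{TEI}div")])  # ['nameVariant', 'event']
```

The richer the tagging (for example @ref on persName), the more of each skeleton the script can fill in.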
In this use case inside a source text, @type='factoid' would be important to distinguish factoid divs from regular text divs. It could even make sense to put factoids inside note elements to make it really clear they're not part of the source text, but unfortunately notes can't contain divs.
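When factoids sit inside the source text, @type='factoid' is what lets a display script keep them apart from the running text, for example to render the text in one column and the factoids beside it. A schematic sketch; it checks @type on any child element, so it works whether factoids are encoded as div or as ab. The sample body is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_body(body: ET.Element):
    """Separate the direct children of <body> into source text and factoids."""
    text, factoids = [], []
    for child in body:
        if child.get("type") == "factoid":
            factoids.append(child)  # div or ab typed as a factoid
        else:
            text.append(child)      # ordinary text divisions and paragraphs
    return text, factoids


body = ET.fromstring(
    '<body xmlns="http://www.tei-c.org/ns/1.0">'
    '<div type="section"><p>Running text.</p></div>'
    '<div type="factoid" subtype="event"/>'
    '<ab type="factoid" subtype="nameVariant"/>'
    "</body>"
)
text, factoids = split_body(body)
print(len(text), len(factoids))  # 1 2
```

Without the @type check, the factoid divs would be indistinguishable from the section div and would be rendered as part of the source.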
PS: @subtype could be helpful but not essential. Yes, we can discuss when we meet.
Note to self: In the interim we've decided to put factoids in separate docs, one per biographical unit (e.g., 14.21). However, with the change to using ab instead of div for factoids, it is more conceivable to incorporate these into the text. We need to discuss.
@wsalesky @dlschwartz I'm trying to figure out how best to link factoids to the text passage they relate to. In SPEAR you do this with CTS URNs. But the text I'm dealing with has no standard divisions smaller than biographical entries, which typically run to multiple pages. Anything smaller than that I would have to create on a purely arbitrary basis.
I would like to be able to associate factoids with a paragraph or even a sentence. And when the primary source is displayed as a running text, I would like to be able to display the factoids alongside it (as well as displaying the text passage on the factoid detail page, as SPEAR does).
I'm going to suggest something wild that could make sense with my workflow, but feel free to reel me back. The use of the rs element came up recently. What if I were to use that to "highlight" a short passage, give it an xml:id, and then link to that in the factoid bibl (instead of to a CTS URN)? Or, if there should be a two-way link, this could be done with linkGrp/link and @targFunc. Also, is there any reason the factoids need to be in a separate document from the text itself? Or would it be OK to place them immediately after the relevant paragraph, for ease of maintenance, and label them with @type? Thanks!
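A hypothetical sketch of the two-way option: each highlighted rs keeps its own xml:id, and a linkGrp pairs it with a factoid, using TEI's @targFunc on linkGrp to name the role of each position in a link's @target. The ids rs1, factoid1, and so on, the @type value, and the role labels "passage factoid" are all invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def link_group(pairs) -> ET.Element:
    """Build a <linkGrp> whose <link>s each join a passage (rs) to a factoid.

    pairs is an iterable of (rs_id, factoid_id). @targFunc labels the two
    positions in every link's @target list.
    """
    grp = ET.Element(f"{TEI}linkGrp",
                     {"type": "factoidSource", "targFunc": "passage factoid"})
    for rs_id, factoid_id in pairs:
        ET.SubElement(grp, f"{TEI}link", {"target": f"#{rs_id} #{factoid_id}"})
    return grp


def linked(grp: ET.Element, an_id: str) -> list:
    """Return the other end of every link mentioning an_id (works in both directions)."""
    others = []
    for link in grp.iter(f"{TEI}link"):
        ends = [t.lstrip("#") for t in link.get("target").split()]
        if an_id in ends:
            others.extend(e for e in ends if e != an_id)
    return others


grp = link_group([("rs1", "factoid1"), ("rs1", "factoid2"), ("rs2", "factoid2")])
print(linked(grp, "rs1"))       # ['factoid1', 'factoid2']
print(linked(grp, "factoid2"))  # ['rs1', 'rs2']
```

Keeping the links in one linkGrp means neither the rs nor the factoid has to carry a pointer to the other, which suits factoids kept in a separate document.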
See the code example below or in context at https://github.com/usaybia/usaybia-data/blob/eff4877dd7cc3b35b8c3f657ed098c55b1632eaf/data/texts/tei/iu-sample-kopf-en-tan.xml#L111.