open-editions / corpus-joyce-ulysses-tei

James Joyce's novel Ulysses in TEI XML. Work-in-progress.

Disambiguating <emph> into multiple taggings #7

Open yellwork opened 7 years ago

yellwork commented 7 years ago

Here is an interesting example of typographic distinction opening up into multiple possibilities for tagging:

<p><lb n="030099"/><foreign xml:lang="it">All'erta!</foreign></p>

@JonathanReeve switched the inherited <emph> tagging for a <foreign xml:lang="it">. But the italics also render a quotation (not that every quotation is so distinguished!). Gifford has:

All’erta! (Italian) On guard! Be vigilant! These are the opening words of Giuseppe Verdi’s opera Il Trovatore (The Troubador).

Should this then be encoded as follows?

<p><lb n="030099"/><quote source="Il Trovatore"><foreign xml:lang="it">All'erta!</foreign></quote></p>

Are there other examples in this vein?

yellwork commented 7 years ago

Whereas

<lb n="161744"/>admiration of Rossini's <emph>Stabat Mater</emph>, a work simply abounding in

is just an instance of <title>, and not also of <foreign xml:lang="la">?

JonathanReeve commented 7 years ago

I like the embedded <quote> and <foreign> above, with "All'erta". I agree with you about "Stabat Mater," too--I think just <title> is fine with that one.

yellwork commented 7 years ago

Is it worth indicating that the bare words of such titles as encountered in the episode are also in Latin &c.? I’m looking through ‘Proteus’ here. On an earlier pass for <emph> disambiguation, you rendered a few likely instances of <title> as <foreign>:

<lb n="030167"/> […] But he must send me <foreign xml:lang="fr">La Vie de Jésus</foreign> by M. Léo Taxil.
<lb n="030196"/>[…] Rich booty you brought back; <foreign xml:lang="fr">Le
<lb n="030197"/>Tutu</foreign>, five tattered numbers of <foreign xml:lang="fr">Pantalon Blanc et Culotte Rouge</foreign>;

Gotcha moment aside (!), this is valuable information that we don’t want clipped in the shift to <title>. How about something like the following?

<lb n="030167"/> […] But he must send me <title type="book" xml:lang="fr">La Vie de Jésus</title> by M. Léo Taxil.

I note, in passing, that there’s also a case to be made for marking up the abbreviation in the remainder of the sentence, giving: by <foreign xml:lang="fr" rend="none">M.</foreign> Léo Taxil. (Drawing on our discussion in #2.)

JonathanReeve commented 7 years ago

I like that syntax of embedding the language in the tag. Let's do it.

And thanks for catching those mistakes! I've just corrected them, using your suggested syntax.

yellwork commented 7 years ago

I was just finishing the <said> tagging for “Lestrygonians” when I spotted something in the earlier encoding that gave me pause:

<p><lb n="081039"/>He hummed, prolonging in solemn echo the closes of the bars:
<lb n="081040"/><said who="Leopold Bloom">―<foreign xml:lang="it">Don Giovanni, a cenar teco
<lb n="081041"/>M'invitasti.</foreign></said></p>
[...]
<lb n="081051"/><said who="Leopold Bloom">―<foreign xml:lang="it">A cenar teco.</foreign></said></p>
<p><lb n="081052"/>What does that <foreign xml:lang="it">teco</foreign> mean? Tonight perhaps.
<lb n="081053"/><said who="Leopold Bloom">―<emph>Don Giovanni, thou hast me invited
<lb n="081054"/>To come to supper tonight,
<lb n="081055"/>The rum the rumdum.</emph></said></p>
<p><lb n="081056"/>Doesn't go properly.</p>

Really, these instances of <foreign> should all be <quote xml:lang="it">, shouldn’t they? I proposed a double encoding – <quote><foreign> – at the head of this issue, but I’m starting to think <quote xml:lang="it"> (like <title xml:lang="fr"> above) would be neater. What’s anyone else’s sense? This would probably require us to rework a lot of the Latin in the book, currently <foreign xml:lang="la">, as quotation too: <quote xml:lang="la">. See the first line of dialogue, for example. For:

<lb n="010005"/><said who="Buck Mulligan">―<foreign xml:lang="la">Introibo ad altare Dei.</foreign></said></p>

read

<lb n="010005"/><said who="Buck Mulligan">―<quote xml:lang="la">Introibo ad altare Dei.</quote></said></p>

I’m happy to make these changes, but I wanted to run the proposal by the group first. I’m sure if we make our encoding decisions clear in the README, tools like your foreign-language analysis can be tailored to catch non-English quotations, right, Jonathan?

JonathanReeve commented 7 years ago

This sounds great. I think <quote> isn't rendered as italicized by default, though, so if we merge contiguous <quote> and <foreign>, we should probably add @rend, like <quote xml:lang="la" rend="italics"> to preserve the rendering as italicized.
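
For what it's worth, the merge could probably be scripted rather than done by hand. Below is a minimal sketch in Python, assuming the nested form from the top of this issue (a <quote> whose only child is a <foreign> carrying @xml:lang) and assuming the files use the standard TEI namespace. The function name `merge_quote_foreign` and the inline sample are made up for illustration, and `rend="italics"` simply follows the suggestion above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: files declare the TEI namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix

# Hypothetical sample, modelled on the "All'erta!" example in this issue.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0"><lb n="030099"/>'
    '<quote source="Il Trovatore"><foreign xml:lang="it">All\'erta!</foreign>'
    "</quote></p>"
)


def merge_quote_foreign(root):
    """Fold <quote><foreign xml:lang="..">...</foreign></quote> into a single
    <quote xml:lang=".." rend="italics">. Returns the number of merges."""
    merged = 0
    # Copy the iterator into a list because the tree is modified in the loop.
    for quote in list(root.iter(f"{{{TEI_NS}}}quote")):
        children = list(quote)
        if len(children) != 1 or children[0].tag != f"{{{TEI_NS}}}foreign":
            continue
        foreign = children[0]
        lang = foreign.get(XML_LANG)
        # Skip anything that is not a pure wrapper: no language, or text
        # sitting in the <quote> outside the <foreign>.
        if lang is None or (quote.text or "").strip() or (foreign.tail or "").strip():
            continue
        quote.set(XML_LANG, lang)
        quote.set("rend", "italics")  # keep the italic rendering
        quote.text = foreign.text
        inner = list(foreign)  # e.g. <lb/> milestones inside the phrase
        quote.remove(foreign)
        quote.extend(inner)
        merged += 1
    return merged


root = ET.fromstring(SAMPLE)
merged = merge_quote_foreign(root)
print(merged, ET.tostring(root, encoding="unicode"))
```

This only touches spans that are already wrapped in <quote>. Deciding which bare <foreign> spans are quotations in the first place remains an editorial call that a script can't make.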

JonathanReeve commented 7 years ago

And yep, this won't make too much of a difference in analyses, since we can just look for @xml:lang instead of <foreign>.
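
To illustrate, here is a rough sketch of an element-agnostic lookup in Python. It is not the existing analysis code: the helper names `local_name` and `language_spans` are hypothetical, and the sample assumes the TEI namespace and the encodings proposed in this thread.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample built from the encodings discussed in this issue.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
<p><said who="Buck Mulligan"><quote xml:lang="la" rend="italics">Introibo ad altare Dei.</quote></said></p>
<p>But he must send me <title type="book" xml:lang="fr">La Vie de J\u00e9sus</title> by M. L\u00e9o Taxil.</p>
<p>What does that <foreign xml:lang="it">teco</foreign> mean? Tonight perhaps.</p>
</div>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def language_spans(root):
    """Return (element name, language, text) for every element carrying
    @xml:lang, whatever the element happens to be."""
    spans = []
    for el in root.iter():
        lang = el.get(XML_LANG)
        if lang is not None:
            # Join nested text and collapse line breaks and runs of spaces.
            text = " ".join("".join(el.itertext()).split())
            spans.append((local_name(el.tag), lang, text))
    return spans


root = ET.fromstring(SAMPLE)
spans = language_spans(root)
for name, lang, text in spans:
    print(name, lang, text)
```

Since the element name comes back alongside the language, an analysis could still tell quotations and titles apart from plain foreign phrases if that ever matters.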