open-editions / corpus-joyce-ulysses-tei

James Joyce's novel Ulysses in TEI XML. Work-in-progress.

A tweak on the <emph> to <said> disambiguation #20

yellwork closed 7 years ago

yellwork commented 7 years ago

We have a lot of quoted direct speech within character dialogue in our corpus. An early instance:

—You said, Stephen answered, O, it’s only Dedalus whose mother is beastly dead. (U 1.198–99)

Initially this was all tagged as <emph> on account of the italics. We had been tackling the <emph> to <said> disambiguation by just tagging the direct quoted speech the same way that we treat dialogue: <said who=""> etc. We were trusting to the nesting to indicate when direct speech was being quoted within character dialogue without any additional markup.

But it turns out that there are plenty of exceptions to this loose rule. So I went through the corpus and added a @type="reported" on all instances of <emph> that we had retagged as <said>. It took a while, but I think we’ve teased out a potential ambiguity in the process. Some examples:

she was one of those good souls who had always <lb n="100139"/>to be told twice <said who="Father Conmee" type="reported">bless you, my child,</said> that they have been absolved, <said who="Father Conmee" type="reported">pray for <lb n="100140"/>me</said>.

a shrill <lb n="131174"/>voice went crying, wailing: <said who="shrill voice" type="reported"><title type="newspaper">Evening Telegraph</title>, stop press edition! Result of <lb n="131175"/>the Gold Cup races!</said>

handed him <lb n="161336"/>his silk hat when it was knocked off and he said <said who="Parnell" type="reported">Thank you</said>, excited as he <lb n="161337"/>undoubtedly was

JonathanReeve commented 7 years ago

Great idea! I think this'll be an improvement.

JonathanReeve commented 7 years ago

The DTD validation is complaining that @type isn't valid for <said>. But it seems like there's an attribute for this: @direct. The TEI docs have it that indirect speech would be <said direct="false">. We can assume that otherwise it's direct speech (or thought), and so we don't need direct="true". I'll go ahead and make this global change, if that's OK, just to get the validation working.
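For illustration, here is a minimal sketch of how such a global change might be scripted, assuming Python's standard `xml.etree.ElementTree`. The function name `retag_reported` and the wrapping `<p>` in the sample are hypothetical and not taken from the repository; tags are matched by local name so the same loop should also work on namespaced TEI files.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def retag_reported(root: ET.Element) -> int:
    """Swap type="reported" for direct="false" on every <said>; return the count."""
    changed = 0
    for el in root.iter():
        if local_name(el.tag) == "said" and el.get("type") == "reported":
            del el.attrib["type"]      # @type is the attribute the validator rejects
            el.set("direct", "false")  # @direct is the attribute proposed in this thread
            changed += 1
    return changed


# Sample fragment adapted from the examples above (the <p> wrapper is an assumption).
sample = (
    '<p>he said <said who="Parnell" type="reported">Thank you</said>, '
    'excited as he <lb n="161337"/>undoubtedly was</p>'
)
root = ET.fromstring(sample)
count = retag_reported(root)
print(count, ET.tostring(root, encoding="unicode"))
```

Note that ElementTree does not guarantee byte-identical output for untouched markup (for example, it serializes empty elements with its own spacing), so a diff review after a run like this would be prudent.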

yellwork commented 7 years ago

Ah. I never checked the Travis CI. Good catch.

I didn’t realize @type wasn’t valid, but I had read through the <said> description and looked longingly at @direct. Is it a slight tag abuse for us to use it now to describe direct speech being quoted within direct speech, or do we limit its application to the few cases of recalled (italicised) direct speech that I highlighted above?

handed him <lb n="161336"/>his silk hat when it was knocked off and he said <said who="Parnell" direct="false" rend="italics">Thank you</said>, excited as he <lb n="161337"/>undoubtedly was

I added an @rend="italics" to preserve the rendering. Or is <said direct="false"> enough to indicate it?

Would that mean we revert <said> within <said> to the way it was before I added the @type? e.g. Ned Lambert reading Dawson’s inflated prose from the newspaper:

<lb n="070295"/><said who="Ned Lambert">―<said who="Dan Dawson">Or again if we but climb the serried mountain peaks.</said></said>

Do we need an @direct and/or an @rend here?

JonathanReeve commented 7 years ago

Good questions. I think we can keep the nested <said> structure as-is, but adding direct="false" where appropriate would be a good idea. That way we can distinguish between a character's actual speech (as reported by Joyce, at least) and his speech as reported by some other, potentially less reliable, character.
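To show how the nesting alone can carry that distinction, here is a rough sketch that lists every <said> sitting inside another <said>, pairing the quoting speaker with the quoted one. It assumes Python's standard `xml.etree.ElementTree`; the function name `quoted_speech` and the `<p>` wrapper around the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def quoted_speech(root):
    """Yield (quoting speaker, quoted speaker, text) for each <said> nested in a <said>."""

    def walk(el, outer):
        # `outer` is the nearest enclosing <said> element, or None at narrative level.
        for child in el:
            if local_name(child.tag) == "said":
                if outer is not None:
                    yield outer.get("who"), child.get("who"), "".join(child.itertext())
                yield from walk(child, child)
            else:
                yield from walk(child, outer)

    start = root if local_name(root.tag) == "said" else None
    yield from walk(root, start)


# Ned Lambert reading Dawson's prose, as in the example above.
sample = (
    '<p><lb n="070295"/><said who="Ned Lambert">―<said who="Dan Dawson">'
    "Or again if we but climb the serried mountain peaks.</said></said></p>"
)
pairs = list(quoted_speech(ET.fromstring(sample)))
print(pairs)
```

A query like this only recovers who is quoting whom; whether the quoted words are reliable is still an editorial judgement that an attribute such as direct="false" would have to record.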

As for italics, you're right--it might be a good idea to add rend="italics" here, since I think the standard TEI renderers don't automatically render <said direct="false"> as italicized.

yellwork commented 7 years ago

OK. Sorry, I should have raised this as an issue instead of marching ahead and making a load of changes!

So our convention is, ultimately, to disambiguate inherited <emph> into <said> in two different ways, right?

(1) If a character quotes direct speech within her speech, we’re encoding it like this:

<said who="Stephen Dedalus">―You said,</said> Stephen answered, <said who="Stephen Dedalus"><said who="Buck Mulligan" rend="italics">O, it's only Dedalus whose mother is beastly dead</said>.</said>

(2) If direct speech is recalled in interior monologue or (occasionally) represented in the third-person narrative using italics, we’re encoding it like this:

she was one of those good souls who had always to be told twice <said who="Father Conmee" direct="false" rend="italics">bless you, my child,</said> that they have been absolved, <said who="Father Conmee" direct="false" rend="italics">pray for me</said>.

Does that sound right?
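As a rough check on the two cases above, the following sketch labels each <said> as plain dialogue, convention (1), or convention (2), and flags quoted speech that lacks rend="italics". It assumes Python's standard `xml.etree.ElementTree`; the names `classify` and `label_all`, the label strings, and the `<p>` wrapper are hypothetical. Treating a nested <said> that also carries direct="false" as case (2) is a simplifying assumption, not something the thread settles.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def classify(el, nested):
    """Label one <said> element according to the two conventions in this thread."""
    recalled = el.get("direct") == "false"
    if not (nested or recalled):
        return "dialogue"        # ordinary character speech
    if el.get("rend") != "italics":
        return "review"          # quoted speech without the expected rendering hint
    return "recalled (2)" if recalled else "quoted (1)"


def label_all(root):
    """Return (who, label) for every <said>, in document order."""
    out = []

    def walk(el, nested):
        for child in el:
            is_said = local_name(child.tag) == "said"
            if is_said:
                out.append((child.get("who"), classify(child, nested)))
            walk(child, nested or is_said)

    walk(root, local_name(root.tag) == "said")
    return out


# The two examples above, joined in one hypothetical <p> for the demo.
sample = """<p><said who="Stephen Dedalus">―You said,</said> Stephen answered,
<said who="Stephen Dedalus"><said who="Buck Mulligan" rend="italics">O, it's only
Dedalus whose mother is beastly dead</said>.</said> she had always to be told twice
<said who="Father Conmee" direct="false" rend="italics">pray for me</said>.</p>"""
labels = label_all(ET.fromstring(sample))
print(labels)
```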

JonathanReeve commented 7 years ago

Sounds great! I'll go ahead and add this to our conventions document.

yellwork commented 7 years ago

Great! I'm going to quickly go through all the @direct and turn them into types (1) or (2) above. Shouldn't take long. (Unless you're already working on it?!)

yellwork commented 7 years ago

With the addition to our conventions document, I feel like this issue is now closed. (Always happy for it to be reopened if needs be.)