Open yellwork opened 7 years ago
Having a to-do list for this seems wise. FYI: The following ack one-liner will extract names from the who attribute of <said> tags.
ack -o "(?<=<said who=\")[\w\'\. ]*" *.xml
This will compile a sorted list of all the names across the corpus:
ack -ho "(?<=<said who=\")[\w\'\. ]*" *.xml | sort | uniq
I was using it as a sanity check to catch misspellings when I marked up "Telemachus."
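For anyone without ack installed, here is a rough Python equivalent of the two one-liners above. It is only a sketch: the function names `who_values` and `corpus_speakers` are made up for illustration, and it assumes the episode files are UTF-8 and sit in one directory.

```python
import re
from pathlib import Path

# Same lookbehind as the ack one-liner: match whatever follows <said who="
WHO_PATTERN = re.compile(r'(?<=<said who=")[\w\'. ]*')

def who_values(text):
    """Return every who value found on <said> tags in a string of XML."""
    return WHO_PATTERN.findall(text)

def corpus_speakers(directory="."):
    """Sorted, de-duplicated who values across all *.xml files in a directory."""
    names = set()
    for path in Path(directory).glob("*.xml"):
        names.update(who_values(path.read_text(encoding="utf-8")))
    return sorted(names)
```

Calling `corpus_speakers()` from the repository root should give the same list as the `sort | uniq` pipeline. Like the ack version, the pattern only fires when who is the first attribute after `<said`, so a tag that puts another attribute first is skipped.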
Could we also assign, or let people claim, episodes to mark up with dialogue on this, or another issue? I am going to tackle another episode as soon as I can, and want to avoid reduplicating labor.
Good idea. Can we formally assign them or do we just call dibs here?
After you started <said> tagging, Chris, I snagged a lot of the low-hanging fruit (the less chatty, shorter episodes). Claiming the longer ones now makes sense because they’re likely to take a considerable bit of time to mark up.
Those ack commands will come in very handy once we start figuring out the speaking parts.
Going to do the @who attribution on “Proteus” now.
Going to tackle @who on “Aeolus” now.
@c-forster, that ack hack is great. I use ag, "The Silver Searcher," myself, and was able to get it to work the same way using ag --nofilename -o "(?<=<said who=\")[\w\'\. ]*" *.xml | sort | uniq. I'll put this into a makefile so that we can run these sorts of things easily.
I’m simplifying this. A <listPerson> for the entire novel would be incredible, but … too much work for now. So I’m going to switch all @who values to character initials and put the key in the separate plaintext file persons.txt.
Sounds good. I'm not seeing the key in persons.txt, though? Anyway when it's there, if it's in some kind of regular format, like comma- or tab-separated, then it'll be easy to make a list of these keys to add to the header.
I’m doing it all offline while I go through all eighteen episodes. I’ll merge them all into the repository once done.
My local persons.txt looks like this:
db [tab]Davy Byrne
dbc [tab]Davy Byrne's curate
dbm [tab]D.B. Murphy
dd [tab]Dan Dawson
did [tab]Dilly Dedalus
That could be the basis for a <listPerson> – information I’d love to see added but too much for us right now (I feel).
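If the key stays in that regular shape, turning it into a skeleton <listPerson> later could be fairly mechanical. A minimal sketch, assuming each line of persons.txt is a key, a real tab character, and a name; `parse_persons` and `list_person` are hypothetical helper names, and the output is a bare <person>/<persName> pair with nothing else filled in.

```python
from xml.sax.saxutils import escape, quoteattr

def parse_persons(text):
    """Map each key to its full name from 'key<TAB>name' lines."""
    persons = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, name = line.split("\t", 1)
        persons[key.strip()] = name.strip()
    return persons

def list_person(persons):
    """Render the mapping as a bare-bones <listPerson> string, sorted by key."""
    rows = [
        f"  <person xml:id={quoteattr(key)}><persName>{escape(name)}</persName></person>"
        for key, name in sorted(persons.items())
    ]
    return "<listPerson>\n" + "\n".join(rows) + "\n</listPerson>"
```

Something like `print(list_person(parse_persons(open("persons.txt", encoding="utf-8").read())))` would then give a block to paste into the header.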
Awesome, sounds great.
Some content that was marooned in the closed #9 was your suggestion, Jonathan, for unclear @who values. Something like:
<lb n="060004"/><said xml:id="060004-a" who="Cunningham">―Come on, Simon.
<certainty target="#060004-a" match="@who" locus="value" assertedValue="Power" degree="0.5">
<desc>It's unclear here whether it's Cunningham or Power speaking.</desc>
</certainty>
</said>
I’m going to go ahead and use this encoding whenever an unclear speaker is limited to a handful of candidates. Unless you’ve another idea?
Sounds great. Let's do it. I'll make a note of this in our conventions list, too.
How do we attribute dialogue in an exchange between several people? There’s a spot like this in “Hades” where no speakers are given for several lines of dialogue:
<lb n="060114"/><said who="lb">―I met M'Coy this morning,</said> Mr Bloom said. <said who="lb">He said he'd try to come.</said></p>
<p><lb n="060115"/>The carriage halted short.
<lb n="060116"/><said who="unclear">―What's wrong?</said>
<lb n="060117"/><said who="unclear">―We're stopped.</said>
<lb n="060118"/><said who="unclear">―Where are we?</said></p>
<p><lb n="060119"/>Mr Bloom put his head out of the window.
<lb n="060120"/><said who="lb">―The grand canal,</said> he said.</p>
The unclears can only be Cunningham, Power or Simon Dedalus (with Bloom, perhaps, chiming in at U 6.117). How best would that be encoded?
I read the TEI docs on <certainty> again but this is the best I could think of:
<lb n="060114"/><said who="lb">―I met M'Coy this morning,</said> Mr Bloom said. <said who="lb">He said he'd try to come.</said></p>
<p><lb n="060115"/>The carriage halted short.
<lb n="060116"/><said who="unclear">―What's wrong?
<certainty match="@who" locus="value" assertedValue="Power" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Cunningham" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Simon Dedalus" degree="0.33" />
</said>
<lb n="060117"/><said who="unclear">―We're stopped.
<certainty match="@who" locus="value" assertedValue="Power" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Cunningham" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Simon Dedalus" degree="0.33" />
</said>
<lb n="060118"/><said who="unclear">―Where are we?
<certainty match="@who" locus="value" assertedValue="Power" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Cunningham" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Simon Dedalus" degree="0.33" />
</said></p>
<p><lb n="060119"/>Mr Bloom put his head out of the window.
<lb n="060120"/><said who="lb">―The grand canal,</said> he said.</p>
...which is super kludgey and not very DRY. Ideally we could do target="#060116 #060117 #060118" on a single <certainty> set, and avoid all this repetition, but it doesn't look like XML can handle multiple attribute values.
@tcatapano, any ideas?
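One possible way around the repetition, offered as something to check against the schema rather than a settled answer: as far as I can tell from the TEI Guidelines, @target on <certainty> accepts one or more pointers separated by whitespace, so a single set could point at several <said> elements, provided each has an xml:id. The sketch below builds such a set. `shared_certainty` is a hypothetical helper, and the ids carry a made-up "s" prefix because xml:id values cannot begin with a digit.

```python
from xml.sax.saxutils import quoteattr

def shared_certainty(target_ids, candidates):
    """Build one <certainty> per candidate, each pointing at every target id."""
    # Assumption: @target takes whitespace-separated pointers.
    target = " ".join(f"#{xml_id}" for xml_id in target_ids)
    degree = round(1 / len(candidates), 2)  # split the probability evenly
    return [
        f'<certainty target={quoteattr(target)} match="@who" locus="value" '
        f'assertedValue={quoteattr(name)} degree="{degree}"/>'
        for name in candidates
    ]

# Hypothetical ids for the three unattributed lines in "Hades"
for line in shared_certainty(
    ["s060116", "s060117", "s060118"],
    ["Power", "Cunningham", "Simon Dedalus"],
):
    print(line)
```

That gives three <certainty> elements in total instead of nine, which could sit once after the exchange rather than inside each <said>.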
This is a to-do issue to pick out the various tasks discussed in #9:
[x] Convert all double-hyphen dialogue dashes to the quotation dash or horizontal bar.
[x] Shift the </said> tags in <said>―</said> structures to the end of character speech. Add all intermedial <said> tagging.
[x] Proof the </said> tagging for every episode. How? We will visualize all of the episodes in a browser and colour just the </said> tagged dialogue. Episodes remaining: 1. “Telemachus” 2. “Nestor” 3. “Proteus” 4. “Calypso” 5. “Lotus Eaters” 6. “Hades” 7. “Aeolus” 8. “Lestrygonians” 9. “Scylla and Charybdis” 10. “Wandering Rocks” 11. “Sirens” 12. “Cyclops” 13. “Nausicaa” 14. “Oxen of the Sun” 15. “Circe” 16. “Eumaeus” 17. “Ithaca” 18. “Penelope”
[x] Disambiguate the appropriate <emph> to <said> tagging. [there might be a few other stragglers]
[x] Add @who attribution for every instance of <said> (or in “Circe” <sp>). Use character names for the values.
[ ] Switch @who values to @xml:id.
[ ] Compile a <listPerson> dossier of speakers.