saulalbert / unixclan

Utility scripts for TalkBank's CLAN

Capitalize all turn-initial TCUs (in turn-beginnings and turn-incoming overlaps) #10

Closed saulalbert closed 6 years ago

saulalbert commented 6 years ago

Capitalize any turn-initial utterance. This should include the first character in any line following a speaker ID and a colon/tab combination.

e.g.:

*PS002: you enjoyed yourself in America?

becomes

*PS002: You enjoyed yourself in America?
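
A minimal Python sketch of this first rule, not the script actually used in the repo. It assumes a speaker tier starts with `*`, an alphanumeric speaker ID and a colon, followed by spaces or a tab; the function name `capitalize_turn_start` and the regex are illustrative assumptions.

```python
import re

# Assumed shape of a speaker tier: "*" + speaker ID + ":" + spaces/tab,
# followed by the first non-space character of the turn.
TURN_START = re.compile(r"^(\*[A-Za-z0-9]+:[ \t]+)(\S)")


def capitalize_turn_start(line: str) -> str:
    """Upper-case the first character after the speaker ID, if there is one."""
    return TURN_START.sub(lambda m: m.group(1) + m.group(2).upper(), line, count=1)


print(capitalize_turn_start("*PS002:\tyou enjoyed yourself in America?"))
```

Continuation lines (which do not start with a speaker ID) are returned unchanged, so only turn-initial text is affected.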

Also, where there is a turn-incoming overlap, the first character of the turn-incoming overlap (i.e. the overlapped text on the second speaker's line) should be capitalized.

e.g.:

*PS002: yes oh Jim 's in Flint this afternoon at the Hart and <Straw> [>]
        Club .
*PS006: <hmm:> [<]. 

becomes

*PS002: Yes oh Jim 's in Flint this afternoon at the Hart and <Straw> [>]
        Club .
*PS006: <Hmm:> [<]. 
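
The overlap case can be sketched in Python by letting the pattern skip an optional `<` before the first letter. This is only an illustration under the same assumption about speaker tiers (`*`, ID, colon, spaces or a tab); `capitalize_turn_initial` is a hypothetical name.

```python
import re

# Speaker tier prefix, then an optional "<" opening an overlap,
# then the first letter of the turn ([^\W\d_] matches any letter).
TURN_INITIAL = re.compile(r"^(\*[A-Za-z0-9]+:[ \t]+<?)([^\W\d_])")


def capitalize_turn_initial(line: str) -> str:
    """Capitalize the first letter of a turn, including inside a leading <...>."""
    return TURN_INITIAL.sub(lambda m: m.group(1) + m.group(2).upper(), line, count=1)


for line in [
    "*PS002:\tyes oh Jim 's in Flint this afternoon at the Hart and <Straw> [>]",
    "\tClub .",
    "*PS006:\t<hmm:> [<].",
]:
    print(capitalize_turn_initial(line))
```

Lines whose first character is neither a letter nor `<` followed by a letter are left as they are.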

saulalbert commented 6 years ago

NB: for turn-incoming overlaps, the capitalization should also work alongside the CHAT2CAlite overlap conversion, so in fact

*PS002: yes oh Jim 's in Flint this afternoon at the Hart and <Straw> [>]
        Club .
*PS006: <hmm:> [<].

should come out as

*PS002: Yes oh Jim 's in Flint this afternoon at the Hart and ⌈Straw⌉
        Club .
*PS006:                                                       ⌊Hmm: ⌋.
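
If the capitalization runs after the overlap conversion, one possible approach is to also accept the converted brackets as an optional opener and to process the whole transcript at once. A hedged Python sketch: the bracket characters are taken from the example above, and `capitalize_turns` and the speaker-tier pattern are assumptions rather than part of CHAT2CAlite.

```python
import re

# Speaker tier prefix, any alignment padding, then an optional overlap opener:
# CHAT "<" or the converted brackets "⌈" / "⌊" seen in the example output.
TURN_INITIAL = re.compile(
    r"^(\*[A-Za-z0-9]+:[ \t]+[<⌈⌊]?)([^\W\d_])",
    re.MULTILINE,
)


def capitalize_turns(transcript: str) -> str:
    """Capitalize the first letter of every speaker turn in a transcript."""
    return TURN_INITIAL.sub(lambda m: m.group(1) + m.group(2).upper(), transcript)


sample = (
    "*PS002:\tyes oh Jim 's in Flint this afternoon at the Hart and ⌈Straw⌉\n"
    "\tClub .\n"
    "*PS006:\t                                                      ⌊hmm: ⌋.\n"
)
print(capitalize_turns(sample))
```

Because the padding that aligns the overlap is matched as ordinary whitespace, the sketch gives the same result whether it runs before or after the conversion.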