saulalbert / unixclan

Utility scripts for TalkBank's CLAN

CHAT2CAlite removes spaces from around mid-TCU '≈' #23

Closed saulalbert closed 6 years ago

saulalbert commented 6 years ago

Latching (i.e. speaker transition without any discernible silence) is marked in CHAT with the '≈' unicode symbol.

The problem is that WITHIN-speaker latching (i.e. when a speaker rushes to start their next utterance) joins the latched words into one string, which makes them unreadable.

For example:

hello≈how are you

cannot be matched by a word-based lexical search for 'hello' or 'how', because the two words are fused into a single token.
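To illustrate the search problem, here is a minimal Python sketch. It assumes the lexical search works on whitespace-separated tokens; that is an assumption made for illustration, not a description of how CLAN's own search commands tokenise a line.

```python
LATCH = "\u2248"  # the '≈' latching mark

def tokens(line: str) -> list[str]:
    """Split a transcript line on whitespace (assumed tokenisation)."""
    return line.split()

fused = tokens(f"hello{LATCH}how are you")
spaced = tokens(f"hello {LATCH} how are you")

# With no spaces around the mark, 'hello' and 'how' are not separate tokens.
print("hello" in fused, "how" in fused)
# With spaces around the mark, both words can be found.
print("hello" in spaced, "how" in spaced)
```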

So CAlite2CHAT will add spaces around the '≈' marks, and CHAT2CAlite should remove them.

For example, in CHAT2CAlite:

hello ≈ how are you

should be converted to

hello≈how are you
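A rough sketch of the two conversions in Python, for discussion only. The function names `chat_to_calite` and `calite_to_chat` are hypothetical and are not taken from the existing scripts. The sketch assumes a mid-TCU '≈' always has a non-space character on both sides, so that a '≈' at the very start or end of an utterance is left untouched.

```python
import re

LATCH = "\u2248"  # the '≈' latching mark

def chat_to_calite(line: str) -> str:
    """Remove the spaces around a '≈' that sits between two words."""
    # Lookarounds require a non-space character on each side, so a
    # '≈' at the start or end of the line is not changed.
    return re.sub(rf"(?<=\S) +{LATCH} +(?=\S)", LATCH, line)

def calite_to_chat(line: str) -> str:
    """Add a space on each side of a '≈' that is fused between two words."""
    return re.sub(rf"(?<=\S){LATCH}(?=\S)", f" {LATCH} ", line)

print(chat_to_calite("hello ≈ how are you"))  # hello≈how are you
print(calite_to_chat("hello≈how are you"))    # hello ≈ how are you
```

Under these assumptions the two functions are inverses for a mid-TCU mark, so a line should survive a round trip through both converters unchanged.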