saulalbert / unixclan

Utility scripts for TalkBank's CLAN

CAlite2CHAT adds spaces around mid-TCU '≈' #34

Closed saulalbert closed 6 years ago

saulalbert commented 6 years ago

Latching (i.e. speaker transition without any discernible silence) is marked in CAlite with the '≈' Unicode symbol.

The problem is that WITHIN-speaker latching (i.e. when a speaker rushes to start the next utterance) makes the latched-together words unreadable.

So e.g.:

hello≈how are you

is not parseable by a lexical search for 'hello' or 'how', because the two words are fused into a single token.
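To illustrate the problem, here is a minimal sketch that assumes a lexical search splits each line on whitespace (the actual search tooling may tokenize differently):

```python
# Assumption: a lexical search tokenizes each line on whitespace.
line = "hello≈how are you"
tokens = line.split()

# The latched words stay fused in one token, so neither is found on its own.
found_hello = "hello" in tokens
found_how = "how" in tokens
print(tokens, found_hello, found_how)
```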

So CAlite2CHAT will add spaces around the '≈' marks, and CHAT2CAlite should remove them.

So e.g.:

hello≈how are you

should be converted to

hello ≈ how are you
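A rough sketch of both directions, in Python for illustration only (this is not the repository's actual code, and the function names `add_latch_spaces` and `remove_latch_spaces` are hypothetical). It assumes only a '≈' with a non-space character on both sides counts as mid-TCU, so a turn-initial or turn-final '≈' is left alone:

```python
import re

def add_latch_spaces(text):
    """CAlite -> CHAT direction: put spaces around a mid-TCU latch mark.

    Only a mark with a non-space character directly on each side is changed.
    """
    return re.sub(r"(?<=\S)≈(?=\S)", " ≈ ", text)

def remove_latch_spaces(text):
    """CHAT -> CAlite direction: drop the single spaces around a latch mark.

    Only a mark sitting between two words with one space on each side is changed.
    """
    return re.sub(r"(?<=\S) ≈ (?=\S)", "≈", text)

spaced = add_latch_spaces("hello≈how are you")
restored = remove_latch_spaces(spaced)
print(spaced)
print(restored)
```

One caveat with this approach: a transcript that already had ' ≈ ' with spaces in the CAlite source would lose them on the way back, so the round trip is only exact for input without spaces around the mark.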