saulalbert / unixclan

Utility scripts for TalkBank's CLAN

CHAT2CAlite should remove 'space + period' line terminators but not any others #20

Closed saulalbert closed 6 years ago

saulalbert commented 6 years ago

At the end of every line in a CHAT transcript there's a line terminator. It is usually a space followed by a period ( .).

Sometimes it is a space followed by a bang ! or a question mark ?.

The 'space + period' terminators are the defaults, and we want to remove them entirely. For the others, we remove only the space. Line terminators that are not preceded by a space should be ignored.

For example, ignore: hello how are you.

but remove the terminator from:

hello how are you .

So it should look like:

hello how are you

BUT, we should only remove the space from 'space + question mark' or 'space + bang' terminators. So...

hello how are you ? and hello how are you !

would become

hello how are you? and hello how are you!
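
The three cases above can be sketched with a single regular expression. This is only an illustration of the rule, not the CHAT2CAlite code: the function name `strip_terminator` is hypothetical, and it assumes each line has exactly one space before the terminator and no trailing whitespace after it.

```python
import re

# Matches a terminator (. ! ?) at the end of the line, but only when it is
# preceded by a single space. Name and pattern are illustrative assumptions.
_SPACED_TERMINATOR = re.compile(r" ([.!?])$")


def strip_terminator(line: str) -> str:
    """Apply the terminator rule to one transcript line."""

    def _replace(match: re.Match) -> str:
        mark = match.group(1)
        # 'space + period' is the default terminator: drop it entirely.
        # For '!' and '?', drop only the space and keep the mark.
        return "" if mark == "." else mark

    return _SPACED_TERMINATOR.sub(_replace, line)


if __name__ == "__main__":
    for sample in (
        "hello how are you .",
        "hello how are you ?",
        "hello how are you !",
        "hello how are you.",
    ):
        print(repr(strip_terminator(sample)))
```

A line with no space before its terminator does not match the pattern, so it passes through unchanged.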

mumair01 commented 6 years ago

Do I have to delete a period at the end of the line if it is following a symbol? (See line 22 of the example CHAT file).

saulalbert commented 6 years ago

Yes! As long as it’s a space followed by a period, we want it gone!

mumair01 commented 6 years ago

*PS006: in fact the whole family was together for <Mary's wedding> [>] →. The period is not after a space, but rather after another symbol. In this case, we don't delete the period, right? Because then we'd have to include a dictionary of symbols.

saulalbert commented 6 years ago

Oh, sorry - I’m not at the computer so I thought there was a space there. Indeed. If there’s no space, we leave it.
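
Since the rule depends only on whether a space comes before the period, no dictionary of symbols is needed. A minimal check, assuming the line has no trailing whitespace (the helper name `has_default_terminator` is made up for this example):

```python
def has_default_terminator(line: str) -> bool:
    """Return True only when the line ends in 'space + period'."""
    # Only the single character before the period matters, so symbols such
    # as the arrow below need no special handling.
    return line.endswith(" .")


example = "*PS006: in fact the whole family was together for <Mary's wedding> [>] →."

if __name__ == "__main__":
    print(has_default_terminator(example))                # period follows a symbol
    print(has_default_terminator("hello how are you ."))  # period follows a space
```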