usnistgov / SCTK


Compare two text files? #14

Closed: dlcrista closed this issue 2 years ago

dlcrista commented 5 years ago

Can I use sclite to compare two txt files?

I'm not comparing transcripts, just two plain text files.

jfiscus commented 5 years ago

Not as plain text files. You could turn your text files into a .trn formatted file to accomplish this: essentially, put all of the text on a single line with an utterance id in parentheses at the end.
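A minimal shell sketch of that conversion, assuming POSIX `tr` and `sed` are available. The function name `txt2trn` and the id you pass to it are hypothetical; nothing here is part of SCTK itself.

```shell
# txt2trn FILE ID (hypothetical helper): print FILE as one .trn record.
# tr turns every whitespace character (including newlines) into a space and
# squeezes runs of spaces into one; sed trims the leftover leading/trailing space.
txt2trn() {
  text=$(tr -s '[:space:]' ' ' < "$1" | sed 's/^ //; s/ $//')
  printf '%s (%s)\n' "$text" "$2"
}
```

For example, `txt2trn ref.txt doc-001 > ref.trn` and `txt2trn hyp.txt doc-001 > hyp.trn` produce one record per file with matching ids. The whole file becomes a single utterance this way.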

TristanKnot commented 4 years ago

@jfiscus so you would have something like this in your text file, right?

EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T (abc000)

K L OW1 Z K W OW1 T (abc001)

D AH1 B AH0 L K W OW1 T (abc002)

Follow-up question: sclite matches based on the ID, right? That means there is no need for the hyp and ref files to list the utterances in the same order.

sishtiaq commented 4 years ago

Say you have:

$ cat abc000
EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T
$ cat abc001
K L OW1 Z K W OW1 T
$ cat abc002
D AH1 B AH0 L K W OW1 T

Then you can create a trn file which has the contents you've listed:

$ cat all.trn
EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T (abc000)
D AH1 B AH0 L K W OW1 T (abc002)
K L OW1 Z K W OW1 T (abc001)
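The same all.trn can be generated rather than typed. This is a sketch assuming one utterance per file, with each file's base name reused as the id; `build_trn` is a hypothetical helper, not an SCTK tool.

```shell
# build_trn OUT FILE... (hypothetical helper): write one trn record per input
# file, collapsing the file's whitespace to single spaces and using the file's
# base name as the id in parentheses.
build_trn() {
  out=$1
  shift
  : > "$out"  # create or truncate the output file
  for f in "$@"; do
    text=$(tr -s '[:space:]' ' ' < "$f" | sed 's/^ //; s/ $//')
    printf '%s (%s)\n' "$text" "$(basename "$f")" >> "$out"
  done
}
```

Running `build_trn all.trn abc000 abc001 abc002` writes the records in the order the files are given; that order should not matter, since sclite matches records by id.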

The records in all.trn don't have to be in any order, as sclite will match by the abc00? id.
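To illustrate what matching by id means, here is a rough sketch (not sclite's own code) that pairs ref and hyp records by the id in their trailing parentheses, regardless of line order. It assumes each record ends in a single `(id)` and uses POSIX `awk`; `pair_by_id` is a hypothetical name.

```shell
# pair_by_id REF HYP (hypothetical helper): for every id present in both
# files, print "id<TAB>ref text<TAB>hyp text", in HYP file order.
# Assumes REF is non-empty (the NR == FNR test relies on that).
pair_by_id() {
  awk '
    # Split a record into TEXT and the ID inside its trailing "(...)".
    function parse(line) {
      sub(/[ \t]+$/, "", line)
      if (!match(line, /\([^()]*\)$/)) return 0
      ID = substr(line, RSTART + 1, RLENGTH - 2)
      TEXT = substr(line, 1, RSTART - 1)
      sub(/[ \t]+$/, "", TEXT)
      return 1
    }
    NR == FNR { if (parse($0)) ref[ID] = TEXT; next }  # first file: REF
    parse($0) && (ID in ref) { printf "%s\t%s\t%s\n", ID, ref[ID], TEXT }
  ' "$1" "$2"
}
```

Ids that appear in only one of the two files are simply skipped by this sketch.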

jfiscus commented 4 years ago

The trn format has a speaker and utterance id in the parens. Do this to make three utterances for speaker abc; order does not matter:

$ cat all.trn
EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T (abc-000)
D AH1 B AH0 L K W OW1 T (abc-002)
K L OW1 Z K W OW1 T (abc-001)
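A small sketch of reading such an id back apart, assuming (as in the abc-000 example) that the speaker is everything before the first hyphen and the utterance is the rest. `split_trn_id` is a hypothetical helper; sclite's own id-parsing rules are described in the SCTK documentation and are not reproduced here.

```shell
# split_trn_id ID (hypothetical helper): split "speaker-utterance".
# ${id%%-*} drops everything from the first hyphen on (the speaker);
# ${id#*-} drops everything up to and including the first hyphen (the utterance).
split_trn_id() {
  id=$1
  printf 'speaker=%s utterance=%s\n' "${id%%-*}" "${id#*-}"
}
```

For example, `split_trn_id abc-002` prints `speaker=abc utterance=002`.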