ivrit-ai / ivrit.ai

ivrit.ai codebase
MIT License

Better evaluation #33

Open MetroCat69 opened 5 months ago

MetroCat69 commented 5 months ago

We can use the jiwer library to improve the evaluation by normalizing the text before scoring (for example, removing multiple spaces). We can also add character error rate (CER) to the evaluation.

Example:

import jiwer

# Normalize both transcripts before scoring: drop punctuation, lowercase,
# collapse whitespace, then split each sentence into words.
transformation = jiwer.Compose([
    jiwer.RemovePunctuation(),
    jiwer.RemoveEmptyStrings(),
    jiwer.ToLowerCase(),
    jiwer.RemoveWhiteSpace(replace_by_space=True),
    jiwer.RemoveMultipleSpaces(),
    jiwer.Strip(),
    jiwer.ReduceToListOfListOfWords(word_delimiter=" "),
])

def wer(ground_truth, hypothesis):
    # Newer jiwer releases may expect reference_transform instead of truth_transform.
    return jiwer.wer(
        ground_truth,
        hypothesis,
        truth_transform=transformation,
        hypothesis_transform=transformation,
    )

def cer(ground_truth, hypothesis):
    # Strip a few punctuation marks by hand before computing the character error rate.
    return jiwer.cer(
        ground_truth.replace('.', '').replace(',', '').replace('?', '').replace('-', '').replace('\"', ''),
        hypothesis.replace('.', '').replace(',', '').replace('?', '').replace('-', '').replace('\"', ''),
    )
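
For context on what the two metrics measure, here is a rough, dependency-free sketch. It is not jiwer's implementation and the function names (`edit_distance`, `simple_wer`, `simple_cer`) are made up for illustration. It assumes the usual definition: edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. Whitespace runs are collapsed first, which is the "multiple spaces" normalization mentioned above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def simple_wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # split() also collapses repeated spaces
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def simple_cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    ref = " ".join(reference.split())  # collapse whitespace, keep single spaces
    hyp = " ".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Extra spaces alone should not count as errors.
    print(simple_wer("the  cat   sat", "the cat sat"))
    # One wrong letter is a whole word error for WER but a small error for CER.
    print(simple_wer("the cat sat", "the cat sit"))
    print(simple_cer("the cat sat", "the cat sit"))
```

The last two calls show why reporting CER next to WER can be useful: a single wrong letter costs one full word out of three in WER, but only one character out of eleven in CER.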

yanirmr commented 1 month ago

@MetroCat69 Could you give more context, please? What is this about?

@yairl Do you know what this is about?