revdotcom / fstalign

An efficient OpenFST-based tool for calculating WER and aligning two transcript sequences.
Apache License 2.0

adding basic fast-Levenshtein distance computation #13

Closed jprobichaud closed 2 years ago

jprobichaud commented 3 years ago

This is a first attempt at speeding up the adapted composition by adding an initial Levenshtein edit-distance computation. It consumes RAM proportional to m*n (where m and n are the number of words in the ref and the hyp), so it could become problematic for long inputs.

The speedup is noticeable: I've seen 10x in some cases.
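
To make the m*n memory cost concrete, here is a minimal Python sketch of a word-level Levenshtein distance that fills the full dynamic-programming table. It is not the code from this PR (fstalign itself is C++), and it says nothing about how the distance is used inside the adapted composition. The function names `edit_distance_table` and `word_edit_distance` are hypothetical, and unit costs for insertion, deletion and substitution are an assumption.

```python
from typing import List, Sequence


def edit_distance_table(ref: Sequence[str], hyp: Sequence[str]) -> List[List[int]]:
    """Fill the full (m + 1) x (n + 1) Levenshtein table over words.

    Assumes unit cost for insertions, deletions and substitutions.
    """
    m, n = len(ref), len(hyp)
    # This table is where the memory proportional to m*n comes from.
    dist = [[0] * (n + 1) for _ in range(m + 1)]

    # Turning a prefix of ref into an empty hyp takes i deletions.
    for i in range(1, m + 1):
        dist[i][0] = i
    # Turning an empty ref into a prefix of hyp takes j insertions.
    for j in range(1, n + 1):
        dist[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # match or substitution
            )
    return dist


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Return the word-level edit distance between ref and hyp."""
    return edit_distance_table(ref, hyp)[len(ref)][len(hyp)]


if __name__ == "__main__":
    ref_words = "the cat sat on the mat".split()
    hyp_words = "the cat sit on mat".split()
    print(word_edit_distance(ref_words, hyp_words))
```

If only the distance is needed, two rows of the table are enough. Keeping the whole table, as here, is what allows an alignment path to be traced back afterwards, at the price of the m*n memory mentioned above.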