Open ybracke opened 7 months ago
Concerns file(s): src/transnormer/evaluation/align_levenshtein.py
Note: Perhaps this update should be done in the original package instead of here
Adjust the align functions so that tokens of a certain type are excluded from the alignment, e.g. tokens that contain only punctuation symbols.
Desired behavior:
    >>> regex = r"..."  # should be a regex that matches strings that only contain punctuation
    >>> align(['Sie bekommen ferner --'], ['bekommen ferner an —'], exclude=regex)
    [
        [
            ("Sie", "░", 4),
            ("bekommen", "bekommen", 0),
            ("ferner", "ferner▁an", 3.5999999999999996),
            # not here: ("--", "—", 2)
        ],
    ]
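One possible way to get this behavior is to filter tokens before they reach the alignment step. The sketch below is only an illustration. The helper name `drop_excluded_tokens` and the pattern `PUNCT_ONLY` are hypothetical and not part of the existing module, and the whitespace tokenization is an assumption. The pattern approximates "only punctuation" as "no word characters and no whitespace", so it also matches symbol-only tokens.

```python
import re

# Assumption: "punctuation only" is approximated as one or more characters
# that are neither word characters nor whitespace.
PUNCT_ONLY = r"[^\w\s]+"


def drop_excluded_tokens(tokens, exclude):
    """Return the tokens that do NOT fully match the `exclude` regex.

    Hypothetical helper; it could be applied to both token sequences
    before they are passed to the alignment.
    """
    pattern = re.compile(exclude)
    return [tok for tok in tokens if pattern.fullmatch(tok) is None]


# Assumption: sentences are tokenized on whitespace.
source = "Sie bekommen ferner --".split()
target = "bekommen ferner an —".split()

print(drop_excluded_tokens(source, PUNCT_ONLY))  # ['Sie', 'bekommen', 'ferner']
print(drop_excluded_tokens(target, PUNCT_ONLY))  # ['bekommen', 'ferner', 'an']
```

Using `fullmatch` means a token such as `an,` is kept, because it contains word characters; only tokens that consist entirely of matching characters are dropped.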