Extends the --glossaries argument of apply_bpe.py so that each glossary entry can be a regular expression.
For example:
python apply_bpe.py -c codes_file -i input_text --glossaries "string1" "string2" "<tag>\w*</tag>" "\d+"
will ensure that string1, string2, words enclosed in <tag> </tag>, and numbers are not split by BPE and are isolated from other subwords.
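
To illustrate what "isolated from other subwords" means, here is a minimal sketch of the idea. It is not necessarily the code in this PR: the function name isolate_glossaries is hypothetical, and it assumes the glossary patterns contain no capturing groups of their own and never match the empty string.

```python
import re

def isolate_glossaries(word, glossaries):
    """Split `word` so that every glossary match becomes its own segment.

    Hypothetical helper for illustration only. Assumes the patterns in
    `glossaries` have no capturing groups and cannot match an empty string.
    """
    # Wrap each pattern in a non-capturing group and join them into one
    # alternation. The single outer capturing group makes re.split keep
    # the matched text in the result instead of discarding it.
    combined = re.compile("(" + "|".join("(?:%s)" % g for g in glossaries) + ")")
    # Drop the empty strings re.split emits when a match touches an edge.
    return [segment for segment in combined.split(word) if segment]

glossaries = ["string1", "string2", r"<tag>\w*</tag>", r"\d+"]
print(isolate_glossaries("abc123def", glossaries))
print(isolate_glossaries("<tag>word</tag>suffix", glossaries))
```

Segments that fully match a glossary pattern would then be passed through unchanged, and only the remaining segments would be encoded with BPE.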
Helps with #49.