Right now, StringParser's implementation is at the character level, so if you give it a special token as the target string, it can possibly generate the same string but with non-special tokens. If a flag could be added that prevents the target string from being split, it would be very helpful. I can help write the PR, but I am not sure where exactly to get started..I see the comment:
It is a debugging / learning tool to show how CharacterLevelParser works together with TokenizerPrefixTree to filter the allowed tokens (some of whom may contain multiple characters)"""
The idea of LMFE is to support any sequence of tokens, whose string decoding is legal output. What you are requesting is essentially a violation of this. I'm not sure there's an elegant way to do this.
Right now,
StringParser
's implementation is at the character level, so if you give it a special token as the target string, it can possibly generate the same string but with non-special tokens. If a flag could be added that prevents the target string from being split, it would be very helpful. I can help write the PR, but I am not sure where exactly to get started..I see the comment:so I think it should be possible?