Closed roycewilliams closed 1 year ago
The only "problem" with this simple/naive method is that if one of the bytes in a multibyte sequence happens to be valid printable ASCII, that byte is emitted as the literal character instead of being escaped, which may make the generated rule a bit harder to read. But I thought it best to keep it simple and not attempt encoding validation and the like. And because "bytes are just bytes", the resulting byte insertions should be the same either way.
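To illustrate the caveat, here is a minimal sketch, not the actual tmesis.pl code. The helper name `escape_byte` and the `\xNN` output format are assumptions made for the example. The sample word is a two-byte sequence of the kind found in legacy encodings such as Shift_JIS, where the trailing byte can fall in the printable ASCII range.

```perl
use strict;
use warnings;

# Hypothetical helper (not from tmesis.pl): return the byte unchanged if it is
# printable ASCII (0x20 through 0x7e), otherwise return a \xNN hex escape.
# The \xNN format is an assumption for this sketch.
sub escape_byte {
    my ($byte) = @_;
    if ($byte lt ' ' or $byte gt '~') {
        return sprintf('\\x%02x', ord($byte));
    }
    return $byte;
}

# Two-byte sample: 0x83 is outside printable ASCII, 0x41 is ASCII "A".
my @word_buf = split //, "\x83\x41";

# Each byte is handled on its own, with no encoding validation.
my $escaped = join '', map { escape_byte($_) } @word_buf;
print "$escaped\n";    # prints \x83A
```

The second byte comes out as a literal `A` rather than `\x41`. Both spellings denote the same byte, so only readability is affected.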
If this approach is accepted, I will also apply the equivalent changes to tmesis-dynamic.pl .
Prompted by question from @stealthsploit.
Edit: I tested other comparison methods for efficiency. It's slightly (~3-5%) faster to just compare characters directly, without =~ and without conversion:
if ($word_buf[$word_pos] lt ' ' or $word_buf[$word_pos] gt '~')
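As a sanity check on that choice, the sketch below compares three ways of writing the same test over every possible byte value. The three subroutine names are hypothetical, and the regex and `ord` variants are only guesses at what the "other comparison methods" might have looked like. It checks that they agree; it does not measure speed.

```perl
use strict;
use warnings;

# Hypothetical predicates: each returns 1 if the byte needs escaping.

# Direct string comparison, as in the condition above.
sub needs_escape_direct {
    my ($c) = @_;
    return 1 if $c lt ' ' or $c gt '~';
    return 0;
}

# Regex variant (assumed form of the =~ based check).
sub needs_escape_regex {
    my ($c) = @_;
    return 0 if $c =~ /[\x20-\x7e]/;
    return 1;
}

# Numeric variant using ord() (assumed form of the "conversion" check).
sub needs_escape_ord {
    my ($c) = @_;
    my $n = ord($c);
    return 1 if $n < 0x20 or $n > 0x7e;
    return 0;
}

# Count byte values on which the three predicates disagree.
my $mismatches = 0;
for my $n (0 .. 255) {
    my $c = chr($n);
    my $d = needs_escape_direct($c);
    $mismatches++ if $d != needs_escape_regex($c) or $d != needs_escape_ord($c);
}
print "mismatches: $mismatches\n";    # prints mismatches: 0
```

To reproduce a timing comparison, the three subs could be passed to `cmpthese` from the core `Benchmark` module. Results will vary by Perl version and machine.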
While here, I also cleaned up some non-ASCII characters in the source itself.