jflex-de / jflex

The fast scanner generator for Java™ with full Unicode support
http://jflex.de

Fix char class normalisation for overlapping class content #1066

Closed lsf37 closed 1 year ago

lsf37 commented 1 year ago

In a negated character class with overlapping content, such as [^\n\s] (where \n is already contained in \s), the normalisation code violates a precondition of IntCharSet.sub() and leaves the class content in an inconsistent state. This either triggers an exception at generation time, if another set operation touches the inconsistent part, or can make the generated scanner match the wrong input at runtime, if nothing else interacts with the set.

This PR fixes the problem by first computing the union of the class content \n\s, which joins the overlapping parts into a single set, and then computing the complement of that set.
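
To illustrate the union-then-complement order, here is a minimal standalone sketch. It is not the JFlex code: the class `NegatedClassSketch` and its methods are hypothetical, intervals are plain `int[]{start, end}` pairs rather than `IntCharSet`, `\s` is approximated as the ASCII whitespace range 9 to 13 plus space, and the alphabet is limited to 0..127 to keep the output short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; does not use or mirror JFlex's IntCharSet API. */
class NegatedClassSketch {

  /** Merges possibly overlapping inclusive intervals into a sorted, disjoint list. */
  public static List<int[]> union(List<int[]> parts) {
    List<int[]> sorted = new ArrayList<>(parts);
    sorted.sort(Comparator.comparingInt((int[] iv) -> iv[0]));
    List<int[]> result = new ArrayList<>();
    for (int[] cur : sorted) {
      int[] last = result.isEmpty() ? null : result.get(result.size() - 1);
      if (last != null && cur[0] <= last[1] + 1) {
        // Overlapping or adjacent: extend the previous interval instead of adding a new one.
        last[1] = Math.max(last[1], cur[1]);
      } else {
        result.add(new int[] {cur[0], cur[1]});
      }
    }
    return result;
  }

  /** Complements a sorted, disjoint interval list within [0, maxChar]. */
  public static List<int[]> complement(List<int[]> normalized, int maxChar) {
    List<int[]> result = new ArrayList<>();
    int next = 0; // first value not yet covered by the input intervals
    for (int[] iv : normalized) {
      if (iv[0] > next) {
        result.add(new int[] {next, iv[0] - 1});
      }
      next = iv[1] + 1;
    }
    if (next <= maxChar) {
      result.add(new int[] {next, maxChar});
    }
    return result;
  }

  /** Formats an interval list as "[a-b, c-d]". */
  public static String show(List<int[]> intervals) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < intervals.size(); i++) {
      if (i > 0) sb.append(", ");
      sb.append(intervals.get(i)[0]).append('-').append(intervals.get(i)[1]);
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    // Content of [^\n\s]: \n, plus an ASCII-only approximation of \s (assumption).
    List<int[]> content = Arrays.asList(
        new int[] {'\n', '\n'},
        new int[] {'\t', '\r'},
        new int[] {' ', ' '});
    List<int[]> joined = union(content);              // \n is absorbed into 9-13
    System.out.println(show(joined));                 // [9-13, 32-32]
    System.out.println(show(complement(joined, 0x7F))); // [0-8, 14-31, 33-127]
  }
}
```

Because `complement` only ever sees a sorted, disjoint list, the overlapping `\n` can no longer be handled twice, which is the kind of situation that broke the precondition described above.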

Fixes #1065