Closed Manishearth closed 3 years ago
cc @sunfishcode I'm trying to figure out what's wrong here but I'm not sure I've gotten it yet.
I suspect the fuzz test is incorrect: It's testing an invariant that only makes sense for strings that have an actual base character in them.
\u{11347}
and \u{11357}
are both starters (canonical combining class of 0), but they also compose with each other (into \u{1134c}
), which is what makes this case different from common cases. So it does appear to be a bug in the fuzz test. I'll look into whether the fuzz test can be fixed.
If I can't find a way to fix it, plan b is to replace the fuzz test with one which does this:
let mut count = 0;
for c in stream_safe.chars().stream_safe().nfkd() {
if canonical_combining_class(c) != 0 {
count += 1;
assert!(count <= MAX_NONSTARTERS);
} else {
count = 0;
}
}
That doesn't test for the streaming property I originally wrote the fuzzer for, but does test that Stream-Safe basically works, without relying on any tricky details about how the inner iterators work.
I've now opened #77, which fixes the fuzz test to not over-count when there are multiple starters.
Found by oss-fuzz in: https://github.com/unicode-rs/unicode-normalization/blob/master/fuzz/fuzz_targets/streaming.rs
Testcase:
𑍌ؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؙؔٙ
\u{11347}\u{11357}\u{00614}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00659}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}\u{00619}