unicode-rs / unicode-segmentation

Grapheme Cluster and Word boundaries according to UAX#29 rules
https://unicode-rs.github.io/unicode-segmentation
Other
565 stars 57 forks source link

Some possible panics found by afl.rs #119

Open Koral77 opened 1 year ago

Koral77 commented 1 year ago

I have used afl.rs to fuzz all public API of thie crate. And I found several cases may cause panic. The code to replay these panics are as follows:

These 6 cases are about arithmetic overflow.

let mut _local0 = unicode_segmentation::GraphemeCursor::new(18446742978509668351 ,18446744073709551615 ,false);
let _ = unicode_segmentation::GraphemeCursor::is_boundary(&mut (_local0), "t\u{7f}", 18446744073709551615);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(18446707789825836799 ,18446744073709551615 ,false);
unicode_segmentation::GraphemeCursor::provide_context(&mut (_local0), "1", 18446744073709551615);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(5404402016221612875 ,5425481077020773195 ,false);
let _ = unicode_segmentation::GraphemeCursor::prev_boundary(&mut (_local0), "KKK", 5425512962414627659);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(18446744073709551615 ,8502796096475496447 ,false);
let _ = unicode_segmentation::GraphemeCursor::is_boundary(&mut (_local0), "\u{6dd}", 18446744073709551615);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(5208492444341520456 ,5208492444341520431 ,true);
let _ = unicode_segmentation::GraphemeCursor::next_boundary(&mut (_local0), "HHHHHHHHHHHHH", 5208492589950978632);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(18446744073709551615 ,16212958658533785599 ,false);
let _ = unicode_segmentation::GraphemeCursor::next_boundary(&mut (_local0), "0", 18446744073709551615);

These 2 cases are about utf-8 error and panicked at 'byte index is not a char boundary'.

let mut _local0 = unicode_segmentation::GraphemeCursor::new(8463800222054970741 ,8463951407229173877 ,false);
let _ = unicode_segmentation::GraphemeCursor::is_boundary(&mut (_local0), "Ë", 8463800222054970740);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(5208492444341520467 ,3407250190757808200 ,true);
let _ = unicode_segmentation::GraphemeCursor::prev_boundary(&mut (_local0), "HHHZ\\HHH\0\u{e040}HHK", 5208492444341520456);

These 2 cases are about unwrap error.

let mut _local0 = unicode_segmentation::GraphemeCursor::new(2 ,2 ,true);
unicode_segmentation::GraphemeCursor::provide_context(&mut (_local0), "l ", 1);
let mut _local0 = unicode_segmentation::GraphemeCursor::new(4268070197446523707 ,4268070196469563392 ,false);
let _ = unicode_segmentation::GraphemeCursor::next_boundary(&mut (_local0), "; ", 4268070197446523705);

This case is about out-of-bound error.

let mut _local0 = unicode_segmentation::GraphemeCursor::new(4268070197446523713 ,4268070196471726080 ,false);
let _ = unicode_segmentation::GraphemeCursor::next_boundary(&mut (_local0), "\n\n\n\n\n\n\n\n", 4268070197446522939);

The simple bug report of this case is image

I also placed the replay files at replay_files.

I hope you can check if these are real bugs need to be fixed. Thanks a lot.

Manishearth commented 1 year ago

Thanks! I don't have time to investigate this but would accept PRs for it!

In general the cursor APIs have a couple bugs

cardigan1008 commented 1 month ago

For arithmetic overflow panics, I think we just need to wrap them with safer functions like saturating_x to deal with these edge cases.

I'll try to work on it.