Closed ratmice closed 2 years ago
This is good stuff! Some of these tested my grey matter a bit!
-1
is also incorrect a multibyte case? I wonder if we can trigger that?parse_actions
should start with debug_assert!(self.lookahead_is("{", i).is_some())
because that invariant is necessary for correctness. Rather than j += 1
I would tend to prefer changing the last two lines of the function to something like:
debug_assert(self.lookahead_is("}", j).is_some());
let s = self.src[i + '{'.len_utf8()..j].trim().to_string();
Ok((j + '}'.len_utf8(), s))
- fcafb44 feels like the
-1
is also incorrect a multibyte case? I wonder if we can trigger that?
There seems to be at least 2 cases we could trigger it on, most declarations are parsing token
, name
, or int
, and are expecting an ascii subset for those and failing otherwise with some other error.
But %parse-param
, and %actiontype
although it won't generate valid code, do accept them, by reading until eol or eof.
The curious thing is I anticipated this to fail but it doesn't seem to. I'll have to look into why another day however.
Edit: If I had to guess, a bad span in the middle of a multibyte character is going right through the NewLineCache
as the result of a comparison operator or something to that effect.
Ah, I think I know why the multibyte case doesn't trigger: all we're doing is returning a byte offset to the user, but unless they try indexing off of that, no-one will notice that there's a problem. On that basis, I think the saturating_sub
is only a partial fix. Some possibilities:
PrematureEnd
doesn't carry an index with it. After all it just means "the file finished without enough stuff for the grammar to make sense" and i
is always self.src.len()
.PrematureEnd
finishes with self.src.len()
.PrematureEnd
pinpoints the last but one character as the error (starting from the beginning of the input.Now that I've written it out, either #1 or #2 seem like the best options to me. Any thoughts?
Yeah, the 'unexpectedly succeeds' part is mostly that it returns the column
which I would expect.
(At that point it hadn't occurred to me that the span is bad but the column could still be correct), so I agree it is a partial fix.
Before I noticed the testcase was working, I was trying option 3 since it maps to the current behavior, But I'm fine with option #2 preferrably over #1, since it keeps the invariant that all errors have at least 1 span, and since we made the kinds private it would be difficult to special case it.
// this could also just use char_indices instead?
let last_pos = self.src.chars().last().map_or_else(|| 0, |c| i - c.len_utf8());
Err(self.mk_error(YaccGrammarErrorKind::PrematureEnd, last_pos))
Option 2 works for me -- let's do that!
Please squash.
Squashed.
bors r+
Build succeeded:
@ltratt FWIW, After this I was able to succesfully test all the YaccKind
s in AFL program without crashing for an hour or so, I'll keep running it until it stops find new branches via coverage... Additionally I modified it to also look for invalid spans, and checked that the PrematureEnd multibyte case was found by it, I should probably add something like that to the testsuite helpers to ensure it doesn't happen in the future.
So even though this patch was using only cases found by YaccKind::Grmtools
, I haven't encountered any issues the additional kinds.
#[macro_use]
extern crate afl;
use cfgrammar::yacc::*;
fn check_spans(e: Result<YaccGrammar, Vec<YaccGrammarError>>, src: &str) {
if let Err(es) = e {
for e in es {
for span in e.spans() {
let _ = &src[span.start()..span.end()];
}
}
}
}
fn main() {
fuzz!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
let kinds = [
YaccKind::Grmtools,
YaccKind::Original(YaccOriginalActionKind::NoAction),
YaccKind::Original(YaccOriginalActionKind::GenericParseTree),
YaccKind::Original(YaccOriginalActionKind::UserAction),
YaccKind::Eco,
];
for kind in kinds {
check_spans(YaccGrammar::new(kind, s), s);
}
}
});
}
That sounds good -- as does adding additional test cases! Thanks for doing this!
This fixes yacc panics some were caught by fuzzing, the others move from expecting a panic during
unwrap()
to expecting error kinds.The action parsing https://github.com/softdevteam/grmtools/commit/fcafb448335bb51f58faed35349522925d8ea67d, I believe that as it was represents a real bug, because the slicing was actually dropping the last character of the action (or the last byte of a multibyte character) rather than a trailing
}
character.But it seems like a real possibility that I am misunderstanding something about the intended behavior loop. And I'm not terribly familiar with action parsing, as I tend to mostly use the
NoActions
Kind.