ratmice commented 2 years ago

This fixes yacc panics some were caught by fuzzing, the others move from expecting a panic during unwrap() to expecting error kinds.

The action parsing https://github.com/softdevteam/grmtools/commit/fcafb448335bb51f58faed35349522925d8ea67d, I believe that as it was represents a real bug, because the slicing was actually dropping the last character of the action (or the last byte of a multibyte character) rather than a trailing } character.

But it seems like a real possibility that I am misunderstanding something about the intended behavior loop. And I'm not terribly familiar with action parsing, as I tend to mostly use the NoActions Kind.

ltratt commented 2 years ago

This is good stuff! Some of these tested my grey matter a bit!

https://github.com/softdevteam/grmtools/pull/337/commits/fcafb448335bb51f58faed35349522925d8ea67d feels like the -1 is also incorrect a multibyte case? I wonder if we can trigger that?
https://github.com/softdevteam/grmtools/pull/337/commits/ed48aff48677b35aece2ed2e90b49494d961d25 looks correct as-is.
https://github.com/softdevteam/grmtools/pull/337/commits/01e01b56b977a8ce4ca1ec30fadc02ee918d7221 looks correct as-is.
https://github.com/softdevteam/grmtools/pull/337/commits/6dc74c981436e0d16c3ac946ae97696b649581d6 I think really parse_actions should start with debug_assert!(self.lookahead_is("{", i).is_some()) because that invariant is necessary for correctness. Rather than j += 1 I would tend to prefer changing the last two lines of the function to something like:
```
        debug_assert(self.lookahead_is("}", j).is_some());
        let s = self.src[i + '{'.len_utf8()..j].trim().to_string();
        Ok((j + '}'.len_utf8(), s))
```

ratmice commented 2 years ago

fcafb44 feels like the -1 is also incorrect a multibyte case? I wonder if we can trigger that?

There seems to be at least 2 cases we could trigger it on, most declarations are parsing token, name, or int, and are expecting an ascii subset for those and failing otherwise with some other error.

But %parse-param, and %actiontype although it won't generate valid code, do accept them, by reading until eol or eof. The curious thing is I anticipated this to fail but it doesn't seem to. I'll have to look into why another day however.

Edit: If I had to guess, a bad span in the middle of a multibyte character is going right through the NewLineCache as the result of a comparison operator or something to that effect.

ltratt commented 2 years ago

Ah, I think I know why the multibyte case doesn't trigger: all we're doing is returning a byte offset to the user, but unless they try indexing off of that, no-one will notice that there's a problem. On that basis, I think the saturating_sub is only a partial fix. Some possibilities:

PrematureEnd doesn't carry an index with it. After all it just means "the file finished without enough stuff for the grammar to make sense" and i is always self.src.len().
PrematureEnd finishes with self.src.len().
PrematureEnd pinpoints the last but one character as the error (starting from the beginning of the input.

Now that I've written it out, either #1 or #2 seem like the best options to me. Any thoughts?

ratmice commented 2 years ago

Yeah, the 'unexpectedly succeeds' part is mostly that it returns the column which I would expect. (At that point it hadn't occurred to me that the span is bad but the column could still be correct), so I agree it is a partial fix.

Before I noticed the testcase was working, I was trying option 3 since it maps to the current behavior, But I'm fine with option #2 preferrably over #1, since it keeps the invariant that all errors have at least 1 span, and since we made the kinds private it would be difficult to special case it.

// this could also just use char_indices instead?
let last_pos = self.src.chars().last().map_or_else(|| 0, |c| i - c.len_utf8());
Err(self.mk_error(YaccGrammarErrorKind::PrematureEnd, last_pos))

2 would be easy to implement though just by removing the sub.

ltratt commented 2 years ago

Option 2 works for me -- let's do that!

ratmice commented 2 years ago

Should be fixed in 1ab18f6, and e738de1 (sorry)

I'll have to see if I can't trigger the same multi-byte issue for lrlex in the morning.

ltratt commented 2 years ago

Please squash.

ratmice commented 2 years ago

Squashed.

ltratt commented 2 years ago

bors r+

bors[bot] commented 2 years ago

Build succeeded:

buildbot/buildbot-build-script

ratmice commented 2 years ago

@ltratt FWIW, After this I was able to succesfully test all the YaccKinds in AFL program without crashing for an hour or so, I'll keep running it until it stops find new branches via coverage... Additionally I modified it to also look for invalid spans, and checked that the PrematureEnd multibyte case was found by it, I should probably add something like that to the testsuite helpers to ensure it doesn't happen in the future.

So even though this patch was using only cases found by YaccKind::Grmtools, I haven't encountered any issues the additional kinds.

#[macro_use]
extern crate afl;
use cfgrammar::yacc::*;

fn check_spans(e: Result<YaccGrammar, Vec<YaccGrammarError>>, src: &str) {
    if let Err(es) = e {
        for e in es {
            for span in e.spans() {
                let _ = &src[span.start()..span.end()];
            }
        }
    }
}

fn main() {
    fuzz!(|data: &[u8]| {
        if let Ok(s) = std::str::from_utf8(data) {            
            let kinds = [
                YaccKind::Grmtools,
                YaccKind::Original(YaccOriginalActionKind::NoAction),
                YaccKind::Original(YaccOriginalActionKind::GenericParseTree),
                YaccKind::Original(YaccOriginalActionKind::UserAction),
                YaccKind::Eco,
            ];
            for kind in kinds {
                check_spans(YaccGrammar::new(kind, s), s);
            }
        }
    });
}

ltratt commented 2 years ago

That sounds good -- as does adding additional test cases! Thanks for doing this!

softdevteam / grmtools

Fix yacc panics with YaccKind::Grmtools #337

2 would be easy to implement though just by removing the sub.