[QUESTION] Fuzzing as part of a parser generator?

rindPHI / isla

The ISLa (Input Specification Language) language & solver.

https://isla.readthedocs.org

GNU General Public License v3.0


[QUESTION] Fuzzing as part of a parser generator? #82

Closed Lotes closed 1 year ago

Lotes commented 1 year ago

I am just thinking out loud, since I am not very deep into your topic.

Would it not be possible to have this fuzzing framework as part of a parser generator (like ANTLR4)? There you already have a grammar, and normally one that contains even more than EBNF can express:

- RegExp definitions for lexical rules, like /\d+/ for numbers or /[a-z][a-z0-9]*/i for identifiers
- A concept of hidden tokens that get filtered out, so the parser does not have to handle them
  - from the fuzzing side, you would insert these hidden things and run a formatter afterwards

Would it be possible to have hidden rules and RegExp tokens in your framework as well? Is it hard to do?

I am just speaking for myself, as someone who is very inspired by your tool. Haha

rindPHI commented 1 year ago

Hi @Lotes,

Using a fuzzer when you already have a grammar makes perfect sense! There exists an experimental converter from ANTLR grammars to the ISLa grammar format. However, this converter has yet to be brought into a shape suitable for public use. In any case, there is an ANTLR-based fuzzer, Grammarinator. This tool is not affiliated with ISLa/myself; it's under active maintenance, so it might make sense for you to try it out.

When writing "part of a parser generator," did you think of something else? Running a fuzzer for a grammar used for parser generation means that the parser generator and fuzzer are separate, but this sounds like a sensible combination of tools to me.

I'm closing this for now; please feel free to follow up or write me an email (dominic.steinhoefel (at) cispa.de) if you have any more questions.

Best, Dominic

Lotes commented 1 year ago

@rindPHI I was wondering how hidden tokens are handled. I can take a closer look at your link. Thanks.

My background is actually the project Langium, a kind of successor to Xtext, which offers more than just a parser. Having a fuzzer is just an idea, more of a nice-to-have, I would say.

rindPHI commented 1 year ago

Langium sounds interesting; I did not know it.

ISLa has no notion of hidden tokens that are removed when parsing but instantiated when fuzzing. One would have to work with two different grammars to that end. Regex tokens are also not supported; they must be converted to a grammar first.
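
To make the two workarounds above concrete, here is a minimal sketch in plain Python. It does not use ISLa's API. The grammar is written as a dictionary in the style of the Fuzzing Book, and every name in it (`FUZZ_GRAMMAR`, `HIDDEN`, `without_hidden`, `expand`) is hypothetical and chosen only for illustration. The regex token `/\d+/` is spelled out as the recursive rule `<number>`, and the hidden whitespace token appears as the nonterminal `<ws>` in the fuzzing-side grammar only; the parsing-side grammar is derived by deleting it.

```python
import random
import re

# Fuzzing-side grammar. <number> stands in for the regex token /\d+/,
# and <ws> plays the role of a hidden token (optional whitespace).
FUZZ_GRAMMAR = {
    "<start>": ["<sum>"],
    "<sum>": ["<number>", "<number><ws>+<ws><sum>"],
    "<number>": ["<digit>", "<digit><number>"],
    "<digit>": [str(d) for d in range(10)],
    "<ws>": ["", " ", "  "],
}

# Nonterminals that exist only on the fuzzing side (assumption for this sketch).
HIDDEN = {"<ws>"}

NONTERMINAL = re.compile(r"<[^<> ]+>")


def without_hidden(grammar, hidden):
    """Derive the parsing-side grammar by deleting hidden nonterminals."""
    result = {}
    for symbol, alternatives in grammar.items():
        if symbol in hidden:
            continue
        stripped = []
        for alternative in alternatives:
            for name in hidden:
                alternative = alternative.replace(name, "")
            stripped.append(alternative)
        result[symbol] = stripped
    return result


def expand(grammar, symbol="<start>", rng=None, depth=0, max_depth=8):
    """Randomly expand `symbol`; past max_depth, prefer the alternatives
    with the fewest nonterminals so that the expansion terminates."""
    if rng is None:
        rng = random.Random()
    alternatives = grammar[symbol]
    if depth >= max_depth:
        fewest = min(len(NONTERMINAL.findall(a)) for a in alternatives)
        alternatives = [
            a for a in alternatives if len(NONTERMINAL.findall(a)) == fewest
        ]
    chosen = rng.choice(alternatives)
    return NONTERMINAL.sub(
        lambda m: expand(grammar, m.group(0), rng, depth + 1, max_depth),
        chosen,
    )


PARSE_GRAMMAR = without_hidden(FUZZ_GRAMMAR, HIDDEN)

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(repr(expand(FUZZ_GRAMMAR, rng=rng)))
```

Deriving one grammar from the other keeps the two from drifting apart, but this is only one possible design; maintaining two hand-written grammars, as suggested above, works just as well for small languages.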