kino-mc / rsmt2

A generic library to interact with SMT-LIB 2 compliant solvers running in a separate system process, such as Z3 and CVC4.
Apache License 2.0
65 stars 14 forks source link

Rewrite the low-level SMT parser #35

Open AdrienChampion opened 2 years ago

AdrienChampion commented 2 years ago

SmtParser uses a handmade parser that is showing its limits more and more (#33)

Would love to have something cleaner, though it would need to be as efficient as the current parser. Needs investigation (peg, nom, ?) and some benchmarking.

filippodebortoli commented 1 year ago

A PEG for SMT-LIB 2.6 (syntax up to Section 3.5 of the standard document) in pest could look like this:

WHITESPACE = _{ "\t" | "\n" | "\r" | " " }
COMMENT = _{ ";" ~ ANY* ~ NEWLINE }

printable_char = @{ '\x20'..'\x7e' | '\x80'..'\x8f' }
numeral = @{ "0" | ('1'..'9') ~ ASCII_DIGIT* }
decimal = @{ numeral ~ "." ~ "0"* ~ numeral }
hexadecimal = @{ "#x" ~ ASCII_HEX_DIGIT+ }
binary = @{ "#b" ~ ('0'..'1')+}
escaped_double_quote = _{ "\"\"" }
string = @{ "\"" ~ (printable_char | WHITESPACE | escaped_double_quote)* "\"" }
reserved_word = @{ "BINARY" | "DECIMAL" | "HEXADECIMAL" | "NUMERAL" | "STRING" | "_" | "!" | "as" | "let" | "exists" | "forall" | "match" | "par" }
char_list = { "~" | "!" | "@" | "$" | "%" | "^" | "&" | "*" | "_" | "-" | "+" | "=" | "<" | ">" | "." | "?" | "/"}
sym_simple = @{ (ASCII_ALPHA | char_list) ~ (ASCII_ALPHANUMERIC | char_list)* }
sym_quoted = @{ "|" ~ (!("\\" | "|") ~ (WHITESPACE | printable_char))* ~ "|"}
symbol = { sym_simple | sym_quoted }
keyword = {":" ~ sym_simple }
spec_constant = _{ numeral | decimal | hexadecimal | binary | string }
s_expr = @{ spec_constant | symbol | reserved_word | keyword | "(" ~ s_expr* ~ ")" }
index = { numeral | symbol }
identifier = { symbol | "(_ " ~ symbol ~ " " ~ index+ ")" }
attribute_value = @{ spec_constant | symbol | "(" ~ s_expr* ~ ")" }
attribute = @{ keyword | keyword ~ " " ~ attribute_value}
sort = @{ identifier | "(" ~ identifier ~ " " ~ sort+ ")" }

The next steps would be to encode syntax for: Terms and formulas, theory declarations, logic declarations, scripts. While writing the grammar file could be somewhat easy, I'd expect the task of rewriting the parsing functions to be cumbersome.

AdrienChampion commented 1 year ago

@filippodebortoli are you confident that peg would be as efficient as a handmade parser?

Generally one ignores such concerns because printing/parsing time should be (very) small compared to solving time. The issue is that efficient verification tools go to great length to issue checks that are as small as possible, in some cases extremely small.

Usually, this means that the tool issues a lot of small checks as opposed to one/a few big one(s). In this context I think parsing time can be a concern, but that's just my opinion 🐱

filippodebortoli commented 1 year ago

@filippodebortoli are you confident that peg would be as efficient as a handmade parser?

I am not confident enough to claim that it would yield a parser that competes with or outperforms the current, handmade one. I have the feeling that it would simplify a lot the parser's implementation and make it easier to maintain. There is one clear drawback: the parser provided by pest only accepts str as input, which is far more restrictive than what SmtParser currently allows for.

AdrienChampion commented 1 year ago

I wrote this a looonnng time ago, I need to take a look but right now I lean more towards keeping the handwritten parser, especially since it's quite small. I don't remember what the parser's code looks like but I expect that its maintainability can be improved greatly :D