Mingun opened 4 years ago
The main documentation for the expression language is in the user guide. It's not a full formal specification (there's no syntax grammar, for example), but it's quite detailed and explains almost every feature of the expression language.
I inferred a formal PEG specification from the Scala parser, but I couldn't find how whitespace is handled. The corresponding rule in the parser is commented out, yet the parser accepts spaces. How does it do that?
The specification below follows the syntax of my fork of the pegjs project (it uses a range syntax, e.g. `expr|1.., ","|`, which is missing from the original project).
It differs slightly from the Scala parser for readability.
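For readers unfamiliar with the fork's range syntax: `rule|min..max, separator|` matches `rule` between `min` and `max` times (either bound may be omitted, and `|n|` means exactly `n`), with `separator` required between consecutive matches. So `expr|1.., ","|` is a non-empty comma-separated list, `name|2.., "::"|` needs at least two names, and `hex|4|` is exactly four hex digits. The following is a minimal Python sketch of those semantics using tiny hand-rolled parser combinators; `lit`, `regex` and `repeat` are hypothetical helpers, not part of pegjs or the fork, and the handling of a trailing separator (not consumed) is an assumption of the sketch.

```python
import re
from typing import Any, Callable, List, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    """Match the literal string `s` at the current position."""
    def p(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return p


def regex(pattern: str) -> Parser:
    """Match a regular expression at the current position."""
    rx = re.compile(pattern)

    def p(text: str, pos: int):
        m = rx.match(text, pos)
        return (m.group(), m.end()) if m else None
    return p


def repeat(item: Parser, min_count: int = 0, max_count: Optional[int] = None,
           sep: Optional[Parser] = None) -> Parser:
    """Sketch of `item|min..max, sep|`: between min_count and max_count items,
    with `sep` between consecutive items (never before the first or after the last)."""
    def p(text: str, pos: int):
        values: List[Any] = []
        cur = pos
        while max_count is None or len(values) < max_count:
            start = cur
            if values and sep is not None:
                r = sep(text, cur)
                if r is None:
                    break
                cur = r[1]
            r = item(text, cur)
            if r is None:
                cur = start  # give back a dangling separator
                break
            values.append(r[0])
            cur = r[1]
            if cur == start:
                break  # guard: an item that matched nothing would loop forever
        if len(values) < min_count:
            return None
        return values, cur
    return p


# name|2.., "::"|  as used by the enumName rule
name = regex(r"[a-zA-Z_][a-zA-Z0-9_]*")
enum_path = repeat(name, 2, None, lit("::"))
print(enum_path("enum::value", 0))  # (['enum', 'value'], 11)
print(enum_path("abc", 0))          # None: fewer than 2 items
```

The same helper covers `args = expr| .., ","|` (`repeat(expr, 0, None, lit(","))`) and `hex|4|` (`repeat(hex, 4, 4)`).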
```
// Entry point for
// - `size`, `if`, `parent`, `value`, `pos`, `io`, `repeat-expr`, `repeat-until`, `switch-on`, `process: xor|zlib|rol|ror`
// - `type.cases` keys (O_o)
// - `min`, `max`, `expr` in the new `valid` key
topExpr = expr EOF;
// Entry point for `process: custom(arg1, arg2, ...)` and keys of parametrized user types (with `params`).
// Parses the arguments of a function or user type
topExprList = expr|1.., ","| EOF;
// Whitespace (this rule is commented out in the Scala parser)
//_ = ([ \n]+ / "\\\n")*
EOF = !.;
string
  = "'" [^']* "'"
  / '"' ([^\\"]+ / escaped)* '"'
  ;
escaped = "\\" (quotedChar / quotedOct / quotedHex);
quotedChar = [abtnvfre'"\\]; // characters that can be escaped by backslash
quotedOct = oct+;
quotedHex = "u" hex|4|;
digit = [0-9];
integer
  = [1-9] (digit / "_")*
  / "0" [oO] oct+
  / "0" [xX] hex+
  / "0" [bB] bin+
  / "0"
  ;
oct = "_" / [0-7];
bin = "_" / [01];
hex = "_" / digit / [a-fA-F];
float
  = digit+ exponent // Ex.: 4E2, 4E+2, 4e-2
  / fixed exponent? // Ex.: 4.E2, .4e+2, 4.2e-0
  ;
fixed
  = digit* "." digit+ // Ex.: 4.2, .42
  / digit+ "."        // Ex.: 42.
  ;
exponent = [eE] [+-]? digit+;
//-------------------------------------------------------------------------------------------------
name = nameStart namePart*;
nameStart = [a-zA-Z_];
namePart = nameStart / digit;
typeName = "::"? name|1.., "::"| ("[" "]")?; // Ex.: xyz, ::abc::def, array[]
enumName = "::"? name|2.., "::"|;            // Ex.: enum::value, ::root::type::enum::value
//-------------------------------------------------------------------------------------------------
OR = "or" !namePart;
AND = "and" !namePart;
NOT = "not" !namePart;
expr = or_test ("?" expr ":" expr)?;
or_test = and_test|1.., OR |;
and_test = not_test|1.., AND|;
not_test
  = NOT not_test
  / or_expr (comp_op or_expr)?
  ;
comp_op
  = "=="
  / "!="
  / "<>"
  / "<="
  / ">="
  / "<"
  / ">"
  ;
or_expr = xor_expr |1.., "|" |;
xor_expr = and_expr |1.., "^" |;
and_expr = shift_expr|1.., "&" |;
shift_expr = arith_expr|1.., ("<<" / ">>")|;
arith_expr = term |1.., [+-] |;
term = factor |1.., [*/%] |;
factor
  = "+" factor
  / "-" factor
  / "~" factor // bitwise negation
  / atom postfix*
  ;
atom
  = "(" expr ")"
  / "[" list? "]"
  / "sizeof" "<" typeName ">"
  / "bitsizeof" "<" typeName ">"
  / enumName
  / name
  / string+ // multiple adjacent strings are concatenated
  / float
  / integer
  ;
postfix
  = "(" args ")"              // call
  / "[" expr "]"              // indexing
  / "." "as" "<" typeName ">" // type cast
  / "." name                  // attribute access
  ;
list = expr|1.., ","| ","?;
args = expr| .., ","|;
```
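To see how the precedence chain from `expr` down to `atom` behaves, here is a small hand-written recursive-descent parser in Python for an integer-only subset of the grammar above. It is a hypothetical sketch, not the Kaitai compiler's implementation: each method mirrors one rule, postfix operators, strings, floats, lists, `sizeof` and enum paths are left out, whitespace is simply skipped by the tokenizer, and folding `rule|1.., op|` chains left-associatively is an assumption. It returns nested tuples so the resulting tree is easy to inspect.

```python
import re

# Tokens: integer literals per the `integer` rule, names, and operators.
TOKEN = re.compile(r"""\s*(?:
    (?P<int>0[xX][0-9a-fA-F_]+|0[oO][0-7_]+|0[bB][01_]+|[1-9][0-9_]*|0)
  | (?P<name>[a-zA-Z_][a-zA-Z0-9_]*)
  | (?P<op><<|>>|==|!=|<>|<=|>=|[-+*/%~&|^<>()?:])
)""", re.VERBOSE)
KEYWORDS = {"or", "and", "not"}
COMP_OPS = {"==", "!=", "<>", "<=", ">=", "<", ">"}


class ParseError(ValueError):
    pass


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, src):
        self.toks, self.i = tokenize(src), 0

    def peek(self):
        return self.toks[self.i][1] if self.i < len(self.toks) else None

    def take(self, expected=None):
        if self.i >= len(self.toks):
            raise ParseError(f"unexpected end of input (expected {expected or 'a token'})")
        kind, text = self.toks[self.i]
        if expected is not None and text != expected:
            raise ParseError(f"expected {expected!r}, got {text!r}")
        self.i += 1
        return kind, text

    def chain(self, sub, ops):
        # `sub|1.., op|`: one or more operands separated by operators
        node = sub()
        while self.peek() in ops:
            op = self.take()[1]
            node = (op, node, sub())
        return node

    def expr(self):  # expr = or_test ("?" expr ":" expr)?
        cond = self.or_test()
        if self.peek() == "?":
            self.take("?")
            then = self.expr()
            self.take(":")
            return ("?:", cond, then, self.expr())
        return cond

    def or_test(self):    return self.chain(self.and_test, {"or"})
    def and_test(self):   return self.chain(self.not_test, {"and"})

    def not_test(self):  # NOT not_test / or_expr (comp_op or_expr)?
        if self.peek() == "not":
            self.take()
            return ("not", self.not_test())
        left = self.or_expr()
        if self.peek() in COMP_OPS:
            return (self.take()[1], left, self.or_expr())
        return left

    def or_expr(self):    return self.chain(self.xor_expr, {"|"})
    def xor_expr(self):   return self.chain(self.and_expr, {"^"})
    def and_expr(self):   return self.chain(self.shift_expr, {"&"})
    def shift_expr(self): return self.chain(self.arith_expr, {"<<", ">>"})
    def arith_expr(self): return self.chain(self.term, {"+", "-"})
    def term(self):       return self.chain(self.factor, {"*", "/", "%"})

    def factor(self):  # unary "+", "-", "~" (bitwise negation)
        if self.peek() in ("+", "-", "~"):
            return (self.take()[1], self.factor())
        return self.atom()

    def atom(self):
        kind, text = self.take()
        if text == "(":
            node = self.expr()
            self.take(")")
            return node
        if kind == "int":  # underscores are digit separators
            return int(text.replace("_", ""), 0)
        if kind == "name" and text not in KEYWORDS:
            return text
        raise ParseError(f"unexpected token {text!r}")


def parse(src):
    p = Parser(src)
    node = p.expr()
    if p.peek() is not None:  # topExpr = expr EOF
        raise ParseError(f"trailing input: {p.peek()!r}")
    return node


print(parse("1 + 2 * 3"))            # ('+', 1, ('*', 2, 3))
print(parse("0x1F & 0b1010 == 10"))  # ('==', ('&', 31, 10), 10)
```

Because the tokenizer reads `order` as a single name, the `!namePart` guards on `OR`, `AND` and `NOT` are satisfied implicitly; and because `&` sits below `comp_op` in the chain, `0x1F & 0b1010 == 10` groups as `(0x1F & 0b1010) == 10`.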
I agree that the only formal specification available now is the reference compiler's source code, which is obviously bad. Kudos for the effort of transcribing it to PEG!
As far as I can tell, whitespace is handled by FastParse magic: https://www.lihaoyi.com/fastparse/#WhitespaceHandling
Essentially, with a whitespace consumer in scope, FastParse allows optional whitespace between every two consecutive parsers in a sequence (`a ~ b`) and between repetitions (`.rep`). This "whitespace" also consumes Python-style comments, if my memory serves.
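The effect can be pictured as a sequence of tokens where optional whitespace is skipped between each pair, so the grammar rules themselves never mention `_`. This is a hypothetical Python sketch of the idea, not FastParse's implementation; the whitespace rule combines the commented-out `_` rule from the grammar (spaces, newlines, backslash-newline) with `#` line comments, the latter being an unverified assumption based on the recollection above.

```python
import re

# Whitespace: the commented-out `_` rule ([ \n]+ / "\\\n")*, plus `#` line
# comments (assumption: based on the recollection that comments are skipped).
WS = re.compile(r"(?:[ \n]+|\\\n|#[^\n]*)*")


def skip_ws(text: str, pos: int) -> int:
    """Return the position after any whitespace/comments starting at pos."""
    return WS.match(text, pos).end()


def seq(*tokens: str):
    """Rough analogue of FastParse's `a ~ b ~ c` with a whitespace consumer in
    scope: whitespace is skipped between tokens, never inside one."""
    def parse(text: str, pos: int = 0):
        for i, tok in enumerate(tokens):
            if i > 0:
                pos = skip_ws(text, pos)
            if not text.startswith(tok, pos):
                return None
            pos += len(tok)
        return pos  # end position on success
    return parse


sizeof_u4 = seq("sizeof", "<", "u4", ">")
print(sizeof_u4("sizeof<u4>"))     # 10
print(sizeof_u4("sizeof < u4 >"))  # 13
print(sizeof_u4("sizeof (u4)"))    # None
```

The flip side is that rules describing a single token (a name, a number, a string body) must not skip whitespace internally. FastParse offers `~~` and `.repX` for sequencing and repetition without whitespace, which is presumably why the Scala parser needs no explicit `_` rule.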
I do not see it on https://kaitai.io. Do I understand correctly that the expression parser is defined in https://github.com/kaitai-io/kaitai_struct_compiler/blob/0f32f3734dad0039dffb2275d38612eb779689ec/shared/src/main/scala/io/kaitai/struct/exprlang/Expressions.scala?