Open hudlow opened 5 days ago
The guidelines are there to let users know roughly what is reasonable to expect for CEL parser implementations. It's not intended to be exact, or even a limit, just guidance as to the minimum support.
The actual parse complexity is limited by a mix of the grammar and the parser/generator CEL uses. The runtime complexity and what's supported there are technically unbounded, though there are some practical limits which users may encounter. There are also implicit limits on things like the number of errors reported, the lookahead-depth in error flows, and the amount of recursion supported in total. The documents cover roughly what all parsers minimally support though it is possible to find some cases that will trip the parse recursion limits.
If there's a way to clear this up in the docs, I'd be happy to, but it seemed more informative to give users a sense of relative recursion cost per grammar fragment in terms of the operators they might write. As you noted, the possible combinations of what's supported would be enormous.
As in other issues, I am looking for a way to ensure that the specific limitations of a CEL implementation do not leak to the end users authoring expressions. I can experiment with how parsing complexity might be more universally defined. It does seem like a tricky problem.
~@TristonianJones Here's perhaps a less ambiguous definition of parsing complexity:~
A
levels of nested expressions (Expr
);~B
members (Member
) in an expression (Expr
);~C
levels of access (_._
or _[_]
) in a member (Member
);~D
items in a list (ExprList
, FieldInits
, or MapInits
).~~The hope would be that these numbers would bound parsing to something like O((B*C*D)^A)
but I'm probably making some mistakes.~
Edit: but this isn't yet useful. Even if I pick VERY conservative numbers, A=8, B=16, C=8, D=16
, it's O(3e26)
, which multiplied by a 3GHz clock cycle is 3.3 billion years. Back to the drawing board!
@TristonianJones is the most salient practical limit for implementations the 100-deep limit for proto unmarshalling?
I've been trying to work out what exactly this section means:
Here are a few questions:
1 + (2 + 3)
two or three consecutive binary arithmetic operators of the same precedence?||
, each of which has 32 terms of&&
, each of which has a function call with 32 arguments, etc.? Thei.e.
(vs ane.g.
) suggests that these are truly independent rules and so an expression with 32 × 32 × 32 × 32 × 32 × 24 × 24 × 24 = 4.639×10¹¹ terms is required to be supported. Moreover, it's unclear if an implementation must support as many non-consecutive repetitions of such terms as can be constructed... for example, if an implementation must support 33||
terms as long as they are somehow broken up by another operation.cel-go
implementation, it seems like there is a total limit of 32 non-homogenous recursive calls. Again, the use ofi.e.
instead ofe.g.
implies that 11 nested function calls inside of 11 nested lists, for example, would be required to be supported by an implementation. As above, it's unclear if that also means that, for example, an implementation is require to support 13 non-consecutively nested function calls.There are more mysteries revealed in the
cel-go
implementation. For example, it considers the below example legal until you add a parenthetical 18th term to the existing inner-most parenthetical (shown as a comment), and then it says it violates a max recursion depth, but it's unclear what recursion depth that is, since there is no clear recursion of exactly 12 or 24 or 32 levels. Also, the outer-most expression supports 33 (but not 34) terms which itself seems weird!