Minimum supported grammar complexity is unclear

hudlow commented 5 days ago

I've been trying to work out what exactly this section means:

Implementations are required to support at least:

24-32 repetitions of repeating rules, i.e.:

32 terms separated by || in a row;

32 terms separated by && in a row;

32 function call arguments;

list literals with 32 elements;

map or message literals with 32 fields;

24 consecutive ternary conditionals ?:;

24 binary arithmetic operators of the same precedence in a row;

24 relations in a row.

12 repetitions of recursive rules, i.e:

12 nested function calls;

12 selection (.) operators in a row;

12 indexing ([_]) operators in a row;

12 nested list, map, or message literals.

Here are a few questions:

How do parentheticals interact with these limits? Is 1 + (2 + 3) two or three consecutive binary arithmetic operators of the same precedence?
Is there any interaction between the rules in the first list? Must an implementation support 32 terms of ||, each of which has 32 terms of &&, each of which has a function call with 32 arguments, etc.? The i.e. (vs an e.g.) suggests that these are truly independent rules and so an expression with 32 × 32 × 32 × 32 × 32 × 24 × 24 × 24 = 4.639×10¹¹ terms is required to be supported. Moreover, it's unclear if an implementation must support as many non-consecutive repetitions of such terms as can be constructed... for example, if an implementation must support 33 || terms as long as they are somehow broken up by another operation.
Is there any interaction between the rules in the second list? Playing with the cel-go implementation, it seems like there is a total limit of 32 non-homogenous recursive calls. Again, the use of i.e. instead of e.g. implies that 11 nested function calls inside of 11 nested lists, for example, would be required to be supported by an implementation. As above, it's unclear if that also means that, for example, an implementation is require to support 13 non-consecutively nested function calls.

There are more mysteries revealed in the cel-go implementation. For example, it considers the below example legal until you add a parenthetical 18th term to the existing inner-most parenthetical (shown as a comment), and then it says it violates a max recursion depth, but it's unclear what recursion depth that is, since there is no clear recursion of exactly 12 or 24 or 32 levels. Also, the outer-most expression supports 33 (but not 34) terms which itself seems weird!

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 +
(
  1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 +
  (
    1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 +
    (
      1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 +
      (
        1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 +
        (
          1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 +
          (
            1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 +
            (
              1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 +
              (
                1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 +
                (
                  1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 +
                  (
                    1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 +
                    (
                      1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 +
                      (
                        1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 +
                        (
                          1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 +
                          (
                            1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 +
                            (
                              1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 // + (18)
                            )
                          )
                        )
                      )
                    )
                  )
                )
              )
            )
          )
        )
      )
    )
  )
)

TristonianJones commented 4 days ago

The guidelines are there to let users know roughly what is reasonable to expect for CEL parser implementations. It's not intended to be exact, or even a limit, just guidance as to the minimum support.

The actual parse complexity is limited by a mix of the grammar and the parser/generator CEL uses. The runtime complexity and what's supported there are technically unbounded, though there are some practical limits which users may encounter. There are also implicit limits on things like the number of errors reported, the lookahead-depth in error flows, and the amount of recursion supported in total. The documents cover roughly what all parsers minimally support though it is possible to find some cases that will trip the parse recursion limits.

If there's a way to clear this up in the docs, I'd be happy to, but it seemed more informative to give users a sense of relative recursion cost per grammar fragment in terms of the operators they might write. As you noted, the possible combinations of what's supported would be enormous.

hudlow commented 4 days ago

As in other issues, I am looking for a way to ensure that the specific limitations of a CEL implementation do not leak to the end users authoring expressions. I can experiment with how parsing complexity might be more universally defined. It does seem like a tricky problem.

hudlow commented 3 days ago

~@TristonianJones Here's perhaps a less ambiguous definition of parsing complexity:~

~A levels of nested expressions (Expr);~
~B members (Member) in an expression (Expr);~
~C levels of access (_._ or _[_]) in a member (Member);~
~D items in a list (ExprList, FieldInits, or MapInits).~

~The hope would be that these numbers would bound parsing to something like O((B*C*D)^A) but I'm probably making some mistakes.~

Edit: but this isn't yet useful. Even if I pick VERY conservative numbers, A=8, B=16, C=8, D=16, it's O(3e26), which multiplied by a 3GHz clock cycle is 3.3 billion years. Back to the drawing board!

hudlow commented 3 days ago

@TristonianJones is the most salient practical limit for implementations the 100-deep limit for proto unmarshalling?

google / cel-spec

Minimum supported grammar complexity is unclear #410