Estimated size information lost when `.map()` result is chained

google / cel-go

Fast, portable, non-Turing complete expression evaluation with gradual typing (Go)

https://cel.dev

Apache License 2.0

2.3k stars 224 forks source link

Estimated size information lost when `.map()` result is chained #900

Closed jpbetz closed 9 months ago

jpbetz commented 9 months ago

Estimated size information lost with .map() when .filter() is called on the result (presumably applies to other chaining operations).

To Reproduce Check which components this affects:

[ ] parser
[x] checker
[ ] interpreter

Sample expression and input that reproduces the issue:

[1,2,3,4,5].map(x, x).filter(x, x % 2 == 0)

Test setup:

checker/cost_test.go:

        {
            name:   ".map.filter size",
            expr:   `[1,2,3,4,5].map(x, x).filter(x, x % 2 == 0)`,
            wanted: CostEstimate{Min: 97, Max: 97},
        },

Expected behavior

Expected a bounded cost estimate. got: [97, 18446744073709551615]

Additional context

Both these test cases pass when added:

        {
            name:   ".filter size",
            expr:   `[1,2,3,4,5].filter(x, x % 2 == 0)`,
            wanted: CostEstimate{Min: 41, Max: 101},
        },
        {
            name:   ".map size",
            expr:   `[1,2,3,4,5].map(x, x)`,
            wanted: CostEstimate{Min: 86, Max: 86},
        },

jpbetz commented 9 months ago

cc @cici37 @TristonianJones

TristonianJones commented 9 months ago

@jpbetz why do you expected unbounded cost on the upperbound when the input size is fixed?

stevekuznetsov commented 9 months ago

@TristonianJones I think it's the opposite - we expect a bounded cost, but the current implementation returns an unbounded one.

jpbetz commented 9 months ago

Joe Betz why do you expected unbounded cost on the upperbound when the input size is fixed?

I expected bounded but got unbounded.

stevekuznetsov commented 9 months ago

It seems like there's something more pernicious than adding .filter() to a .map() - adding the following two cases:

        {
            name: "list n squared",
            expr: `[1,2,3].all(i, i in [1,2,3])`,
            vars: []*decls.VariableDecl{
                decls.NewVariable("list1", types.NewListType(types.IntType)),
                decls.NewVariable("list2", types.NewListType(types.IntType)),
            },
            wanted: CostEstimate{Min: 5, Max: 5},
        },
        {
            name: "list n squared with map",
            expr: `[1,2,3].all(i, i in [1,2,3].map(j, j + j))`,
            vars: []*decls.VariableDecl{
                decls.NewVariable("list1", types.NewListType(types.IntType)),
                decls.NewVariable("list2", types.NewListType(types.IntType)),
            },
            wanted: CostEstimate{Min: 5, Max: 5},
        },

We get:

$ go test ./checker/... -run TestCost/list_n_squared
?       github.com/google/cel-go/checker/decls  [no test files]
--- FAIL: TestCost (0.00s)
    --- FAIL: TestCost/list_n_squared (0.00s)
        cost_test.go:566: Got cost interval [20, 62], wanted [5, 5]
    --- FAIL: TestCost/list_n_squared_with_map (0.00s)
        cost_test.go:566: Got cost interval [20, 18446744073709551615], wanted [5, 5]
FAIL
FAIL    github.com/google/cel-go/checker    0.003s

The cost assertions failing aside, the fact that the second evaluates to an unbounded cost is wrong and reproduces something similar to the original report without having to chain anything.

TristonianJones commented 9 months ago

@jpbetz @stevekuznetsov, my apologies, I misread the report about the expectations.

My guess is that all comprehensions are looking for fixed size iteration ranges, so any chaining / nesting with more than one comprehension will likely trigger the issue. Happy to have more test cases though.

-Tristan

stevekuznetsov commented 9 months ago

@TristonianJones I'm new to the project so I don't know if I understand your comment about "all comprehensions are looking for fixed size iteration ranges" but I believe my second test case has fixed-size inputs for all and still evaluates to an unbounded result:

 `[1,2,3].all(i, i in [1,2,3].map(j, j + j))`

For the original report from @jpbetz with a chained [].map().filter() are you saying that the current approach is not able to determine that .filter() is getting a bounded input; and, if so, is that an issue or something working as expected?

Edit: I guess in both cases it seems like the evaluator is not able to determine that the output of .map() is a fixed size.

Edit 2: we hit this case on the output of the .map() call, which brings our cost to unbounded.