open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0
9.68k stars 1.34k forks source link

Partial Evaluation fails when using a Bundle API #2197

Closed AvivRubys closed 4 years ago

AvivRubys commented 4 years ago

Expected Behavior

When running OPA in server mode with a bundle api, and passing in the partial flag in POST /v1/data endpoint, expected OPA to use partial evaluation on the policy/data.

Actual Behavior

All requests receive an error response:

{
  "code": "internal_error",
  "message": "2 errors occurred:\nbundle/simple.rego:3: rego_compile_error: conflict check for data path simple/allow: storage_invalid_txn_error: stale transaction\nrego_compile_error: conflict check for data path partial/__result__: storage_invalid_txn_error: stale transaction"
}

Steps to Reproduce the Problem

  1. Serve policy from bundle api
  2. Attempt to access the document with the ?partial param

Alternatively, clone this repo, run docker-compose up -d opa bundle-api, and run normal/partial evaluation using docker-compose run normal/partial

Additional Info

It works when loading the bundle through --bundle flag.

patrick-east commented 4 years ago

Thanks for filing the bug, can you confirm what version of OPA you are seeing this with?

Edit: I see the compose file uses the latest tag https://github.com/AvivRubys/opa-bundle-api-partial-eval-fail-reproduction/blob/master/docker-compose.yaml#L12 so I guess that answers my question 😄

AvivRubys commented 4 years ago

Thanks for your help @tsandall, unfortunately it looks like it's still broken. I've updated the example repo's opa version to 0.19.1 which has your commit included. (Also tried with 0.20.0-dev-debug docker tag) When running docker-compose run partial, http 500 is returned with this body:

{
  "code": "internal_error",
  "message": "1 error occurred: rego_recursion_error: rule __result__ is recursive: __result__ -\u003e __result__"
}

If, after one failed partial curl I try a non-partial curl, no response is returned from OPA, and it logs this:

2020/04/16 14:36:11 http: panic serving 172.27.0.4:58890: runtime error: invalid memory address or nil pointer dereference
goroutine 10 [running]:
net/http.(*conn).serve.func1(0xc0002660a0)
    /usr/local/go/src/net/http/server.go:1767 +0x139
panic(0xedb660, 0x158e150)
    /usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/open-policy-agent/opa/topdown.evalVirtual.eval(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x3, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1606 +0x8f
github.com/open-policy-agent/opa/topdown.evalTree.next(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x3, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1451 +0x2ee
github.com/open-policy-agent/opa/topdown.evalTree.eval(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x3, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1404 +0xde
github.com/open-policy-agent/opa/topdown.evalTree.next(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x2, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1457 +0x128
github.com/open-policy-agent/opa/topdown.evalTree.eval(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x2, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1404 +0xde
github.com/open-policy-agent/opa/topdown.evalTree.next(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x1, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1457 +0x128
github.com/open-policy-agent/opa/topdown.evalTree.eval(0xc00023ec60, 0xc00000f440, 0x4, 0x4, 0xc00000f4e0, 0x4, 0x4, 0x1, 0xc00022db60, 0xc00000f480, ...)
    /src/topdown/eval.go:1404 +0xde
github.com/open-policy-agent/opa/topdown.(*eval).biunifyRef(0xc00023ec60, 0xc00000f4c0, 0xc00000f480, 0xc00022db60, 0xc00022db60, 0xc00022e580, 0x203000, 0x0)
    /src/topdown/eval.go:810 +0x3a3
github.com/open-policy-agent/opa/topdown.(*eval).biunifyValues(0xc00023ec60, 0xc00000f4c0, 0xc00000f480, 0xc00022db60, 0xc00022db60, 0xc00022e580, 0x7a5a8c, 0x1590340)
    /src/topdown/eval.go:727 +0x89f
github.com/open-policy-agent/opa/topdown.(*eval).biunify(0xc00023ec60, 0xc00000f4c0, 0xc00000f480, 0xc00022db60, 0xc00022db60, 0xc00022e580, 0xc00002a501, 0xc00022e580)
    /src/topdown/eval.go:624 +0x11f
github.com/open-policy-agent/opa/topdown.(*eval).unify(0xc00023ec60, 0xc00000f4c0, 0xc00000f480, 0xc00022e580, 0x2, 0xc00002a5aa)
    /src/topdown/eval.go:616 +0x57
github.com/open-policy-agent/opa/topdown.evalTree.leaves(0xc00023ec60, 0xc0000100a0, 0x1, 0x1, 0xc000010148, 0x1, 0x1, 0x1, 0xc00022db60, 0xc00000eba0, ...)
    /src/topdown/eval.go:1564 +0x4ba
github.com/open-policy-agent/opa/topdown.evalTree.leaves(0xc00023ec60, 0xc0000100a0, 0x1, 0x1, 0xc000010148, 0x1, 0x1, 0x1, 0xc00022db60, 0xc00000eba0, ...)
    /src/topdown/eval.go:1569 +0x748
github.com/open-policy-agent/opa/topdown.evalTree.leaves(0xc00023ec60, 0xc0000100a0, 0x1, 0x1, 0xc000010148, 0x1, 0x1, 0x1, 0xc00022db60, 0xc00000eba0, ...)
    /src/topdown/eval.go:1569 +0x748
github.com/open-policy-agent/opa/topdown.evalTree.extent(0xc00023ec60, 0xc0000100a0, 0x1, 0x1, 0xc000010148, 0x1, 0x1, 0x1, 0xc00022db60, 0xc00000eba0, ...)
    /src/topdown/eval.go:1521 +0xeb
github.com/open-policy-agent/opa/topdown.evalTree.finish(0xc00023ec60, 0xc0000100a0, 0x1, 0x1, 0xc000010148, 0x1, 0x1, 0x1, 0xc00022db60, 0xc00000eba0, ...)
    /src/topdown/eval.go:1419 +0x16b
github.com/open-policy-agent/opa/topdown.evalTree.eval(0xc00023ec60, 0xc0000100a0, 0x1, 0x1, 0xc000010148, 0x1, 0x1, 0x1, 0xc00022db60, 0xc00000eba0, ...)
    /src/topdown/eval.go:1398 +0x192
github.com/open-policy-agent/opa/topdown.(*eval).biunifyRef(0xc00023ec60, 0xc00000eb80, 0xc00000eba0, 0xc00022db60, 0xc00022db60, 0xc00022dc50, 0x0, 0x203000)
    /src/topdown/eval.go:810 +0x3a3
github.com/open-policy-agent/opa/topdown.(*eval).biunifyValues(0xc00023ec60, 0xc00000eb80, 0xc00000eba0, 0xc00022db60, 0xc00022db60, 0xc00022dc50, 0x15e3c80, 0x3936cab10c54)
    /src/topdown/eval.go:727 +0x89f
github.com/open-policy-agent/opa/topdown.(*eval).biunify(0xc00023ec60, 0xc00000eb80, 0xc00000eba0, 0xc00022db60, 0xc00022db60, 0xc00022dc50, 0xfc8201, 0xc00022dc50)
    /src/topdown/eval.go:624 +0x11f
github.com/open-policy-agent/opa/topdown.(*eval).unify(0xc00023ec60, 0xc00000eb80, 0xc00000eba0, 0xc00022dc50, 0xf31100, 0x10)
    /src/topdown/eval.go:616 +0x57
github.com/open-policy-agent/opa/topdown.(*eval).evalStep(0xc00023ec60, 0xc000054790, 0x4, 0x10e4540)
    /src/topdown/eval.go:266 +0x565
github.com/open-policy-agent/opa/topdown.(*eval).evalExpr(0xc00023ec60, 0xc000054780, 0x1, 0xc000054780)
    /src/topdown/eval.go:247 +0xed
github.com/open-policy-agent/opa/topdown.(*eval).eval(0xc00023ec60, 0xc000054780, 0x5, 0x10e5e00)
    /src/topdown/eval.go:223 +0x35
github.com/open-policy-agent/opa/topdown.(*eval).Run(0xc00023ec60, 0xc000054770, 0xf, 0x0)
    /src/topdown/eval.go:66 +0xca
github.com/open-policy-agent/opa/topdown.(*Query).Iter(0xc0002dcf38, 0x10e87c0, 0xc00022cd50, 0xc00022db30, 0xc000240360, 0xc00000f2a0)
    /src/topdown/query.go:295 +0x3cd
github.com/open-policy-agent/opa/rego.(*Rego).eval(0xc0003b8000, 0x10e87c0, 0xc00022cd50, 0xc0003be000, 0x0, 0x0, 0x0, 0x0, 0x0)
    /src/rego/rego.go:1609 +0x4b5
github.com/open-policy-agent/opa/rego.PreparedEvalQuery.Eval(0xc0003b8000, 0xc000054410, 0x10e87c0, 0xc00022cd50, 0xc0002dd368, 0x4, 0x4, 0x0, 0x0, 0x0, ...)
    /src/rego/rego.go:301 +0x1a8
github.com/open-policy-agent/opa/server.(*Server).v1DataPost(0xc0001803c0, 0x10e4840, 0xc00000e400, 0xc000232500)
    /src/server/server.go:1312 +0x9bb
net/http.HandlerFunc.ServeHTTP(0xc000242b20, 0x10e4840, 0xc00000e400, 0xc000232500)
    /usr/local/go/src/net/http/server.go:2007 +0x44
github.com/open-policy-agent/opa/internal/prometheus.(*Provider).InstrumentHandler.func1(0x7f4423fe8638, 0xc000250370, 0xc000232500)
    /src/internal/prometheus/prometheus.go:78 +0xf9
net/http.HandlerFunc.ServeHTTP(0xc00024cfc0, 0x7f4423fe8638, 0xc000250370, 0xc000232500)
    /usr/local/go/src/net/http/server.go:2007 +0x44
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func1(0x10e49c0, 0xc000022440, 0xc000232500)
    /src/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:68 +0x11c
net/http.HandlerFunc.ServeHTTP(0xc0002651d0, 0x10e49c0, 0xc000022440, 0xc000232500)
    /usr/local/go/src/net/http/server.go:2007 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc00021c3f0, 0x10e49c0, 0xc000022440, 0xc000232500)
    /src/vendor/github.com/gorilla/mux/mux.go:162 +0x104
github.com/open-policy-agent/opa/runtime.(*LoggingHandler).ServeHTTP(0xc0002ad200, 0x10e58c0, 0xc0003b2000, 0xc000232300)
    /src/runtime/logging.go:83 +0x153
net/http.serverHandler.ServeHTTP(0xc0002fe000, 0x10e58c0, 0xc0003b2000, 0xc000232300)
    /usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc0002660a0, 0x10e8700, 0xc000022300)
    /usr/local/go/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:2928 +0x384
tsandall commented 4 years ago

I looked into this earlier today. The problem is that the server is running PE using the shared compiler. In this case, the rego package is generating a recursive rule from the PE output--it should have thrown away the rule but it didn't. When the rego package goes to compile the result before returning, the recursion is caught but the compiler is left in an inconsistent state. The inconsistent state causes the panic later.

One option is simply to update the rego package to throw away the result if it's recursive. We can assume that PE output is always valid, so compilation of the result should never fail--in this case it's the rego package making a mistake when it constructs rules from the queries returned by PE. Another option is to have the server use a different compiler when running PE--this way any errors on the compile after PE will not affect the shared compiler.

I'm inclined to just fix the rego package to throw away the recursive result. The rego package can detect recursion in this case by simply walking the refs in the PE query outputs and check if they are prefixes of the rules being constructed. This feels like the easiest solution and since we plan to eventually deprecate the lazy PE API, this seems like the right solution (once lazy PE is gone, the server won't have to run PE at all for optimization purposes, it'll be handled ahead of time.)

Note, this issue only occurs if you're running PE on queries that ask for ALL of /v1/data. Normally this only happens inside development/debug environments.

On Thu, Apr 16, 2020 at 1:41 PM Patrick East notifications@github.com wrote:

Reopened #2197 https://github.com/open-policy-agent/opa/issues/2197.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/opa/issues/2197#event-3241047390, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB2KJK7VEECUU4CK5D7HNDRM47N5ANCNFSM4LOVORCA .

-- -Torin