open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

Panics when opa running as a server #7117

Open alam-chime opened 1 month ago

alam-chime commented 1 month ago

Short description

We're running the OPA server as a sidecar in Kubernetes. At the time of the issue, both memory and CPU usage were well below the defined limits. For the majority of requests we receive the expected outcomes, but in a few instances we see HTTP 502s and 504s from OPA. There are no differences between the inputs of the failing requests and those that succeed.

Steps To Reproduce

We haven't been able to reproduce this issue locally, but we'll provide an update if we're successful.

Expected behavior

There shouldn't be a panic, and the OPA server should respond with the decision.
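For context on why a panic means no decision comes back: Go's `net/http` server recovers a panic raised inside a handler, logs it, and closes the connection without writing a response. The caller then sees a transport error rather than a status code, so any 502 or 504 would presumably be produced by a proxy sitting in front of the server. The sketch below illustrates this with the standard library only; `panicHandler` and `demo` are hypothetical names used for illustration and are not part of OPA.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// panicHandler is a hypothetical stand-in for any handler that panics mid-request.
func panicHandler(w http.ResponseWriter, r *http.Request) {
	panic("simulated failure")
}

// demo starts a local test server, sends one request, and reports whether
// any HTTP response reached the client.
func demo() (bool, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(panicHandler))
	// Silence the "http: panic serving ..." log line for this demo.
	srv.Config.ErrorLog = log.New(io.Discard, "", 0)
	srv.Start()
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		// Transport-level failure: the connection was closed with no status code.
		return false, err
	}
	defer resp.Body.Close()
	return true, nil
}

func main() {
	got, err := demo()
	fmt.Println("response received:", got)
	fmt.Println("client error:", err)
}
```

A client that talks to the server directly could treat such transport errors as retryable, which is a design choice rather than a fix for the underlying panic.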

Additional context

This issue happens randomly, with no difference in the input between the requests that panic and the ones that succeed. The policy and data files are too big to share here, but I can create a smaller example if needed. Maybe the error logs are helpful for now?

anderseknert commented 1 month ago

Hey, and thanks for filing this issue! Those error logs are helpful indeed. A few of us looked into this briefly today, and tbh, this is quite a mystery 🕵️ There's nothing obvious on those lines that should cause a nil dereference under any normal circumstances... and we've not yet managed to come up with even an exceptional circumstance that provably causes one. If there's some obvious case we've overlooked, I'd be happy to hear about it!

alam-chime commented 1 month ago

Thanks @anderseknert for taking a look. Do you have any suggestions for debugging this issue? I'm also adding more error logs related to the panics we're seeing. I'll get back with a small example of our setup soon.

2024/10/11 09:12:51 http: panic serving 127.0.0.1:37098: runtime error: makeslice: len out of range
goroutine 950702 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0x17f50f0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x114a0e0?, 0x17f50f0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.mapEncoder.encode({0x100000002?}, 0xc000dc8240, {0x1135cc0?, 0x666390?, 0x1135cc0?}, {0x5f?, 0x85?})

2024/10/12 16:27:42 http: panic serving 127.0.0.1:41396: can't call pointer on a non-pointer Value
goroutine 623239 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x10cf800?, 0x17f6460?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x10cf800?, 0x17f6460?})
/usr/local/go/src/runtime/panic.go:770 +0x132
reflect.Value.pointer(...)
/usr/local/go/src/reflect/value.go:110
reflect.Value.lenNonSlice({0x1135cc0?, 0xc000b90cf0?, 0x41a205?})

2024/10/13 22:14:10 http: panic serving 127.0.0.1:52366: reflect: call of reflect.Value.IsNil on zero Value
goroutine 1459006 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x112a800?, 0xc000879c68?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x112a800?, 0xc000879c68?})
/usr/local/go/src/runtime/panic.go:770 +0x132
reflect.Value.IsNil(...)
/usr/local/go/src/reflect/value.go:1574
encoding/json.interfaceEncoder(0xc0005c1080, {0x1115f80?, 0xc000682f40?, 0x43?}, {0xe0?, 0x5f?})
/usr/local/go/src/encoding/json/encode.go:654 +0x110

2024/10/13 14:40:25 http: panic serving 127.0.0.1:35712: unexpected map key type
goroutine 1730644 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x10cf800?, 0x17f5430?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x10cf800?, 0x17f5430?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.resolveKeyName({0x10cf800?, 0xc000ac6210?, 0xc000ac0660?})

2024/10/14 20:20:17 http: panic serving 127.0.0.1:41322: runtime error: hash of unhashable type ast.String
goroutine 586544 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0xc0011b1070?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/topdown.(*baseCache).Put(...)
/src/topdown/cache.go:126
github.com/open-policy-agent/opa/topdown.(*eval).resolveReadFromStorage(0xc0011cc400, {0xc000578828, 0x3, 0x3}, {0x0, 0x0})
/src/topdown/eval.go:1641 +0x58f
github.com/open-policy-agent/opa/topdown.(*evalResolver).Resolve(0xc0011d3ec0, {0xc000578828, 0x3, 0x3})
/src/topdown/eval.go:1563 +0x657
github.com/open-policy-agent/opa/topdown.(*eval).Resolve(...)
/src/topdown/eval.go:1480

2024/10/14 19:43:45 http: panic serving 127.0.0.1:42696: reflect: call of reflect.Value.Bool on zero Value
goroutine 1640899 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x112a800?, 0xc0010d2ae0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x112a800?, 0xc0010d2ae0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
reflect.flag.mustBe(...)
/usr/local/go/src/reflect/value.go:233
reflect.Value.panicNotBool({0x10cfc00?, 0x2025768?, 0x532965?})
/usr/local/go/src/reflect/value.go:302 +0x7e
reflect.Value.Bool(...)
/usr/local/go/src/reflect/value.go:296

2024/10/14 18:57:46 http: panic serving 127.0.0.1:47100: runtime error: invalid memory address or nil pointer dereference
goroutine 227179 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0x2102610?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc000742870})
/src/internal/deepcopy/deepcopy.go:19 +0x173
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc001165980})

2024/10/12 13:12:22 http: panic serving 127.0.0.1:37580: illegal value: ast.String
goroutine 485802 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x10cf800?, 0xc0003a8310?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/ast.sortOrder({0x11d3b40?, 0xc00039fae0?})
/src/ast/compare.go:302 +0x3a5
github.com/open-policy-agent/opa/ast.Compare({0x11d3b40?, 0xc00039fbb0?}, {0x11d3b40?, 0xc00039fae0?})
/src/ast/compare.go:68 +0x10e
github.com/open-policy-agent/opa/ast.objectElemSlice.Less(...)
/src/ast/term.go:2087
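
Several of these traces fail inside `encoding/json` and `reflect` while a value is being walked. One hypothesis, which is an assumption and not something established in this thread, is that a value is being modified by one goroutine while another reads it. Go's race detector (`go build -race` or `go test -race`) is the usual way to confirm or rule that out. The sketch below shows the generic pattern of guarding a shared map while it is encoded; `guardedDoc` and `stress` are hypothetical names and do not correspond to any OPA type or function.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// guardedDoc is a hypothetical container; it is not an OPA type.
type guardedDoc struct {
	mu   sync.RWMutex
	data map[string]interface{}
}

func newGuardedDoc() *guardedDoc {
	return &guardedDoc{data: map[string]interface{}{}}
}

// Set writes one key under the exclusive lock.
func (d *guardedDoc) Set(key string, value interface{}) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.data[key] = value
}

// MarshalJSON encodes the map under the shared lock, so no writer can
// modify it while encoding/json walks it.
func (d *guardedDoc) MarshalJSON() ([]byte, error) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return json.Marshal(d.data)
}

// stress runs writers and encoders concurrently and returns the first
// encoding error, or nil if there was none.
func stress(writers, rounds int) error {
	doc := newGuardedDoc()
	var wg sync.WaitGroup
	errs := make(chan error, writers)
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				doc.Set(fmt.Sprintf("k%d-%d", id, i%8), i)
				if _, err := json.Marshal(doc); err != nil {
					errs <- err
					return
				}
			}
		}(w)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when the channel is empty
}

func main() {
	if err := stress(4, 1000); err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println("no encoding errors")
}
```

Running a variant of this without the lock under `-race` is one way to see what a race report looks like before hunting for one in a real service.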

philipaconrad commented 1 month ago

> I'll get back with a small example for our setup soon.

@alam-chime Did you happen to put together the example? If you're able to share any additional steps or information about your OPA config, it could be helpful for debugging. 🙂

alam-chime commented 1 week ago

Sorry @philipaconrad, I missed this message. I'll share an example this week.