Open alam-chime opened 1 month ago
Hey, and thanks for filing this issue! Those error logs are helpful indeed. A few of us looked into this briefly today, and tbh, this is quite a mystery 🕵️ There's nothing obvious on those lines that should cause a nil dereference under any normal circumstances... and we've not yet managed to come up with even an exceptional circumstance that provably causes one. If there's some obvious case we've overlooked, I'd be happy to hear about it!
thanks @anderseknert for taking a look. Do you have any suggestions for debugging this issue? I'm also adding more error logs related to the panics we're seeing. I'll get back with a small example for our setup soon.
2024/10/11 09:12:51 http: panic serving 127.0.0.1:37098: runtime error: makeslice: len out of range
goroutine 950702 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0x17f50f0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x114a0e0?, 0x17f50f0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.mapEncoder.encode({0x100000002?}, 0xc000dc8240, {0x1135cc0?, 0x666390?, 0x1135cc0?}, {0x5f?, 0x85?})
2024/10/12 16:27:42 http: panic serving 127.0.0.1:41396: can't call pointer on a non-pointer Value
goroutine 623239 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x10cf800?, 0x17f6460?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x10cf800?, 0x17f6460?})
/usr/local/go/src/runtime/panic.go:770 +0x132
reflect.Value.pointer(...)
/usr/local/go/src/reflect/value.go:110
reflect.Value.lenNonSlice({0x1135cc0?, 0xc000b90cf0?, 0x41a205?})
2024/10/13 22:14:10 http: panic serving 127.0.0.1:52366: reflect: call of reflect.Value.IsNil on zero Value
goroutine 1459006 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x112a800?, 0xc000879c68?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x112a800?, 0xc000879c68?})
/usr/local/go/src/runtime/panic.go:770 +0x132
reflect.Value.IsNil(...)
/usr/local/go/src/reflect/value.go:1574
encoding/json.interfaceEncoder(0xc0005c1080, {0x1115f80?, 0xc000682f40?, 0x43?}, {0xe0?, 0x5f?})
/usr/local/go/src/encoding/json/encode.go:654 +0x110
2024/10/13 14:40:25 http: panic serving 127.0.0.1:35712: unexpected map key type
goroutine 1730644 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x10cf800?, 0x17f5430?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x10cf800?, 0x17f5430?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.resolveKeyName({0x10cf800?, 0xc000ac6210?, 0xc000ac0660?})
2024/10/14 20:20:17 http: panic serving 127.0.0.1:41322: runtime error: hash of unhashable type ast.String
goroutine 586544 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0xc0011b1070?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/topdown.(*baseCache).Put(...)
/src/topdown/cache.go:126
github.com/open-policy-agent/opa/topdown.(*eval).resolveReadFromStorage(0xc0011cc400, {0xc000578828, 0x3, 0x3}, {0x0, 0x0})
/src/topdown/eval.go:1641 +0x58f
github.com/open-policy-agent/opa/topdown.(*evalResolver).Resolve(0xc0011d3ec0, {0xc000578828, 0x3, 0x3})
/src/topdown/eval.go:1563 +0x657
github.com/open-policy-agent/opa/topdown.(*eval).Resolve(...)
/src/topdown/eval.go:1480
2024/10/14 19:43:45 http: panic serving 127.0.0.1:42696: reflect: call of reflect.Value.Bool on zero Value
goroutine 1640899 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x112a800?, 0xc0010d2ae0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
encoding/json.(*encodeState).marshal.func1()
/usr/local/go/src/encoding/json/encode.go:293 +0x6d
panic({0x112a800?, 0xc0010d2ae0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
reflect.flag.mustBe(...)
/usr/local/go/src/reflect/value.go:233
reflect.Value.panicNotBool({0x10cfc00?, 0x2025768?, 0x532965?})
/usr/local/go/src/reflect/value.go:302 +0x7e
reflect.Value.Bool(...)
/usr/local/go/src/reflect/value.go:296
2024/10/14 18:57:46 http: panic serving 127.0.0.1:47100: runtime error: invalid memory address or nil pointer dereference
goroutine 227179 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0x2102610?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc000742870})
/src/internal/deepcopy/deepcopy.go:19 +0x173
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc001165980})
2024/10/12 13:12:22 http: panic serving 127.0.0.1:37580: illegal value: ast.String
goroutine 485802 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x10cf800?, 0xc0003a8310?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/ast.sortOrder({0x11d3b40?, 0xc00039fae0?})
/src/ast/compare.go:302 +0x3a5
github.com/open-policy-agent/opa/ast.Compare({0x11d3b40?, 0xc00039fbb0?}, {0x11d3b40?, 0xc00039fae0?})
/src/ast/compare.go:68 +0x10e
github.com/open-policy-agent/opa/ast.objectElemSlice.Less(...)
/src/ast/term.go:2087
@alam-chime Did you happen to put together the example? If you're able to share any additional steps or information about your OPA config, it could be helpful for debugging. 🙂
Sorry @philipaconrad, I missed this message. Will share an example this week.
Short description
We're running opa server as a sidecar in kubernetes. At the time of the issue, both memory and cpu usage were well below the defined limits. For the majority of requests, we are receiving the expected outcomes. But there are a few instances where we're seeing HTTP 502s and 504s from opa. There are no differences between the inputs of the failing requests and those that succeed.
OPA version: 0.66.0, but we're seeing this behavior with 0.69.0 also
Steps To Reproduce
We haven't been able to reproduce this issue locally, but we'll provide an update if we're successful.
Expected behavior
There shouldn't be a panic, and the opa server should respond with the decision.
Additional context
This issue happens randomly, with no difference in the input between the requests that panic and the ones that succeed. The policy and data files are too big to share here, but I can create a smaller example if needed. Maybe the error logs are helpful for now?