swift-server / swift-prometheus

Prometheus client library for Swift
https://swiftpackageindex.com/swift-server/swift-prometheus
Apache License 2.0
145 stars 32 forks source link

Crash while collect(): Unexpectedly found nil while unwrapping an Optional value at CircularBuffer.swift #79

Closed ktoso closed 1 year ago

ktoso commented 1 year ago

The threading / locking seems to be wrong in collects.

We should revisit how locking is done, but also perhaps look into a larger redesign; the way we're handling locking is a bit all over the place.

eceived signal 4. Backtrace:
0x55e60b3f9392, Backtrace.(printBacktrace in _B82A8C0ED7C904841114FDF244F9E58E)(signal: Swift.Int32) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backtrace/Backtrace.swift:66
0x55e60b3f9627, closure #1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/.build/checkouts/swift-backtrace/Sources/Backtrace/Backtrace.swift:80
0x55e60b3f9627, @objc closure #1 (Swift.Int32) -> () in static Backtrace.Backtrace.install(signals: Swift.Array<Swift.Int32>) -> () at /build/<compiler-generated>:79
0x7fbb6950351f
0x55e60b9ccb64, Swift runtime failure: Unexpectedly found nil while unwrapping an Optional value at /build/.build/checkouts/swift-nio/Sources/NIOCore/CircularBuffer.swift:0
0x55e60b9ccb64, NIOCore.CircularBuffer.subscript.getter : (NIOCore.CircularBuffer<A>.Index) -> A at /build/.build/checkouts/swift-nio/Sources/NIOCore/CircularBuffer.swift:158
0x55e60b9cd487, NIOCore.CircularBuffer.subscript.read : (NIOCore.CircularBuffer<A>.Index) -> A at /build/<compiler-generated>:0
0x55e60b9cd3fc, protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance NIOCore.CircularBuffer<A> : Swift.Collection in NIOCore at /build/<compiler-generated>:0
0x7fbb69f7e351
0x55e60bbf3d31, closure #5 (Prometheus.DimensionLabels, Prometheus.PromSummary<A>) -> () in Prometheus.PromSummary.collect() -> Swift.String at /build/.build/checkouts/SwiftPrometheus/Sources/Prometheus/MetricTypes/Summary.swift:96
0x55e60bbf5295, reabstraction thunk helper <A where A: Prometheus.DoubleRepresentable> from @callee_guaranteed (@guaranteed Prometheus.DimensionLabels, @guaranteed Prometheus.PromSummary<A>) -> (@error @owned Swift.Error) to @escaping @callee_guaranteed (@in_guaranteed (key: Prometheus.DimensionLabels, value: Prometheus.PromSummary<A>)) -> (@error @owned Swift.Error) at /build/<compiler-generated>:0
0x55e60bbf5295, partial apply forwarder for reabstraction thunk helper <A where A: Prometheus.DoubleRepresentable> from @callee_guaranteed (@guaranteed Prometheus.DimensionLabels, @guaranteed Prometheus.PromSummary<A>) -> (@error @owned Swift.Error) to @escaping @callee_guaranteed (@in_guaranteed (key: Prometheus.DimensionLabels, value: Prometheus.PromSummary<A>)) -> (@error @owned Swift.Error) at /build/<compiler-generated>:0
0x7fbb6a007533
0x55e60bbf365d, Prometheus.PromSummary.collect() -> Swift.String at /build/.build/checkouts/SwiftPrometheus/Sources/Prometheus/MetricTypes/Summary.swift:94
0x55e60bbf4e7a, protocol witness for Prometheus.PromMetric.collect() -> Swift.String in conformance Prometheus.PromSummary<A> : Prometheus.PromMetric in Prometheus at /build/<compiler-generated>:0
0x55e60bbf8ed2, closure #1 (Prometheus.PromMetric) -> Swift.String in closure #2 @Sendable () async -> Swift.String in Prometheus.PrometheusClient.collect() async -> Swift.String at /build/.build/checkouts/SwiftPrometheus/Sources/Prometheus/Prometheus.swift:31
0x55e60bbf8ed2, generic specialization <Swift.Dictionary<Swift.String, Prometheus.PromMetric>.Values, Swift.String> of (extension in Swift):Swift.Collection.map<A>((A.Element) throws -> A1) throws -> Swift.Array<A1> at /build/<compiler-generated>:0
0x55e60bbf8ed2, (1) suspend resume partial function for closure #2 @Sendable () async -> Swift.String in Prometheus.PrometheusClient.collect() async -> Swift.String at /build/.build/checkouts/SwiftPrometheus/Sources/Prometheus/Prometheus.swift:31
0x7fbb6a4d849d
0x7fbb6a4d8c4b
0x7fbb6add33f4
0x7fbb6add31a2
0x7fbb6addea91
0x7fbb69555b42
0x7fbb695e79ff
0xffffffffffffffff
Illegal instruction (core dumped)
MrLotU commented 1 year ago

I've done some digging, and I found a way to fix this. It seems that using subSum.lock instead of lock (inferring self.lock) here

I've done some further digging and what I suspect is going on is that when calling summary.observe() with labels, a subSummary is created, on which observe() is also called. When this happens, it uses it's own lock to lock and then does whatever it has to. However when calling collect(), we grab these same subSummaries, but use the main summaries lock to lock. I'm guessing that because the summaries are all distinct class instances, locking from one end does not guarantee locking from the other end, and so the race condition appears.

I'm totally in favour of revisiting locking and redesigning it, since it's an area of expertise I'm (still) not very comfortable with. The odds of mistakes, bugs or suboptimal implementations are quite large.

MrLotU commented 1 year ago

@ktoso Could you share what Swift/OS version you ran this on? Based on the automated tests for #81 , the issue seems to only occur on linux-nightly and macos builds.

mrstegeman commented 1 year ago

I’ve been seeing this with Swift 5.7.1 on Linux, but it’s been happening for a while, I believe even with Swift 5.6.X.

blindspotbounty commented 1 year ago

@MrLotU you are right. Not sure who is an owner of the repo but the fix is waiting for being merged: https://github.com/swift-server-community/SwiftPrometheus/pull/82

ktoso commented 1 year ago

Thanks for the ping, I missed that -- I can kick off CI there and merge 👍