ktoso closed this issue 1 year ago.
I've done some digging, and I found a way to fix this. It seems that using `subSum.lock` instead of `lock` (inferring `self.lock`) here fixes the race.
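To make the suggested fix concrete, here is a minimal sketch under stated assumptions. `Summary`, `SubSummary`, `subSummaries` and `snapshotLocked()` are hypothetical, heavily simplified stand-ins, not SwiftPrometheus's actual types; only the locking shape is modelled. The point is that `collect()` reads each child's state under `subSum.lock`, the same lock the child's `observe()` takes, rather than under `self.lock`.

```swift
import Foundation
import Dispatch

// Hypothetical, simplified stand-in for a labelled child summary.
// Not SwiftPrometheus's real type; it only models the locking shape.
final class SubSummary: @unchecked Sendable {
    let lock = NSLock()              // guards `values`
    private var values: [Double] = []

    func observe(_ value: Double) {
        lock.lock()
        defer { lock.unlock() }
        values.append(value)
    }

    // Caller must already hold `lock`.
    func snapshotLocked() -> [Double] { values }
}

// Hypothetical parent summary; `lock` guards only the dictionary itself.
final class Summary: @unchecked Sendable {
    let lock = NSLock()
    private var subSummaries: [String: SubSummary] = [:]

    func observe(_ value: Double, label: String) {
        lock.lock()
        let sub: SubSummary
        if let existing = subSummaries[label] {
            sub = existing
        } else {
            sub = SubSummary()
            subSummaries[label] = sub
        }
        lock.unlock()
        sub.observe(value)           // takes sub.lock, not self.lock
    }

    func collect() -> [String: [Double]] {
        lock.lock()
        let subs = subSummaries      // copy the dictionary under the parent lock
        lock.unlock()

        var result: [String: [Double]] = [:]
        for (label, subSum) in subs {
            // The fix: read each child under its own lock (`subSum.lock`),
            // the lock SubSummary.observe uses. Holding only `self.lock`
            // here would not exclude concurrent writers to the child.
            subSum.lock.lock()
            result[label] = subSum.snapshotLocked()
            subSum.lock.unlock()
        }
        return result
    }
}
```

Copying the dictionary under the parent lock and then taking each child lock separately means the two locks are never held at the same time, which sidesteps lock-ordering concerns; whether that trade-off suits the real code is a separate design question.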
I've done some further digging, and what I suspect is going on is this: when calling `summary.observe()` with labels, a subSummary is created, and `observe()` is also called on it. When this happens, the subSummary uses its own lock to lock and then does whatever it has to. However, when calling `collect()`, we grab these same subSummaries but lock them with the main summary's lock. I'm guessing that because the summaries are all distinct class instances, each with its own lock, holding one lock does not stop another thread from holding the other, so `observe()` and `collect()` can touch the same subSummary at the same time and the race condition appears.
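The reasoning above comes down to the fact that two distinct lock instances do not exclude each other. Below is a minimal, single-threaded sketch using Foundation's `NSLock`, where `summaryLock` and `subSummaryLock` are illustrative names for the main summary's lock and a subSummary's lock. While `summaryLock` is held, `subSummaryLock` can still be acquired; in the real scenario that second acquisition would come from another thread running `observe()`. By contrast, `summaryLock` itself cannot be taken a second time.

```swift
import Foundation

// Illustrative names: the main summary's lock and one subSummary's lock.
let summaryLock = NSLock()
let subSummaryLock = NSLock()

summaryLock.lock()                      // what collect() holds

// The subSummary's lock is independent, so it is still free.
let subFree = subSummaryLock.try()
if subFree { subSummaryLock.unlock() }

// The same lock, however, is unavailable while held (NSLock is non-recursive).
let summaryFree = summaryLock.try()
if summaryFree { summaryLock.unlock() }

summaryLock.unlock()

print("subSummary lock still acquirable: \(subFree)")    // true
print("summary lock re-acquirable: \(summaryFree)")       // false
```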
I'm totally in favour of revisiting the locking and redesigning it, since it's an area I'm (still) not very comfortable with. The odds of mistakes, bugs, or suboptimal implementations are quite high.
@ktoso Could you share which Swift/OS version you ran this on? Based on the automated tests for #81, the issue seems to occur only on the linux-nightly and macos builds.
I’ve been seeing this with Swift 5.7.1 on Linux, but it’s been happening for a while, I believe even with Swift 5.6.X.
@MrLotU you are right. I'm not sure who owns the repo, but the fix is waiting to be merged: https://github.com/swift-server-community/SwiftPrometheus/pull/82
Thanks for the ping, I missed that. I can kick off CI there and merge 👍
The threading/locking seems to be wrong in the `collect()` implementations.
We should revisit how locking is done, and perhaps also look into a larger redesign; the way we're handling locking is a bit all over the place.