This PR moves the `Labels` equality check outside of the `Lock`.
The rationale is as follows:
An app's set of metrics is usually bounded and stops growing noticeably once the app has been running for a while. The code already relies on this constraint implicitly: `Prometheus.metrics`, `Histogram.subHistograms`, `Summary.subSummaries` and `Counter.metrics` are all unbounded and would eventually cause an OOM if they grew indefinitely. From the domain standpoint, unique label names and values are discouraged and provide little insight into an app's behaviour.
Given this constraint, once the app has been running for a while, all or most label key-value pairs are already stored in the corresponding `Histogram.subHistograms` / `Summary.subSummaries`. This makes fetching a sub-summary/sub-histogram from the map a target for optimisation: the `getOrCreate` flow becomes heavily skewed towards the `get` part. The suggested change speeds up the `get` part of `getOrCreate` at the cost of making the less frequently used `create` part more expensive. On my local machine this results in a more than 2x throughput boost for the following test setup:
```swift
import Dispatch
import NIO
import Prometheus

func testConcurrent() throws {
    let prom = PrometheusClient()
    let histogram = prom.createHistogram(forType: Double.self,
                                         named: "my_histogram",
                                         helpText: "Histogram for testing",
                                         buckets: Buckets.exponential(start: 1, factor: 2, count: 63),
                                         labels: DimensionHistogramLabels.self)
    let elg = MultiThreadedEventLoopGroup(numberOfThreads: 8)
    // Start at 0 and wait once per worker, so the test blocks until both loops finish.
    let semaphore = DispatchSemaphore(value: 0)
    let time = DispatchTime.now()
    // Worker 1: observe with label set "1" for 10 seconds.
    elg.next().submit {
        while DispatchTime.now().uptimeNanoseconds - time.uptimeNanoseconds < 10_000_000_000 {
            let labels = DimensionHistogramLabels([("myValue", "1")])
            histogram.observe(1.0, labels)
        }
        semaphore.signal()
    }
    // Worker 2: observe with label set "2" for 10 seconds.
    elg.next().submit {
        while DispatchTime.now().uptimeNanoseconds - time.uptimeNanoseconds < 10_000_000_000 {
            let labels = DimensionHistogramLabels([("myValue", "2")])
            histogram.observe(1.0, labels)
        }
        semaphore.signal()
    }
    semaphore.wait()
    semaphore.wait()
    try elg.syncShutdownGracefully()
    print(histogram.collect())
    // Total elapsed wall-clock time in nanoseconds.
    print(DispatchTime.now().uptimeNanoseconds - time.uptimeNanoseconds)
}
```
Checklist
[ + ] The provided tests still run.
[ + ] I've created new tests where needed.
[ N/A ] I've updated the documentation if necessary.
Motivation and Context
This PR optimises metric emission, which results in higher throughput and a lower impact on the running system.
Description
The scope of locking is now reduced to obtaining/modifying the current value of `PromHistogram.subHistograms` / `PromSummary.subSummaries`.
If the sub-histogram/sub-summary for the given labels has not yet been added to the corresponding `PromHistogram` / `PromSummary`, the lock is taken twice: once to get the current value, and once to re-check and store the new value.
Added tests that access and update `Summary`/`Histogram` from multiple threads.
Ran TSan against the change to check for concurrency issues.
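For illustration, here is a minimal sketch of the locking pattern described above: take the lock only to read the current dictionary, do the label lookup outside it, and on a miss take the lock a second time to re-check and store. This is not the actual SwiftPrometheus code; `SubMetric`, `LabelledMetric` and the `[String: String]` label representation are hypothetical stand-ins for `PromHistogram`/`PromSummary` and their labels, and Foundation's `NSLock` stands in for the library's `Lock`.

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for a labelled sub-metric (e.g. a sub-histogram).
final class SubMetric: @unchecked Sendable {
    let labels: [String: String]
    private let lock = NSLock()
    private var total: Double = 0

    init(labels: [String: String]) { self.labels = labels }

    func observe(_ value: Double) {
        lock.lock(); defer { lock.unlock() }
        total += value
    }

    var sum: Double {
        lock.lock(); defer { lock.unlock() }
        return total
    }
}

// Hypothetical parent metric showing the double-checked getOrCreate.
final class LabelledMetric: @unchecked Sendable {
    private let lock = NSLock()
    // Swift dictionaries are copy-on-write value types, so copying one out
    // under the lock yields a consistent snapshot that can be read without it.
    private var subMetrics: [[String: String]: SubMetric] = [:]

    func getOrCreate(_ labels: [String: String]) -> SubMetric {
        // First lock: only grab the current dictionary value.
        lock.lock()
        let snapshot = subMetrics
        lock.unlock()

        // Hot path: hashing and label equality checks run outside the lock.
        if let existing = snapshot[labels] {
            return existing
        }

        // Rare path: lock again and re-check, because another thread may
        // have inserted the same labels in the meantime, then store.
        lock.lock(); defer { lock.unlock() }
        if let existing = subMetrics[labels] {
            return existing
        }
        let created = SubMetric(labels: labels)
        subMetrics[labels] = created
        return created
    }

    var count: Int {
        lock.lock(); defer { lock.unlock() }
        return subMetrics.count
    }
}

// Usage: many concurrent observations over a small, bounded set of labels.
let metric = LabelledMetric()
DispatchQueue.concurrentPerform(iterations: 1_000) { i in
    let labels = ["myValue": String(i % 2 + 1)]
    metric.getOrCreate(labels).observe(1.0)
}
print(metric.count) // prints 2
```

The re-check under the second lock is what keeps two threads that miss at the same time from creating two sub-metrics for the same labels. The trade-off mirrors the one described in this PR: a hit only holds the lock long enough to copy the dictionary reference, while a miss pays for a second lock acquisition and may trigger a copy-on-write copy of the dictionary on insert.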