Closed kokodak closed 2 months ago
The recent changes enhance the tracking of background goroutines by modifying key methods to accept metrics and task type parameters. This allows for improved monitoring of background tasks via Prometheus, facilitating better management and observability of worker counts. Such updates ensure that metrics accurately reflect the state of goroutines, thus supporting proactive performance management.
| Files | Change Summary |
|---|---|
| `server/backend/background/background.go`, `server/packs/packs.go` | Enhanced methods to include metrics tracking and identifiers, allowing for improved observability of background tasks. |
| `server/backend/backend.go` | Updated background instance creation to include metrics, improving task management capabilities. |
| `test/integration/retention_test.go`, `test/integration/snapshot_test.go` | Updated tests to reflect new method signatures for better metrics integration. |
| Objective | Addressed | Explanation |
|---|---|---|
| Collect Worker Count Metric that Runs Through the Background Package (#386) | ✅ | |
🐇 In the garden where shadows play,
Metrics dance and count the day.
With each task that hops along,
Background workers sing their song.
Watch them flourish, one and all,
Thanks to changes, stand up tall! 🌼✨
@kokodak We have deployed this feature and added a Grafana panel for it, but the metric does not seem to be collected. Can you verify it again?
@krapie There is no problem with the metric collection itself. However, in most scenarios, the lifetime of the goroutines running in the background is extremely short, and the interval between collecting metrics is long, so it doesn't seem to be observed.
This was happening to me locally, but when I forcibly increased the lifetime of the goroutine, I was able to observe it normally. On the yorkie main server, there is some traffic, so I thought it would be observed, but it seems not. I think the main reason is that the goroutine has a very short lifetime.
How about changing the metric from a Gauge to a Counter and observing the rate at which the number of started goroutines increases over time?
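A minimal sketch of the Counter idea, using only the standard library. `startedCounter`, `runShortTasks`, and `perSecond` are hypothetical names that stand in for a Prometheus counter and a rate-style query; they are not part of this PR. The point is that a counter keeps its value after a goroutine exits, so even very short-lived goroutines show up as a difference between two scrapes.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// startedCounter is a hypothetical stand-in for a monotonically
// increasing counter metric. It only ever goes up.
type startedCounter struct {
	n int64
}

// Inc records that one more goroutine has been started.
func (c *startedCounter) Inc() { atomic.AddInt64(&c.n, 1) }

// Value returns the total number of goroutines started so far.
func (c *startedCounter) Value() int64 { return atomic.LoadInt64(&c.n) }

// runShortTasks starts n goroutines that finish immediately, counting
// each start, and waits for all of them to exit.
func runShortTasks(c *startedCounter, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		c.Inc()
		go func() {
			defer wg.Done()
		}()
	}
	wg.Wait()
}

// perSecond approximates what a rate-style query derives from two
// samples of a counter taken intervalSeconds apart.
func perSecond(prev, curr int64, intervalSeconds float64) float64 {
	return float64(curr-prev) / intervalSeconds
}
```

With a gauge, a scrape taken after `runShortTasks` returns would read zero; with the counter, the difference between the two samples still reflects every goroutine that ran in between.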
@kokodak Good point! Looks like the lifespan of the goroutine is too short to collect. Well, I can see some changes when I change the query to rate_interval.
cc. @hackerwins
What this PR does / why we need it:
This PR adds a metric to collect the number of goroutines attached to a specific Background task.
Prometheus provides its own thread-safe Gauge metric, which can be used to accurately collect the number of concurrent goroutines.
Also, although the only task running in the background so far is PushPull, I implemented the `AttachGoroutine()` method to take a string called `taskType` as a parameter, since it seems to be focused on reusability.

In my local Grafana, I observed the metrics with a simple promQL query like the one below.
The expected effect is that it will be easy to identify which background tasks are bottlenecking and when. For example, during the logic of saving snapshots, the number of observed goroutines may temporarily increase when the DB bottleneck causes each goroutine to have a longer lifecycle. This kind of analysis would help us prevent such issues.
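The gauge-based tracking described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `taskGauge` is a hypothetical, mutex-protected stand-in for a Prometheus gauge labeled by task type, and the `AttachGoroutine` signature here is an assumption that omits the context handling of the real method.

```go
package main

import "sync"

// taskGauge is a hypothetical stand-in for a gauge metric with a
// task-type label. It is safe for concurrent use.
type taskGauge struct {
	mu     sync.Mutex
	counts map[string]int
}

func newTaskGauge() *taskGauge {
	return &taskGauge{counts: make(map[string]int)}
}

// add changes the value for taskType by delta (positive or negative).
func (g *taskGauge) add(taskType string, delta int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.counts[taskType] += delta
}

// get returns the current number of in-flight goroutines for taskType.
func (g *taskGauge) get(taskType string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.counts[taskType]
}

// Background is a simplified, hypothetical version of the background
// routine manager.
type Background struct {
	wg    sync.WaitGroup
	gauge *taskGauge
}

// AttachGoroutine runs f in a new goroutine and keeps the gauge for
// taskType in step with the goroutine's lifetime.
func (b *Background) AttachGoroutine(f func(), taskType string) {
	b.wg.Add(1)
	b.gauge.add(taskType, 1) // counted before the goroutine starts
	go func() {
		defer b.wg.Done()
		defer b.gauge.add(taskType, -1) // runs before wg.Done
		f()
	}()
}

// Wait blocks until every attached goroutine has finished.
func (b *Background) Wait() { b.wg.Wait() }
```

If goroutines for one task type stay alive longer, for example because of a slow database, the value for that label stays elevated for as long as they run, which is the bottleneck signal this PR is aiming for.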
Which issue(s) this PR fixes:
Fixes #386
Special notes for your reviewer:
If you think there's a better metric or promQL to get more meaningful data, please feel free to comment.
Does this PR introduce a user-facing change?:
Additional documentation:
Checklist:
Summary by CodeRabbit
New Features
Improvements
`AttachGoroutine`, `PushPull`, and other related functions.

These changes significantly enhance the ability to manage and monitor background tasks within the application.