yorkie-team / yorkie

Yorkie is a document store for collaborative applications.
https://yorkie.dev
Apache License 2.0
783 stars, 145 forks

Add a metric to collect the number of background routines #963

Closed kokodak closed 2 months ago

kokodak commented 2 months ago

What this PR does / why we need it:

This PR adds a metric to collect the number of goroutines attached to a specific Background task.

Prometheus provides its own thread-safe Gauge metric, which can be used to accurately collect the number of concurrent goroutines.

Although PushPull is the only task running in the background so far, I implemented the AttachGoroutine() method to take a string parameter called taskType, since the Background package seems to be designed for reuse by other tasks.

In my local Grafana, I observed the metric with a simple PromQL query like the one below.

image

The expected effect is that it will be easy to identify which background tasks are bottlenecking and when. For example, during the logic of saving snapshots, the number of observed goroutines may temporarily increase when the DB bottleneck causes each goroutine to have a longer lifecycle. This kind of analysis would help us prevent such issues.

Which issue(s) this PR fixes:

Fixes #386

Special notes for your reviewer:

If you think there's a better metric or PromQL query to get more meaningful data, please feel free to comment.

Does this PR introduce a user-facing change?:

Additional documentation:

Checklist:

Summary by CodeRabbit

These changes significantly enhance the ability to manage and monitor background tasks within the application.

coderabbitai[bot] commented 2 months ago

Walkthrough

The recent changes enhance the tracking of background goroutines by modifying key methods to accept metrics and task type parameters. This allows for improved monitoring of background tasks via Prometheus, facilitating better management and observability of worker counts. Such updates ensure that metrics accurately reflect the state of goroutines, thus supporting proactive performance management.

Changes

| Files | Change Summary |
| --- | --- |
| server/backend/background/background.go, server/packs/packs.go | Enhanced methods to include metrics tracking and identifiers, allowing for improved observability of background tasks. |
| server/backend/backend.go | Updated background instance creation to include metrics, improving task management capabilities. |
| test/integration/retention_test.go, test/integration/snapshot_test.go | Updated tests to reflect new method signatures for better metrics integration. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Collect Worker Count Metric that Runs Through the Background Package (#386) | | |

krapie commented 2 months ago

@kokodak We have deployed this feature and added a Grafana panel for it, but it seems like the metric is not being collected. Can you verify it again?

image
kokodak commented 2 months ago

@krapie There is no problem with the metric collection itself. However, in most scenarios the goroutines running in the background are extremely short-lived and the interval between metric collections is long, so a collection rarely catches a goroutine while it is still running.

This was happening to me locally, but when I forcibly increased the lifetime of the goroutine, I was able to observe it normally. On the yorkie main server, there is some traffic, so I thought it would be observed, but it seems not. I think the main reason is that the goroutine has a very short lifetime.

How about changing the metric from a Gauge to a Counter and observing how fast the number of started goroutines increases over time?

krapie commented 2 months ago

@kokodak Good point! It looks like the lifespan of the goroutines is too short for the metric to capture. That said, I can see some changes when I change the query to use rate_interval.

image

cc. @hackerwins