valkey-io / valkey

A flexible distributed key-value datastore that is optimized for caching and other realtime workloads.
https://valkey.io

Feature COMMANDLOG to record slow execution and large request/reply #1294

Open soloestoy opened 1 week ago

soloestoy commented 1 week ago

As discussed in PR #336.

We have different types of resources, such as CPU, memory, and network. The slowlog can only record commands that consume a lot of CPU during the processing phase (it does not include network read/write time); it cannot record commands that consume too much memory or network bandwidth. For example:

  1. Running a "SET key value(10 megabytes)" command would not be recorded in the slowlog, because while processing it the SET command only inserts the value's pointer into the db dict. But that command consumes a huge amount of memory in the query buffer and a lot of inbound network bandwidth. In this case, just 1000 tps can cause 10GB/s of network flow.
  2. Running a "GET key" command when the key's value is 10 megabytes long. The GET command can consume a huge amount of memory in the output buffer and a lot of outbound network bandwidth.
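The bandwidth figure in the first example follows from a single multiplication. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the function name is hypothetical and not part of the PR:

```python
def network_flow_bytes_per_sec(payload_bytes: int, commands_per_sec: int) -> int:
    """Bytes per second moved over the network by a stream of identical commands."""
    return payload_bytes * commands_per_sec


MB = 10**6  # assumption: decimal megabytes
GB = 10**9  # assumption: decimal gigabytes

# 1000 commands per second, each carrying a 10 MB value
flow = network_flow_bytes_per_sec(10 * MB, 1000)
print(flow / GB)  # 10.0
```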

This PR introduces a new command, COMMANDLOG, to log commands that consume significant network bandwidth, both input and output. Users can retrieve the results using COMMANDLOG get <count> large-request and COMMANDLOG get <count> large-reply. All subcommands for COMMANDLOG are:

And the slowlog is also incorporated into the commandlog.
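To illustrate the idea of one log with three categories, here is a toy model in Python. It is only a sketch of the concept and not the code in src/commandlog.c: the class name, the threshold values, the `>=` comparison, and the entry layout are all assumptions made for illustration.

```python
from collections import deque


class CommandLogSketch:
    """Toy model: one bounded log per category (slow, large-request, large-reply)."""

    KINDS = ("slow", "large-request", "large-reply")

    def __init__(self, slow_us=10_000, large_request_bytes=1_048_576,
                 large_reply_bytes=1_048_576, max_len=128):
        # Hypothetical thresholds; the real config names and defaults may differ.
        self.thresholds = {
            "slow": slow_us,
            "large-request": large_request_bytes,
            "large-reply": large_reply_bytes,
        }
        # Each category keeps at most max_len entries; the oldest are dropped first.
        self.logs = {kind: deque(maxlen=max_len) for kind in self.KINDS}

    def record(self, argv, duration_us, request_bytes, reply_bytes):
        """Log the command in every category whose threshold it reaches."""
        measured = {
            "slow": duration_us,
            "large-request": request_bytes,
            "large-reply": reply_bytes,
        }
        for kind, value in measured.items():
            if value >= self.thresholds[kind]:
                self.logs[kind].append((argv[0], value))

    def get(self, count, kind):
        """Return up to count entries of the given kind, newest first."""
        return list(reversed(self.logs[kind]))[:count]


log = CommandLogSketch()
# A fast SET with a 10 MiB value: large request, tiny reply.
log.record(["SET", "key", "<10 MiB value>"], 50, 10 * 1024 * 1024, 5)
# A fast GET of that key: tiny request, large reply.
log.record(["GET", "key"], 50, 20, 10 * 1024 * 1024)

print(log.get(10, "large-request"))  # [('SET', 10485760)]
print(log.get(10, "large-reply"))    # [('GET', 10485760)]
print(log.get(10, "slow"))           # []
```

Neither command reaches the slow category, which is the gap in the slowlog that the description points out.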

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 88.18898% with 15 lines in your changes missing coverage. Please review.

Project coverage is 70.71%. Comparing base (2df56d8) to head (03a6736). Report is 18 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/commandlog.c | 92.79% | 8 Missing :warning: |
| src/server.c | 57.14% | 3 Missing :warning: |
| src/latency.c | 0.00% | 2 Missing :warning: |
| src/module.c | 0.00% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           unstable    #1294      +/-   ##
============================================
+ Coverage     70.69%   70.71%   +0.01%
============================================
  Files           114      115       +1
  Lines         63161    63209      +48
============================================
+ Hits          44650    44696      +46
- Misses        18511    18513       +2
```

| [Files with missing lines](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io) | Coverage Δ | |
|---|---|---|
| [src/blocked.c](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fblocked.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL2Jsb2NrZWQuYw==) | `91.90% <100.00%> (ø)` | |
| [src/commands.def](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fcommands.def&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL2NvbW1hbmRzLmRlZg==) | `100.00% <ø> (ø)` | |
| [src/config.c](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fconfig.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL2NvbmZpZy5j) | `78.83% <ø> (ø)` | |
| [src/server.h](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fserver.h&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL3NlcnZlci5o) | `100.00% <ø> (ø)` | |
| [src/latency.c](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Flatency.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL2xhdGVuY3kuYw==) | `80.87% <0.00%> (-0.05%)` | :arrow_down: |
| [src/module.c](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fmodule.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL21vZHVsZS5j) | `9.64% <0.00%> (-0.02%)` | :arrow_down: |
| [src/server.c](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fserver.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL3NlcnZlci5j) | `87.67% <57.14%> (-0.02%)` | :arrow_down: |
| [src/commandlog.c](https://app.codecov.io/gh/valkey-io/valkey/pull/1294?src=pr&el=tree&filepath=src%2Fcommandlog.c&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io#diff-c3JjL2NvbW1hbmRsb2cuYw==) | `92.79% <92.79%> (ø)` | |

... and [21 files with indirect coverage changes](https://app.codecov.io/gh/valkey-io/valkey/pull/1294/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=valkey-io)


hwware commented 1 week ago

Just a note: let's first refresh our memory of PR https://github.com/valkey-io/valkey/pull/336. The conclusion of the last comment is:

After a core team meeting, we decided adding a new command COMMANDLOG with subcommands HEAVYTRAFFIC and SLOW, and then slowlog.c can be renamed to a common commandlog.c. https://github.com/valkey-io/valkey/pull/336#issuecomment-2246809112

hwware commented 1 week ago

From my understanding, commandlog get should be: COMMANDLOG get count slow | heavytraffic-input | heavytraffic-output.

commandlog len should be: COMMANDLOG len slow | heavytraffic-input | heavytraffic-output.

commandlog reset should be: COMMANDLOG reset slow | heavytraffic-input | heavytraffic-output.
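The three forms suggested above can be sketched as a small argument builder. This is only an illustration of the syntax proposed in this comment, not the syntax the PR implements (the PR description uses large-request and large-reply); the helper name `commandlog_argv` is hypothetical.

```python
# Log type names as proposed in this comment (not the final naming).
LOG_TYPES = ("slow", "heavytraffic-input", "heavytraffic-output")


def commandlog_argv(subcommand, log_type, count=None):
    """Build the argument vector for the proposed COMMANDLOG forms.

    get   -> COMMANDLOG get <count> <type>
    len   -> COMMANDLOG len <type>
    reset -> COMMANDLOG reset <type>
    """
    if log_type not in LOG_TYPES:
        raise ValueError(f"unknown log type: {log_type}")
    if subcommand == "get":
        if count is None:
            raise ValueError("get requires a count")
        return ["COMMANDLOG", "get", str(count), log_type]
    if subcommand in ("len", "reset"):
        return ["COMMANDLOG", subcommand, log_type]
    raise ValueError(f"unknown subcommand: {subcommand}")


print(" ".join(commandlog_argv("get", "slow", count=10)))
print(" ".join(commandlog_argv("len", "heavytraffic-input")))
print(" ".join(commandlog_argv("reset", "heavytraffic-output")))
```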

Can you describe the terms heavytraffic-input and heavytraffic-output in the json file (in the arguments part)? So far I only know them from the source code and valkey.conf.

And I think the existing slowlog commands should be deprecated? If yes, I think you should update the related json files as well.