tikv / pd

Placement driver for TiKV
Apache License 2.0
1.05k stars 721 forks source link

tests/resource_control: enable trace log to help diagnose #8545

Closed HuSharp closed 2 months ago

HuSharp commented 3 months ago

What problem does this PR solve?

Issue Number: Ref #8512

What is changed and how does it work?

Previously, when an error was encountered, we could only get a smaller logs and it was harder to see what went wrong

2024-08-09T10:31:54.0348800Z === RUN   TestResourceManagerClientTestSuite/TestResourceGroupController
2024-08-09T10:31:54.0353345Z [2024/08/09 10:31:15.283 +00:00] [INFO] [controller.go:199] ["load resource controller config"] [config="{\"degraded-mode-wait-duration\":\"1s\",\"ltb-max-wait-duration\":\"30s\",\"ltb-token-rpc-max-delay\":\"1s\",\"request-unit\":{\"read-base-cost\":1,\"read-per-batch-base-cost\":0,\"read-cost-per-byte\":1,\"write-base-cost\":1,\"write-per-batch-base-cost\":0,\"write-cost-per-byte\":1,\"read-cpu-ms-cost\":1},\"enable-controller-trace-log\":\"false\",\"token-rpc-params\":{\"wait-retry-interval\":\"50ms\",\"wait-retry-times\":20}}"] [ru-config="{\"ReadBaseCost\":1,\"ReadPerBatchBaseCost\":0,\"ReadBytesCost\":1,\"WriteBaseCost\":1,\"WritePerBatchBaseCost\":0,\"WriteBytesCost\":1,\"CPUMsCost\":1,\"LTBMaxWaitDuration\":30000000000,\"WaitRetryInterval\":50000000,\"WaitRetryTimes\":20,\"DegradedModeWaitDuration\":1000000000}"]
2024-08-09T10:31:54.0357525Z [2024/08/09 10:31:15.287 +00:00] [INFO] [grpc_service.go:99] ["watch request"] [key=resource_group/settings] [range-end=resource_group/settingt] [start-revision=248]
2024-08-09T10:31:54.0359119Z [2024/08/09 10:31:15.288 +00:00] [INFO] [grpc_service.go:99] ["watch request"] [key=resource_group/controller] [range-end=resource_group/controlles] [start-revision=248]
2024-08-09T10:31:54.0360560Z [2024/08/09 10:31:16.043 +00:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
2024-08-09T10:31:54.0362046Z [2024/08/09 10:31:16.044 +00:00] [ERROR] [client.go:162] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]receive response error: rpc error: code = Unavailable desc = server not started"]
2024-08-09T10:31:54.0363789Z [2024/08/09 10:31:16.285 +00:00] [INFO] [controller.go:500] ["[resource group controller] create resource group cost controller"] [name=controller_test]
2024-08-09T10:31:54.0365153Z [2024/08/09 10:31:17.045 +00:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
2024-08-09T10:31:54.0366706Z [2024/08/09 10:31:17.045 +00:00] [ERROR] [client.go:162] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]receive response error: rpc error: code = Unavailable desc = server not started"]
2024-08-09T10:31:54.0367812Z     resource_manager_test.go:457: 
2024-08-09T10:31:54.0368853Z            Error Trace:    /home/runner/work/pd/pd/tests/integrations/mcs/resourcemanager/resource_manager_test.go:457
2024-08-09T10:31:54.0369937Z            Error:          "492.521605ms" is not less than or equal to "300ms"
2024-08-09T10:31:54.0370779Z            Test:           TestResourceManagerClientTestSuite/TestResourceGroupController
2024-08-09T10:31:54.0372094Z === RUN   TestResourceManagerClientTestSuite/TestResourceGroupControllerConfigChanged

After that, we can get more logs to help check the test

[2024/08/19 11:35:40.090 +08:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
[2024/08/19 11:35:40.091 +08:00] [ERROR] [client.go:162] ["region sync with leader meet error"] [error="[PD:grpc:ErrGRPCRecv]receive response error: rpc error: code = Unavailable desc = server not started"]
[2024/08/19 11:35:40.225 +08:00] [INFO] [controller.go:499] ["[resource group controller] create resource group cost controller"] [name=controller_test]
[2024/08/19 11:35:40.259 +08:00] [INFO] [controller.go:63] ["[resource group controller] update run state"] [name=controller_test] [request-unit-consumption="r_r_u:1450 w_r_u:580 read_bytes:696 write_bytes:551 total_cpu_time_ms:725 kv_read_rpc_count:29 kv_write_rpc_count:29 "] [is-throttled=false]
[2024/08/19 11:35:40.259 +08:00] [INFO] [controller.go:63] ["[resource group controller] update avg ru per sec"] [name=controller_test] [type=RU] [avg-ru-per-sec=29962.399107893918] [is-throttled=false]
[2024/08/19 11:35:40.259 +08:00] [INFO] [controller.go:63] ["[resource group controller] send token bucket request"] [now=2024/08/19 11:35:40.259 +08:00] [req="[{\"resource_group_name\":\"controller_test\",\"Request\":{\"ru_items\":{\"request_r_u\":[{\"value\":156823.19509341655}]}},\"consumption_since_last_request\":{\"r_r_u\":1450,\"w_r_u\":580,\"read_bytes\":696,\"write_bytes\":551,\"total_cpu_time_ms\":725,\"kv_read_rpc_count\":29,\"kv_write_rpc_count\":29}}]"] [source=low_ru]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] token bucket response"] [now=2024/08/19 11:35:40.260 +08:00] [resp="[{\"resource_group_name\":\"controller_test\",\"granted_r_u_tokens\":[{\"granted_tokens\":{\"settings\":{},\"tokens\":156823.19509341655},\"trickle_time_ms\":5000}]}]"] [source=low_ru] [latency=513.25µs]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] update run state"] [name=controller_test] [request-unit-consumption="r_r_u:1450 w_r_u:580 read_bytes:696 write_bytes:551 total_cpu_time_ms:725 kv_read_rpc_count:29 kv_write_rpc_count:29 "] [is-throttled=false]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] before reconfigure"] [name=controller_test] [old-tokens=7970] [old-rate=0] [old-notify-threshold=0] [old-burst=0]
[2024/08/19 11:35:40.260 +08:00] [INFO] [controller.go:63] ["[resource group controller] after reconfigure"] [name=controller_test] [tokens=7970] [rate=31364.63901868331] [notify-threshold=0] [burst=0]
[2024/08/19 11:35:40.345 +08:00] [INFO] [resource_manager_test.go:457] ["finished test case"] [index=0] [sum=3.618337ms] [waitDuration=0s]
[2024/08/19 11:35:41.091 +08:00] [INFO] [client.go:157] ["server starts to synchronize with leader"] [server=pd1] [leader=pd2] [request-index=0]
 enable trace log to help diagnose

Check List

Tests

Release note

None.
ti-chi-bot[bot] commented 3 months ago

@HuSharp: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-integration-realcluster-test f15b156d158ca9e2b08a5ffe0d96a19ccb37bd89 link false /test pull-integration-realcluster-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 77.64%. Comparing base (b132ea6) to head (5570562). Report is 6 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #8545 +/- ## ========================================== + Coverage 77.56% 77.64% +0.08% ========================================== Files 474 474 Lines 61876 61877 +1 ========================================== + Hits 47995 48047 +52 + Misses 10344 10290 -54 - Partials 3537 3540 +3 ``` | [Flag](https://app.codecov.io/gh/tikv/pd/pull/8545/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tikv) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/tikv/pd/pull/8545/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tikv) | `77.64% <100.00%> (+0.08%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tikv#carryforward-flags-in-the-pull-request-comment) to find out more.
ti-chi-bot[bot] commented 2 months ago

[LGTM Timeline notifier]

Timeline:

ti-chi-bot[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: niubell, nolouch, okJiang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/tikv/pd/blob/master/OWNERS)~~ [nolouch] - ~~[pkg/mcs/resourcemanager/server/OWNERS](https://github.com/tikv/pd/blob/master/pkg/mcs/resourcemanager/server/OWNERS)~~ [niubell] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment