pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0
937 stars 409 forks source link

tests: Fix MPPTask-Moniter may live longer than TiFlashMetrics #9096

Closed JaySon-Huang closed 1 month ago

JaySon-Huang commented 1 month ago

What problem does this PR solve?

Issue Number: close https://github.com/pingcap/tiflash/issues/9092, close https://github.com/pingcap/tiflash/issues/9097

Problem Summary:

For #9092

TMTContext will start a thread "MPPTask-Moniter" for running checkLongLiveMPPTasks. https://github.com/pingcap/tiflash/blob/38ab3f912220b94962043bde7cc341017d569994/dbms/src/Storages/KVStore/TMTContext.cpp#L121-L126

When the TiFlash is shutting down, the thread is not explicitly stopped. And the thread may live longer than the TiFlashMetrics instance. If the TiFlashMetrics instance is released before checkLongLiveMPPTasks run, then checkLongLiveMPPTasks will access to a random address and cause use-after-free data race when shutting down.

For #9097 Seems the race is reported in backtrace-rs, there is nothing we can do in tiflash code, just ignore

What is changed and how it works?

For #9092 In TMTContext::shutdown, set the MPPTaskMonitor->is_shutdown = true. So the thread is expected to be stopped after TMTContext::shutdown is called and before TiFlashMetrics is release. And when monitor->is_shutdown == true, the thread don't report the metircs to TiFlashMetrics

For #9097 Add race:StackTrace::toString, race:DB::SyncPointCtl::sync to tsan.suppression. And they will be ignored when running with TSAN_OPTIONS="suppressions=/tests/sanitize/tsan.suppression" ./dbms/gtests_dbms --gtest_filter=...

Check List

Tests

./dbms/gtests_dbms

- [ ] No code

Side effects

- [ ] Performance regression: Consumes more CPU
- [ ] Performance regression: Consumes more Memory
- [ ] Breaking backward compatibility

Documentation

- [ ] Affects user behaviors
- [ ] Contains syntax changes
- [ ] Contains variable changes
- [ ] Contains experimental features
- [ ] Changes MySQL compatibility

### Release note

<!-- bugfix or new feature needs a release note -->

```release-note
None
ti-chi-bot[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Lloyd-Pottiger, xzhangxian1008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/pingcap/tiflash/blob/master/OWNERS)~~ [Lloyd-Pottiger] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
ti-chi-bot[bot] commented 1 month ago

[LGTM Timeline notifier]

Timeline: