Open shishir2001-yb opened 3 months ago
Please don't bundle separate issues into the same GH issue.
Jira Link: DB-12562
Description
This test was somewhat flaky, but after this commit it started failing consistently in alma8-clang17-tsan.
LoadBalancerTablegroupsTest.GlobalLoadBalancingWithTablegroups: Analyze Trends
../../src/yb/integration-tests/yb_table_test_base.cc:375 Failed Bad status: Timed out (yb/util/backoff_waiter.cc:78): Operation 'IsLoadBalancerIdle' didn't complete within 60001ms
Please put in the extra effort to give accurate information on which commits are responsible. The internal detective link says
15786f34 fail
924dada7 fail
9e7181f9 fail
f69b08ff pass
Before f69b08ff, the test fails intermittently; after it, the test fails more than 90% of the time (until 4d922ca5, from which point it appears to be fine, so this is no longer relevant). It looks like you pointed to efd4cb7fea876ed9c13d9ce94ea989ed52aeaf69 without putting in much effort to confirm that it is actually the commit that introduced the issue. It just happens to be the commit right after f69b08ff. If you are not sure which commit is the cause, give a commit range where the issue likely appeared (i.e. f69b08ff..15786f34) and avoid assigning potentially unrelated people. Better yet, put in the effort to track down the right commit.
If you look at the internal detective page for my commit, it doesn't fail that test. The same goes for other pages such as https://detective.dev.yugabyte.com/D37179/ and https://detective.dev.yugabyte.com/D34898/. https://detective.dev.yugabyte.com/D36510/ shows it failing, so the suspicion is that 15786f34 caused the issue. That no longer matters, though, since the test is no longer failing.
It looks like it was fixed by ead90cc4002454a028e9e4d517afc0d4f91b4341, which lines up with detective (it's close to 4d922ca5). The fix for this test isn't explicitly mentioned in the commit summary, but the commit changes LoadBalancerColocatedTablesTest, which is a parent class of LoadBalancerTablegroupsTest.
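To illustrate the commit-range suggestion, here is a minimal Python sketch that derives a suspect range from a newest-first pass/fail listing like the one above. The function name `suspect_range` and the `(sha, status)` tuple layout are assumptions made for this illustration; this is not how detective itself computes anything.

```python
def suspect_range(results):
    """Return 'last_pass..newest_fail' for a newest-first list of (sha, status).

    Assumes the list starts at the newest commit of the failing run and that
    status is either "pass" or "fail". Returns None if no range can be derived.
    """
    for i, (sha, status) in enumerate(results):
        if status == "pass":
            if i == 0:
                # The newest commit already passes: nothing to bound.
                return None
            # Range from the most recent passing commit to the newest failure.
            return f"{sha}..{results[0][0]}"
    # No passing commit in the listing, so the lower bound is unknown.
    return None


# Listing copied from the detective output quoted above (newest first).
results = [
    ("15786f34", "fail"),
    ("924dada7", "fail"),
    ("9e7181f9", "fail"),
    ("f69b08ff", "pass"),
]
print(suspect_range(results))
```

The resulting string can be passed to `git log` to list the candidate commits. Because the test was already flaky, a single pass or fail is weak evidence, so the range is a starting point rather than proof.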
This test is failing in alma8-clang17-tsan but I couldn’t find the offending commit. PgIndexBackfillTest.PgStatProgressCreateIndexMultiNode: Analyze Trends
Bad status: Timed out (yb/util/backoff_waiter.cc:78): Operation 'Wait on index progress columns phase' didn't complete within 30171ms
According to internal detective, the first run of three consecutive failures ends at 5148b31fd3. Close to that point is 15786f3494, the same commit identified above as the likely cause of the other failure.
15786f34 fail
7bbcf875 fail
cda94139 fail
15786f34 pass
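For a flaky test, looking for the first run of consecutive failures is one way to separate a real regression from noise. The Python sketch below shows that idea under stated assumptions: the function name, the oldest-first ordering, and the commit labels in the sample history are all hypothetical and are not taken from detective.

```python
def first_failure_streak_end(history, streak=3):
    """Return the commit at which the first run of `streak` consecutive
    failures completes, or None if no such run exists.

    `history` is an oldest-first list of (commit, status) tuples, where
    status is "pass" or "fail".
    """
    run = 0
    for commit, status in history:
        # Extend the current failure run, or reset it on a pass.
        run = run + 1 if status == "fail" else 0
        if run == streak:
            return commit
    return None


# Hypothetical history with placeholder commit labels (oldest first).
history = [
    ("a1", "pass"),
    ("b2", "fail"),  # isolated flaky failure, reset by the next pass
    ("c3", "pass"),
    ("d4", "fail"),
    ("e5", "fail"),
    ("f6", "fail"),  # third consecutive failure
    ("g7", "fail"),
]
print(first_failure_streak_end(history))
```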
It still fails intermittently today with the same error. A point fix similar to the one in ead90cc4002454a028e9e4d517afc0d4f91b4341 is likely needed.
Issue Type
kind/bug