Open shamanthchandra-yb opened 4 weeks ago
For duplicate retryable requests, NotifyReplicationFinished is called multiple times. I can see at least two call sites. In the first, the round is pushed onto duplicate_rounds and is notified when the first registered round finishes replicating:
if (!emplace_result.second) {
  emplace_result.first->duplicate_rounds.push_back(round);
  return false;
}

for (const auto& duplicate : running_it->duplicate_rounds) {
  duplicate->NotifyReplicationFinished(status_for_duplicate, leader_term,
                                       nullptr /* applied_op_ids */);
}
The second call site is on the follower path, when AddPendingOperation fails:
auto result = state_->AddPendingOperation(round_ptr, OperationMode::kFollower);
if (!result.ok()) {
  round_ptr->NotifyReplicationFinished(result, OpId::kUnknownTerm, /* applied_op_ids */ nullptr);
}
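To make the hazard concrete, here is a minimal standalone sketch of how one round can be notified from both paths, with a first-caller-wins guard. It is not YugabyteDB code and not a proposed patch: `FakeRound`, `DemoResult`, `SimulateDuplicateNotification`, and the string status are hypothetical stand-ins, and whether a guard like this is the right fix for the real `ConsensusRound` is an open question.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a consensus round; NOT the real ConsensusRound.
class FakeRound {
 public:
  explicit FakeRound(std::function<void(const std::string&)> callback)
      : callback_(std::move(callback)) {
    assert(callback_ != nullptr);
  }

  // Runs the callback for the first caller only. Later calls are counted
  // and ignored. Returns true if this call actually ran the callback.
  bool NotifyReplicationFinished(const std::string& status) {
    if (notified_.exchange(true)) {
      ++ignored_calls_;
      return false;
    }
    callback_(status);
    return true;
  }

  int ignored_calls() const { return ignored_calls_; }

 private:
  std::function<void(const std::string&)> callback_;
  std::atomic<bool> notified_{false};
  std::atomic<int> ignored_calls_{0};
};

struct DemoResult {
  int callback_runs;
  int ignored_calls;
};

// Mimics the two call sites quoted above hitting the same round object.
DemoResult SimulateDuplicateNotification() {
  int callback_runs = 0;
  auto round = std::make_shared<FakeRound>(
      [&callback_runs](const std::string&) { ++callback_runs; });

  // The round was registered as a duplicate of an already running request.
  std::vector<std::shared_ptr<FakeRound>> duplicate_rounds;
  duplicate_rounds.push_back(round);

  // Call site 2: the failure path notifies the round directly.
  round->NotifyReplicationFinished("AddPendingOperation failed");

  // Call site 1: the original round finishes and notifies its duplicates.
  for (const auto& duplicate : duplicate_rounds) {
    duplicate->NotifyReplicationFinished("duplicate request");
  }

  return {callback_runs, round->ignored_calls()};
}
```

In this sketch the callback runs once and the second notification is dropped. Without the `notified_` check, the callback would run twice against state that the first run may already have released.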
Jira Link: DB-12414
Description
Encountered a segmentation fault in the yb-tserver process within the NotifyReplicationFinished function. At a high level, the logs suggest the crash is caused by the handling of retryable requests, specifically the ReplicationFinished() callback being invoked multiple times.
Here is the core backtrace:
These logs are from around the same timestamp on that node:
Please find run details in JIRA.
Issue Type
kind/bug