Closed: myang2021 closed this issue 12 months ago
The function GetActiveTransactionsList is newly introduced by the following commit:
d2a700a3a35edc3a0e74a5ee72d36e9cac7319c1
Author: Karthik Ramanathan <kramanathan@yugabyte.com>
Date: Wed Aug 2 17:30:08 2023 -0700
[#17087, #18054] YSQL: Exposes YB Transaction IDs in Postgres.
It is mentioned in the commit summary:
This change introduces a new RPC `GetActiveTransactionsList` for this purpose.
decltype(sessions_) sessions_snapshot;
{
  std::lock_guard lock(mutex_);
  sessions_snapshot = sessions_;
}
The intent appears to be to take a snapshot of the current sessions, but the code ends up copying only the shared pointers held in sessions_, so the snapshot still refers to the same session objects. If we could truly clone those sessions, we would not have the deadlock.
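To illustrate why the snapshot does not isolate the sessions, here is a minimal sketch. The `Session` type, `TakeSnapshot`, and `SnapshotAliasesOriginals` are hypothetical stand-ins, not the actual YugabyteDB types, and a `std::vector` is assumed in place of the multi-index container. Copying a container of `std::shared_ptr` copies the pointers, so both containers refer to the same objects and the same per-session mutexes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for LockablePgClientSession: some state plus its own mutex.
struct Session {
  std::mutex mutex;
  int txn_count = 0;
};

using SessionPtr = std::shared_ptr<Session>;

// Copies the container while holding the registry mutex, like the snippet above.
std::vector<SessionPtr> TakeSnapshot(std::mutex& registry_mutex,
                                     const std::vector<SessionPtr>& sessions) {
  std::lock_guard<std::mutex> lock(registry_mutex);
  return sessions;  // copies the shared_ptrs, not the Session objects
}

// Returns true when every snapshot entry points at the original Session object.
bool SnapshotAliasesOriginals() {
  std::mutex registry_mutex;
  std::vector<SessionPtr> sessions = {std::make_shared<Session>(),
                                      std::make_shared<Session>()};
  std::vector<SessionPtr> snapshot = TakeSnapshot(registry_mutex, sessions);
  for (std::size_t i = 0; i < sessions.size(); ++i) {
    if (snapshot[i].get() != sessions[i].get()) {
      return false;
    }
  }
  // Each Session is now owned by two shared_ptrs: the original and the copy.
  return sessions[0].use_count() == 2;
}
```

Under these assumptions, the snapshot only protects against concurrent modification of the container itself; locking a session through the snapshot still contends on the original session's mutex.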
boost::multi_index_container<
    LockablePgClientSessionPtr,
    boost::multi_index::indexed_by<
        boost::multi_index::hashed_unique<
            boost::multi_index::const_mem_fun<PgClientSession, uint64_t, &PgClientSession::id>
        >
    >
> sessions_ GUARDED_BY(mutex_);
Note that LockablePgClientSessionPtr is a pointer:
using LockablePgClientSessionPtr = std::shared_ptr<LockablePgClientSession>;
But how can we safely make a clone of a LockablePgClientSession when its mutex_ is already locked? By lock semantics, a locked object is not safe to read, because it may be undergoing a critical change that needs to be atomic; that is why the lock is needed. So sessions_snapshot did not work as expected, and I don't see how to make it work as expected.
I think we can fix the bug by passing each session to AddTransactionInfo without locking it:
for (const auto& session : sessions_snapshot) {
  AddTransactionInfo(resp, session);
}
This assumes that AddTransactionInfo only accesses the immutable (read-only) part of session. It does look so:
void AddTransactionInfo(
    PgGetActiveTransactionListResponsePB* out, const PgClientSessionLocker& locker) {
  auto& session = *locker;
  const auto* txn_id = session.GetTransactionId();
  if (!txn_id) {
    return;
  }
  auto& entry = *out->add_entries();
  entry.set_session_id(session.id());
  txn_id->AsSlice().CopyToBuffer(entry.mutable_txn_id());
}
We can see that session.GetTransactionId() reads a pointer back; I think that is safe without locking the session's mutex_.
Thinking more, it's tricky and hard to ensure safety without locking mutex_. We can consider adding an active_txns_ structure: each time we add a new txn to a session, we also insert it there, and any time we disassociate a txn from a session, we remove it from there. active_txns_ will have its own mutex, and in this way we avoid the deadlock.
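The active_txns_ idea could look roughly like the sketch below. The class name `ActiveTxnRegistry`, its methods, and the use of `std::string` for the transaction id are all assumptions made for illustration; the real change may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical registry mapping session id -> active transaction id.
// It is guarded by its own mutex, independent of any per-session mutex.
class ActiveTxnRegistry {
 public:
  // Called when a transaction becomes associated with a session.
  void Add(uint64_t session_id, std::string txn_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_txns_[session_id] = std::move(txn_id);
  }

  // Called when the transaction is disassociated from the session.
  void Remove(uint64_t session_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_txns_.erase(session_id);
  }

  // Returns a copy of the current entries; never touches a session mutex.
  std::vector<std::pair<uint64_t, std::string>> List() const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::pair<uint64_t, std::string>> out(active_txns_.begin(),
                                                      active_txns_.end());
    return out;
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<uint64_t, std::string> active_txns_;
};
```

Because `List()` takes only the registry mutex, a handler in this sketch can answer while a session is locked by a long-running call. The trade-off is that every code path that attaches or detaches a transaction has to keep the registry in sync.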
Forget the deadlock: GetActiveTransactionList taking locks makes pg_stat_activity unusably slow. This kind of information could be pre-populated close to pgstat code instead of pgstat having to go all the way to the pggate layer to get the information on the fly. Worst is when a CREATE INDEX is running and someone selects from pg_stat_activity: that's going to hang for a while.
For instance, the master flag --TEST_simulate_slow_table_create_secs=20 plus CREATE TABLE t (i int); run simultaneously with select * from pg_stat_activity hangs for up to 20s waiting on that. A more realistic example that does not use the flag is multiple sessions running lock-taking DDLs, where each causes around 0.5-1s of wait due to RPC latency. Multiply that by, say, 10 sessions doing DDLs out of a total of 300 sessions, and pg_stat_activity takes 5-10s, which is too long.
No other backports pending, right?
No more backports left, all done! 👍
Jira Link: DB-7616
Description
When I ran the test PgIndexBackfillTest.PgStatProgressCreateIndexPhase, it passed on the master branch (commit dfc08e86469eca5a4cb9641635770f977641e9db), which does not have per-database catalog version mode. However, when I ran the test with per-database catalog version mode turned on by default (by changing the default value of --FLAGS_TEST_enable_db_catalog_version_mode to true), the test failed.

After debugging, I found there is a potential deadlock scenario. The following are the detailed steps that lead to the deadlock:
(1) The tablet server (ts-1) runs WaitForBackendsCatalogVersion (called by the PGDefineIndex function as part of the create index workflow). There is a macro defined for this method. Note that this macro calls GetSession(req); in particular, it returns a PgClientSessionLocker object. The constructor of PgClientSessionLocker automatically locks the lockable, which in this case is the session object. Secondly, the session object stays locked until method(req, resp, context) completes. So we know that PgClientSession is now locked and will not be released until its method WaitForBackendsCatalogVersion completes.

(2) Now let's consider how
client().WaitForYsqlBackendsCatalogVersion is implemented, and we will see the deadlock. It sends a WaitForYsqlBackendsCatalogVersion RPC to master. On the master side, the handler basically calls YsqlBackendsManager::WaitForYsqlBackendsCatalogVersion, where a BackendsCatalogVersionJob is created.

(3) The
BackendsCatalogVersionJob job is started via BackendsCatalogVersionJob::Launch, which in turn launches a task for each tablet server: BackendsCatalogVersionJob::LaunchTS starts a BackendsCatalogVersionTS task for every tablet server. This, of course, includes ts-1 that we mentioned in step (1). The BackendsCatalogVersionTS task follows the RetryingTSRpcTask workflow and sends out an RPC to a tablet server. Note that WaitForYsqlBackendsCatalogVersionAsync is called to send a WaitForYsqlBackendsCatalogVersion RPC to ts-1.

(4) Now back in ts-1:
How does ts-1 handle WaitForYsqlBackendsCatalogVersion? It builds a query. Then a connection is made to its local PG master to start a new PG backend. Next, note that pg_stat_activity is defined as a view that involves the function pg_stat_get_activity. It was found that pg_stat_get_activity is not always called, depending on the other JOIN clauses in the query. But if pg_stat_get_activity is called, then we have a problem: the calls inside pg_stat_get_activity eventually make a GetActiveTransactionList RPC back to ts-1 itself.

(5) In
GetActiveTransactionList, the call PgClientSessionLocker(session) makes a PgClientSessionLocker object. The constructor of PgClientSessionLocker automatically locks the lockable, which in this case is the same session object that was already locked in step (1). This is the deadlock!
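The cycle in steps (1) through (5) can be reduced to the following sketch. All names here are hypothetical, and a `std::timed_mutex` with a timeout is assumed only so that the example terminates; in the scenario described above there is no timeout, so the nested handler would wait forever. The outer handler holds the session lock for the whole call and waits for a nested request that needs the same lock on another thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <mutex>

// Hypothetical session whose mutex supports timed locking.
struct Session {
  std::timed_mutex mutex;
};

// Plays the role of the nested GetActiveTransactionList handler: it runs on
// another thread and tries to lock the session, giving up after `timeout`.
bool NestedHandlerCanLock(Session& session, std::chrono::milliseconds timeout) {
  std::unique_lock<std::timed_mutex> lock(session.mutex, timeout);
  return lock.owns_lock();
}

// Plays the role of the outer handler from step (1): it holds the session lock
// for the entire call and waits for the nested handler before releasing it.
bool OuterHandlerSeesNestedSuccess(Session& session) {
  std::lock_guard<std::timed_mutex> held(session.mutex);
  std::future<bool> nested =
      std::async(std::launch::async, NestedHandlerCanLock, std::ref(session),
                 std::chrono::milliseconds(50));
  // The nested attempt cannot succeed while `held` is alive, and `held` is
  // not released until the nested attempt has finished.
  return nested.get();
}
```

With the timeout, the nested attempt reports failure and the outer call returns; without it, each side waits on the other indefinitely, which matches the hang observed in the test.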