Open razzmatazz opened 2 years ago
It's intended that lock (m_lock) (in VerifyConnected) should be very quick; however, I suppose that if there were contention and the lightweight .NET CAS locking mechanism got promoted to a full kernel object, it could end up taking much more time than anticipated?
I'm not even sure how likely that would be, since ServerSession is generally intended to be used by only a single thread, so there shouldn't be contention. Might have to profile and see if I can reproduce.
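For context, here's a minimal sketch of the pattern under discussion: a Monitor-based lock guarding a simple state check. This is illustrative only (the type, field, and state names are assumptions), not the actual MySqlConnector source.

internal sealed class ServerSessionSketch
{
    private enum State { Created, Connected, Closed }

    private readonly object m_lock = new();
    private State m_state = State.Connected;

    public void VerifyConnected()
    {
        // Monitor takes an interlocked (CAS) fast path when uncontended, but a
        // contended acquire can escalate to a kernel-level wait.
        lock (m_lock)
        {
            if (m_state == State.Closed)
                throw new ObjectDisposedException(nameof(ServerSessionSketch));
            if (m_state != State.Connected)
                throw new InvalidOperationException($"Session is not connected; state is {m_state}.");
        }
    }
}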
Are you running on Linux or Windows?
This is on Linux.
Yes... it could be that the slim lock gets promoted to a kernel lock due to high CPU usage, but I am no kernel engineer :)
Do you need a profile in some form?
This appears to be lock contention plus promotion to a kernel lock, like you said, under high CPU load only. No worries, and thanks for this project!
We're seeing similar problems where ServerSession.VerifyConnected() shows up as a massive hotspot (Linux containers as well). Happy to send more details.
I'm skeptical that contention is the problem here: not only is the lock local to each session (and in our case we use a single thread per session), but this issue also happens in single-threaded benchmarks where we open a single session.
After forking the client and removing the VerifyConnected sanity check, we observed a >5x latency/throughput improvement on the client side in our benchmarks.
Not exactly sure how to fix this cleanly, though. Using a volatile m_state and not taking the lock when reading it for the sanity checks also fixed the performance degradation, though it's debatable whether that is correct. I'm tempted to think that locking while performing the sanity checks doesn't bring much value: it doesn't prevent races where the session state changes just before or after the check anyway.
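For illustration, here's a sketch of the lock-free variant described above, assuming the state is a simple enum field read with volatile semantics (names are made up; whether dropping the lock is correct for all state transitions is exactly the open question):

internal sealed class ServerSessionSketch
{
    private enum State { Created, Connected, Closed }

    // A volatile read observes the latest published value without taking the Monitor.
    private volatile State m_state = State.Connected;

    public void VerifyConnected()
    {
        var state = m_state;  // single volatile read; no lock acquisition
        if (state == State.Closed)
            throw new ObjectDisposedException(nameof(ServerSessionSketch));
        if (state != State.Connected)
            throw new InvalidOperationException($"Session is not connected; state is {state}.");
    }
}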
> this issue also happens on single-threaded benchmarks where we open a single session
Are you able to share this benchmark code?
> Are you able to share this benchmark code?
I can't share the exact benchmark code we use, as it contains business logic and a whole parametrisation/multi-threading framework, but essentially our benchmark script is a simple scan over the data using connection.ExecuteReaderAsync. Here's a rough example:
// connection: an already-open DbConnection to the database under test.
// ExecuteAsync / ExecuteReaderAsync are assumed to be extension helpers (e.g. Dapper-style).
DbConnection connection;
var createTableQuery = "CREATE TABLE benchmarkTable(col1 BINARY(16) NOT NULL, col2 BINARY(16) NOT NULL, col3 BINARY(16) NOT NULL);";
await connection.ExecuteAsync(createTableQuery);

// Populate the benchmarkTable, 1 - 10 million rows for example.
// We skip this step here; in our case we use optimized CSV imports.

// Perform a simple scan over the benchmarkTable.
// Note also that we use an infinite command timeout to bypass `TimerQueue#Add`'s lock,
// which creates high contention on multi-threaded reads, but that's a separate problem.
var selectQuery = "SELECT col1, col2, col3 FROM benchmarkTable;";
await using var dbDataReader = await connection.ExecuteReaderAsync(selectQuery, commandTimeout: 0);
while (await dbDataReader.ReadAsync())
{
    // Read the fields to imitate the production workload.
    // This is optional, though, because the data stream seems to get processed upon ReadAsync already.
    byte[] buffer = new byte[16];
    for (int i = 0; i < 3; i++)
        dbDataReader.GetBytes(i, 0, buffer, 0, 16);
}
More information about our benchmarking setup: we run in the mcr.microsoft.com/dotnet/aspnet:8.0 container image; the underlying OS should be Debian 12 (cf https://github.com/dotnet/dotnet-docker/blob/main/README.aspnet.md). We actually use the Singlestore fork of MySqlConnector, but the lock logic doesn't seem to have changed in CreateExceptionForInvalidState (which replaced VerifyConnected), so I'd expect this issue to be reproducible with other database types. One important element of our setup is that our benchmarks were very far from saturating the DB; the throughput was mainly bottlenecked client-side.

Still investigating. At this point I'm not 100% sure that the lock is the only problem; we suspect an impact of .NET versions on the original Singlestore connector release we were using. Rebuilding the original code locally (without the lock-free fix) yields the same improved performance as the lock-free implementation, so we suspect .NET differences between our local build and their published builds.
Details to follow; this issue may be specific to the Singlestore fork's builds, in which case I apologize for the noise.
Software versions
MySqlConnector version: 2.1.0
Server type (MySQL, MariaDB, Aurora, etc.) and version: GCP/mysql8
.NET version: 6.0
ORM NuGet packages and versions: NHibernate/5.3
Describe the bug
When profiling a high-CPU background processing server, I am seeing a lot of threads waiting on VerifyConnected(). I am not sure why this is happening; from reading the source code, VerifyConnected should not block often, but it does for me.
Exception
Full exception message and call stack (if applicable)
Example trace from one of the threads via dotnet stack report -p1
Expected behavior
It should not spend a lot of time in VerifyConnected.
Additional context
I am not sure what is at fault here; maybe the profiler is showing incorrect data, or NHibernate is doing something funny with ADO.NET connections (i.e. why is there contention in VerifyConnected at all?!).
I need pointers on how to debug this further.
BTW, thanks for the software!