Open DVN237294 opened 2 months ago
hi - thank you for reporting this and providing all the details! I'll try to set up a repro and see.
This is how I'm trying to reproduce the issue, and so far I have not seen it. Of course, I'm not sure about its frequency either (or whether I'm headed in the right direction at all).
Repro attempt: run a big, long `SELECT` query plus a `PUT` in the same thread and hope there's a deadlock. The `PUT` uploads a ~22 MB zipped CSV and executes in a couple of seconds.
Tried the below too, with the following modifications:
- `SELECT` + `PUT`
- use two `PUT`s
- `ThreadPool.QueueUserWorkItem` to execute tasks
```csharp
using System;
using System.Data;
using System.Threading;
using Snowflake.Data.Client;

class Program
{
    // Connection parameters
    private const string Account = "myorg-myaccount";
    private const string Username = "myuser";
    private const string Password = "mypassword";
    private const string Warehouse = "mywh";

    static void Main(string[] args)
    {
        // Initialize Snowflake connection
        string connectionString = $"account={Account};user={Username};password={Password};warehouse={Warehouse};";
        using (var connection = new SnowflakeDbConnection())
        {
            connection.ConnectionString = connectionString;
            connection.Open();

            // Execute both queries in the same ThreadPool thread
            ThreadPool.QueueUserWorkItem(state =>
            {
                PutQuery1(connection);
                //PutQuery2(connection);
                SelectQuery(connection);
            });

            /*
            // Execute two queries in separate ThreadPool threads
            ThreadPool.QueueUserWorkItem(SelectQuery, connection);
            ThreadPool.QueueUserWorkItem(PutQuery1, connection);
            */

            Thread.Sleep(5000); // give the queued work item time to start
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }

    static void SelectQuery(object state)
    {
        SnowflakeDbConnection connection = (SnowflakeDbConnection)state;
        using (var command = connection.CreateCommand())
        {
            // quite big query, takes some time and returns 15,000,000 rows
            command.CommandText = "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.CUSTOMER;";
            using (var reader = command.ExecuteReader())
            {
                int count = 0;
                while (reader.Read()) { count++; }
                Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}: Select returned {count} rows.");
            }
        }
    }

    static void PutQuery1(object state)
    {
        SnowflakeDbConnection connection = (SnowflakeDbConnection)state;
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "PUT file://C:\\temp\\building-consents-issued-february-2024.zip @test_db.public.MYSTAGE1 auto_compress=false overwrite=true;";
            command.ExecuteNonQuery();
            Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}: PUT query executed successfully.");
        }
    }

    static void PutQuery2(object state)
    {
        SnowflakeDbConnection connection = (SnowflakeDbConnection)state;
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "PUT file://C:\\temp\\building-consents-issued-february-2024-2.zip @test_db.public.MYSTAGE1 auto_compress=false overwrite=true;";
            command.ExecuteNonQuery();
            Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}: PUT query 2 executed successfully.");
        }
    }
}
```
Can you advise whether this represents what you're doing when you see the issue? Or, if there's a minimal viable repro program which, when run, exhibits the issue you're seeing, that would be fantastic and would speed up the debugging a lot. Thank you in advance!
Hey, thanks for taking the time to look into this.
I have been trying to debug this for a while, and I'm now at the point where I'm not entirely convinced the lockup is due to this "sync over async" construct. I have a `dotnet-dump` memory dump from the application, but my knowledge of the TPL and/or the default ThreadPool implementation is too limited to narrow the cause of the lockup down enough to prove that this `Task.Run(...).Result` construct is the cause of it. All I know for certain is that the threadpool thread is waiting "indefinitely" (or at least 16+ hours) on the `UnwrapPromise` returned by `Task.Run`.
I have been trying to get to the bottom of the question "can `Task.Run` ever deadlock?". Some say yes. Others (Herman Schoenfeld and stephentoub) say no.
The only logical argument I can see for why it would deadlock is if a continuation is put in the ThreadPool thread's local work queue (instead of the global work queue). I just feel that if that were the cause, then it should deadlock much faster.
So far, I've only seen one lockup over millions of `PUT` queries in the previous months.
However, if that were the cause of the deadlock, then I suppose starting the task with `TaskCreationOptions.PreferFairness` should alleviate any deadlock.
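To make the `PreferFairness` idea concrete: `Task.Run` has no overload that takes `TaskCreationOptions`, so the workaround would go through `Task.Factory.StartNew`. This is a standalone sketch, not connector code; `StartFair` is a hypothetical helper name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FairnessDemo
{
    // With TaskCreationOptions.PreferFairness, the task is queued to the
    // ThreadPool's global queue even when started from a pool thread.
    // Without it, a task started on a pool thread lands in that thread's
    // local work-stealing queue - the scheduling behavior suspected above.
    public static Task<int> StartFair(Func<int> work)
    {
        return Task.Factory.StartNew(
            work,
            CancellationToken.None,
            TaskCreationOptions.PreferFairness,
            TaskScheduler.Default);
    }

    public static void Main()
    {
        Console.WriteLine(StartFair(() => 42).Result);
    }
}
```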
For now I have decided to ignore this issue until it appears again, but from my point of view the proposed solutions in this issue still seem relevant: it is generally considered bad practice to do these kinds of "sync over async" workarounds, so it would make sense for this library to work towards avoiding them. If nothing else, then just for the increased performance of not having to block threadpool threads.
hi - appreciate all the additional details and digging deeper! I agree we could look into avoiding the "sync over async" approach in general, so this is now an enhancement request for the team to consider (alongside async support for `PUT`) and, if deemed feasible, to pick up and prioritize.
Issue description
We are currently using the `PUT file://` API to upload files to an internal stage in Snowflake. Due to current limitations, `PUT`s must be done synchronously. We have, however, seen that calls to `ExecuteNonQuery` sometimes deadlock our application. After some investigation I believe that these deadlocks stem from the use of "sync over async" constructs in the `RestRequester` implementation, here:
https://github.com/snowflakedb/snowflake-connector-net/blob/14cf8a5daaf9611b237eb4b51ccdd61cb44761d0/Snowflake.Data/Core/RestRequester.cs#L51-L55
`Post` here will execute the `PostAsync` method on a ThreadPool thread and block the currently executing thread. However, when the currently executing thread is already a ThreadPool thread, there is a risk that the work will be scheduled on the same thread, causing a deadlock.
For completeness, here is the code and query I am running: `_uploadFileToStageScript.Contents` contains `PUT file://{{0}} @{Stage} AUTO_COMPRESS=false OVERWRITE={{1}};`
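Reduced to a minimal standalone sketch, the sync-over-async shape at issue looks roughly like this (names and bodies are illustrative, not the connector's actual code):

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Simplified shape of the pattern: the synchronous API wraps the async
    // implementation in Task.Run and blocks on .Result.
    public static string Post()
    {
        return Task.Run(async () =>
        {
            await Task.Delay(100); // stands in for the HTTP round-trip
            return "ok";
        }).Result; // blocks the calling thread until the inner task completes
    }

    public static void Main()
    {
        // If this ran on a ThreadPool thread (as with
        // ThreadPool.QueueUserWorkItem), that thread would sit blocked
        // in .Result while the awaited continuation needs a pool thread
        // to resume on.
        Console.WriteLine(Post());
    }
}
```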
Suggested solution
The proper solution would be to avoid any "sync over async" constructs and use entirely synchronous methods in the synchronous APIs. Secondarily, `async` support for `PUT` commands would be nice.
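One possible shape for the "entirely synchronous" direction, assuming the connector can target .NET 5+ where `HttpClient.Send` offers a true synchronous code path: the synchronous API calls `Send` directly instead of blocking on a task. This is a hedged sketch with hypothetical names; the canned handler stands in for the real transport so the example is self-contained.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Stand-in transport that answers every request synchronously with 200 OK,
// used only to make the sketch runnable without a network.
public class CannedHandler : HttpMessageHandler
{
    protected override HttpResponseMessage Send(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => new HttpResponseMessage(HttpStatusCode.OK);

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(Send(request, cancellationToken));
}

public static class SyncPostSketch
{
    // Fully synchronous: HttpClient.Send (.NET 5+) runs the handler on the
    // calling thread - no Task.Run(...).Result, so no pool thread blocks
    // waiting on another pool thread.
    public static HttpStatusCode Post(HttpClient client, string url)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, url);
        using HttpResponseMessage response = client.Send(request);
        return response.StatusCode;
    }

    public static void Main()
    {
        using var client = new HttpClient(new CannedHandler());
        Console.WriteLine(Post(client, "https://example.invalid/"));
    }
}
```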