Closed: haydn closed this issue 3 months ago
I still don't know what the actual problem is, but I've been forced to bypass the pooler altogether and connect directly to the database. That has completely solved the error we were seeing in our app which makes me think there's an underlying issue in Supavisor causing this.
Yes, I'm having the exact same issue; it started popping up two days ago in my Python app. I'm using:

```toml
sqlalchemy = "2.0.30"
asyncpg = "0.29.0"
```
The connection is established and then dies with `asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation`.
The connection is only successfully established infrequently, which has made the problem quite difficult to track down.
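Until the pooler-side cause is found, one application-level stopgap is to retry queries that die with a disconnect error. This is only a sketch, not a fix for Supavisor itself: the decorator retries on a configurable exception tuple, and you would pass your driver's real disconnect errors (e.g. `asyncpg.exceptions.ConnectionDoesNotExistError`) instead of the stdlib stand-ins used here.

```python
import time
from functools import wraps


def retry_on_disconnect(retries=3, delay=0.5, exceptions=(ConnectionError, OSError)):
    """Retry a callable when the driver reports a dropped connection.

    `exceptions` should list your driver's disconnect errors, e.g.
    (asyncpg.exceptions.ConnectionDoesNotExistError,) for asyncpg.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except exceptions as exc:
                    last_exc = exc
                    time.sleep(delay * (2 ** attempt))  # exponential backoff
            raise last_exc
        return wrapper
    return decorator


# Hypothetical usage: a query that hits one dropped connection, then succeeds.
calls = {"n": 0}


@retry_on_disconnect(retries=3, delay=0.0)
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("connection was closed in the middle of operation")
    return "ok"
```

Wrapping queries in retries masks the symptom rather than fixing the pooler, but it can keep an app alive while the underlying issue is investigated.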
@abc3 Have you had a chance to investigate this at all? We're still using a direct connection to Postgres at the moment (because it's more reliable than Supavisor), but our app is barely scraping by with that set-up because of the connection limit.
We're also experiencing a similar issue:
```
"ClientHandler: socket closed with reason {:timeout, {:partisan_gen_statem, :call, [#PID<50041.23646.8099>, {:db_call, #PID<0.1508.1393>, <<80, 0, 0, 3, 116, 0, 105, 110, 115, 101, 114, 116, 32, 105, 110, 116, 111, 32, 34, 109, 101, 115, 115, 97, 103, 101, 95, 103, 101, 110, 101, 114, 97, 116, 105, 111, 110, 34, 32, 40, ...>>}, 15000]}}, DbHandler {#PID<50041.23646.8099>, {:error, {:exit, {:timeout, {:partisan_gen_statem, :call, [#PID<50041.23646.8099>, :get_state_and_mode, 5000]}}}}}"
```
When clients try to connect, they get this error:
```
write CONNECTION_CLOSED aws-0-us-west-1.pooler.supabase.com:6543
```
We've already upgraded to the most expensive tier, optimized our DB, and added a caching layer, and we still get these errors. We also deployed Supabase Grafana and looked at our stats; most of the metrics showed plenty of headroom.
@haydn @SkycladObserver I'm looking for a fix, sorry about that
Hello @abc3, I'm also getting this write CONNECTION_CLOSED issue very frequently on my Python backend. My query is already well optimized (it uses an index scan). I've already tried both the Python supabase client and my own psycopg connection pool.
This is disrupting our day to day operations. We're hoping this gets fixed ASAP. Thank you
By the way, could you provide a small repo with steps on how to reproduce the issue? Did you use session or transaction mode? Are there queries with prepared statements? That would be very helpful to me
Sure thing. I can provide a code snippet:

```python
import os

from flask import Flask, jsonify, request
from supabase import create_client

application = Flask(__name__)


@application.route("/myendpoint", methods=["POST"])
def myendpoint():
    try:
        data = request.get_json()
        db_client = supabase_client()
    except Exception as e:
        return jsonify({"status": "Failed", "error": repr(e)})
    try:
        output = do_something(data)
    finally:
        update_db(data, db_client)
    return jsonify(output)


def supabase_client(url=None, key=None):
    if url is None:
        url = os.getenv("SUPABASE_URL")
    if key is None:
        key = os.getenv("SUPABASE_SERVICE_ROLE_KEY")
    return create_client(url, key)
```
The update_db function is a simple UPDATE ... SET query that uses an index scan. The do_something function can take from 5 to 30 s to finish. The behavior is unpredictable: the connection-closed error can happen on both short and long request durations.
I'm also using transaction mode. Our plan allows 500 direct connections and 120,000 Supavisor connections. On Grafana we only see under 70 connections total, direct and Supavisor combined. Our CPU usage is also very low.
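As an aside, when connecting through a transaction-mode pooler like Supavisor on port 6543, a commonly recommended asyncpg/SQLAlchemy setup disables client-side pooling and the driver's prepared-statement cache, since the pooler multiplexes sessions across clients. This is a sketch with a placeholder URL; treat these settings as assumptions to verify against your own stack:

```python
# Sketch only: requires sqlalchemy[asyncio] and asyncpg to be installed.
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import NullPool

engine = create_async_engine(
    # Placeholder DSN; use your project's pooler hostname and credentials.
    "postgresql+asyncpg://user:pass@aws-0-us-west-1.pooler.supabase.com:6543/postgres",
    poolclass=NullPool,  # let Supavisor do the pooling in transaction mode
    connect_args={
        "statement_cache_size": 0,  # avoid server-side prepared statements
    },
)
```

`NullPool` hands pooling duties to Supavisor entirely, and `statement_cache_size=0` avoids server-side prepared statements, which transaction-mode pooling generally cannot track across sessions.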
I can give a code snippet later, but it happens fairly often in transaction mode where inserts are quite frequent. In my case I'm using Vercel serverless with Drizzle ORM on top of the Postgres.js driver: https://github.com/porsager/postgres. I've also turned off prepared statements.
@abc3 here's some sample code:
```ts
const queryClient = postgres(env.DATABASE_URL, {
  prepare: false,
});

const db = drizzle(queryClient, {
  schema,
  logger: process.env.NODE_ENV === "development",
});

async function sample() {
  const record = await insertInitialRecord();
  try {
    const externalData = await getDataFromExternalServer();
    await updateRecordWithExternalData(externalData);
  } catch (e) {
    await updateRecordStatusToFailed(record.id);
    throw e;
  }
}
```
Basically I have a function that inserts an initial record into the db. It gets data from an external server (can take from 3s - 10s) then updates the db record with the data. If the external server returns an error, update the record's status to failed. Note that these inserts/updates aren't in the same transaction because I need to insert the initial record first.
Note: idk if this affects anything, but we're also using supabase realtime.
thanks guys ❤️
Can you please check your connections again?
@abc3 It's good now!!! Thank you, our savior 🙏 Very very fast response <3
@abc3 I'm curious, what was the problem? Was it code or configuration issue?
works on my end as well. thank you @abc3! I'd like to know the cause of the issue too if possible?
@abc3 Unfortunately, we're still seeing this issue intermittently (4 out of the 13 requests hit this issue):
The errors in the Vercel logs look like this (it's a Next.js app, this is a node-postgres error):
```
Error: Client has encountered a connection error and is not queryable
    at /var/task/node_modules/.pnpm/pg@8.7.1/node_modules/pg/lib/client.js:563:27
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
```
These all line up with "Terminating with reason" errors in the Pooler logs:
This issue only appears to be occurring on pages where there's a significant amount of data getting returned by the queries. For example, here's the same page for a different user where there's very little data being returned from the database:
Again, bypassing Supavisor and connecting directly to Postgres eliminates this issue completely (but reintroduces the problem of running out of connections 😟).
By the way, could you provide a small repo with steps on how to reproduce the issue? Did you use session or transaction mode? Are there queries with prepared statements?
@haydn think we may have handled this one. Let us know please!
Currently deployed only to Supavisor in ap-southeast-2
@chasers @abc3 You're my heroes! I'm not seeing any errors anymore:
I'll monitor it for the rest of today and post back here this afternoon if it's still looking okay.
It's worked flawlessly all day. I've reconnected our production app to the pooler as well; it also hasn't had any issues. Calling this fixed. Thanks for your help!
@chasers we started to have issues about 24 hours ago in ap-southeast-2. Lots of connections terminated and dbhandler exited errors. This seems related to that release.
@blechatellier can you submit a support ticket so we can get a project id please?
See the issue here; it stems from `expected SASL response, got message type 83`.
@blechatellier fix should be rolling out shortly
@blechatellier can you please check again?
@chasers @abc3 thanks, I will monitor for the next few hours but so far no errors since 06:22:38.82 AEST. What was causing the issue?
@chasers I'm seeing similar errors on aws-0-eu-central-1.pooler.supabase.com
, this one is from half an hour ago:
Not sure how serious it is but I also see a lot of my queries failing without any specific error and I'm still trying to figure out the root cause. If I use the direct connection all my queries are successful. Could this be related? I just sent a support ticket as well.
I'm seeing the same issue as daniel-xyz. I get 500s without any specific error response.
What was causing the issue?
@blechatellier the incoming data were not handled properly when spinning up a new pool
@daniel-xyz @Cali93 hey, did you submit a support ticket?
@SkycladObserver @sheannethedev there was a misconfiguration on one of the cluster nodes in your region
Yes, just saw your mail. :) Will test if the errors occur in session mode as soon as I'm back to work.
Bug report
Describe the bug
In the "Pooler" logs in the Supabase dashboard I'm seeing errors like this:
I don't understand what this is trying to tell me. Has something timed out?
In our app we're seeing this as "Connection terminated unexpectedly" errors being emitted by node-postgres.
To Reproduce
I have no idea how to reproduce this reliably because I don't understand what the error is. I would love to be able to reproduce it.
Expected behavior
No error.
Also one/both of these:
Screenshots
https://github.com/supabase/supavisor/issues/344#issuecomment-2157338747
System information
N/A
Additional context
I opened a support ticket about this 2 days ago, but I'm still waiting for a response from the "pooler team".

Update: response from the pooler team:
Related discussion:
https://github.com/orgs/supabase/discussions/26879#discussion-6750486