Closed bt-ndollimount closed 1 week ago
We don't manage any connections directly in PSU and it's a side effect of either hangfire or entity framework, so we need to determine which one of those is misbehaving. I can't imagine the need for over 100 connections at any time. Can you check the pg_stat_activity table to see what these connections are doing?
Here's some discussion about a similar issue but PSU uses a later version of what is being discussed here.
https://github.com/hangfire-postgres/Hangfire.PostgreSql/issues/163
I've been keeping an eye on the pg_stat_activity since it last occurred but it hasn't occurred since. I'm hoping to at least catch which node it's coming from, if it's one specific. The last part of that discussion you linked advises it's fixed in Npgsql 7 as well as 6.0.10; is PSU using a newer version than those?
Yeah we are using 8.0.0.
Just got in to check the connections (I made a small dashboard for it). It was low at 22 total connections until I started clicking around in the main UI to check some jobs and the interface locked up. I glanced over and quickly got a screenshot of it showing a total of 100 connections. The bulk is coming from the Linux node that only runs the UI and a few apps, most of which aren't even in use yet. The 3 Windows nodes that are the job nodes don't appear to really be contributing much at all to the connections. Would it be worth me sending in the logs from the web node?
Sure. It might be helpful. I'll see if I can reproduce this.
Should I open a support case to send in the logs?
@adamdriscoll I seem to be able to reproduce this now on demand. It's when I open up the job details for any job, I can then watch the database connections rise until it dies with the 'too many connections' error.
Thanks. I haven't had a chance to try this yet but should have time later today or tomorrow morning. I'll let you know if I have any luck and, if not, I'll have you open a case.
@adamdriscoll To clarify this a bit more, it seems to only be when checking currently running jobs. It doesn't appear to make a difference whether there is a write-progress in use or not.
@adamdriscoll Another update from this morning. I accidentally opened a job details and the issue appeared. Instead of restarting PSU, I just closed the tab I had opened it in and it did recover. I wasn't paying attention to how long it took but it was for sure less than 7 minutes (I messaged someone to let them know when I had done it so I had a timestamp).
@bt-ndollimount - I was just looking at this. The problem is that we have a timer that is updating the UI with new job data (output etc) as it's received. It reloads on an interval but if that interval doesn't load the data fast enough, it will begin run the same query again. This puts even more pressure on the same table\rows and eventually exhausts all connections in the pool.
I wasn't seeing this myself because the query was returning fast enough but once the job tables had enough data in it, this would start to happen. We have this pattern in a couple of other places besides the job page so fixing those as well.
Description of Issue
We've been running into an issue where the logs are showing issues connecting to Postgres with the error, 'sorry, too many connections'. Our connection limit has already been increased to 300 as we noticed we couldn't start multiple nodes at the same time without running into that same error. We have a few custom scripts and apps that communicate directly with the same PostgreSQL instance but I've confirmed they all close out their connection after running their queries.
PSU might not be closing connections to PostgreSQL properly.
Something else to consider, as noted above where we weren't able to start multiple nodes at the same time because one node was taking up almost all the allowed connections of the default 100 limit; is that necessary? Could it not make a connection to the database and use that one connection or does it need to make that many individual connections?
Version
5.0.15
Severity
High
Hosting Method
MSI (Windows Service)
Operating System
Windows
Database
PostgreSQL
Licensed
Yes
Features
Git Sync
Additional Environment data
Windows Server 2022 Standard / Amazon Linux PowerShell 7.4.6 PostgreSQL 17
Screenshots/Animations