Closed: kevinpoitra closed this issue 4 years ago.
It seems like the problem would have to be with Diesel rather than r2d2. If diesel fails to connect to the database, there's nothing for the pool to leak.
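A quick way to confirm that is to take r2d2 out of the picture and let Diesel attempt the connection on its own - a minimal sketch, assuming a hypothetical `DATABASE_URL` that points at the unreachable host, while watching the process's open-file count between attempts:

```rust
use diesel::pg::PgConnection;
use diesel::prelude::*;

fn main() {
    // Hypothetical URL pointing at the unreachable (e.g. firewalled-off) database.
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");

    for attempt in 1..=100 {
        // r2d2 is not involved here at all; if `lsof -p <pid> | wc -l` still climbs
        // by one per failed attempt, the descriptors are being leaked below the pool.
        match PgConnection::establish(&database_url) {
            Ok(_conn) => println!("attempt {}: connected unexpectedly", attempt),
            Err(e) => eprintln!("attempt {}: failed to connect: {}", attempt, e),
        }
        std::thread::sleep(std::time::Duration::from_secs(1));
    }
}
```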
Thanks for the quick response!
You're right - I guess I was thrown off as my logs show r2d2 as the module logging these errors, like so:

```
ERROR r2d2 - could not create socket: Too many open files
```

When in reality, r2d2 just catches the `Err`s when it tries to connect and prints them out on its own, here. Should've looked a bit more into it!
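For anyone curious where that log line comes from: the pool hands any connection error from its manager to the configured error handler, and the default `LoggingErrorHandler` logs it under the `r2d2` target, which is why the message looks like it originates in r2d2. A minimal sketch of plugging in a custom handler to make that visible (assuming the r2d2 crate is also available as a direct dependency, at the version Diesel pulls in):

```rust
use diesel::pg::PgConnection;
use diesel::r2d2::{ConnectionManager, Pool};
// Assumption: r2d2 is declared as a direct dependency at the same version Diesel
// uses, so this trait matches the one the pool's builder expects.
use r2d2::HandleError;

#[derive(Debug)]
struct PrintErrorHandler;

impl<E: std::fmt::Display> HandleError<E> for PrintErrorHandler {
    fn handle_error(&self, error: E) {
        // Same hook the default LoggingErrorHandler uses; the error itself
        // originates in the connection manager (Diesel), not in r2d2.
        eprintln!("connection manager reported: {}", error);
    }
}

fn build_pool(database_url: &str) -> Pool<ConnectionManager<PgConnection>> {
    let manager = ConnectionManager::<PgConnection>::new(database_url);
    Pool::<ConnectionManager<PgConnection>>::builder()
        .error_handler(Box::new(PrintErrorHandler))
        // build_unchecked() skips the initial connection attempt, so the pool can
        // be constructed even while the database is unreachable.
        .build_unchecked(manager)
}
```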
Closing this as I've moved the issue over to Diesel's repo.
To preface: I'm using r2d2 within a web service, running on a freshly installed CentOS 8 server. r2d2 is imported into my project via Diesel's `r2d2` feature. The databases that r2d2 talks to are on separate servers, and share a network.

Recently, one of my services suffered from a network outage that affected the network between the program that uses r2d2 and the database. When the network came back online, the program still wasn't able to respond to requests; its log was showing many `Too many open files` errors, as it had hit the default soft limit for open files on the system (1024), causing all further network requests to also throw similar 'too many open files' errors until I manually restarted the process.

From my testing, any failed connection within r2d2 causes this behavior. To test, I first got an idea of what the program looks like with a working network:
I found the program's PID via `ps aux` and counted its open files with `lsof -p <pid> | wc -l`. My program in particular starts with 90 open files.

Then, I severed the connection between the program and a PostgreSQL database that it talks to. Anything that causes r2d2 to throw errors seems to work; in my case, I removed PostgreSQL's entry from the database server's FirewallD zone. This made the program immediately start logging "No route to host" errors from r2d2.
Using that same `lsof` command from above every few minutes, I monitored the open file count, and it seems like it goes up once per failed attempt, starting from 92 open files right when the firewall was modified. Inspecting the output of `lsof` at this point shows many entries that are near identical to one another.

At this point, I re-added PostgreSQL's port to the firewall, which allowed r2d2 to create connections successfully. I ran the `lsof` command again after waiting 20 minutes, and it showed that the count hadn't changed from 627. From what I saw after that network outage, r2d2 will never release those open files - I have about six hours' worth of logs of nothing but r2d2 throwing 'too many open files' errors as it tried to acquire a new connection every few seconds, wherein the network came back online about 2 hours into those logs.

Some additional potentially helpful information:
- The pool is a `Pool<ConnectionManager<PgConnection>>`, where `PgConnection` comes from Diesel itself.
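For reference, a minimal sketch of how such a pool is typically built and used with Diesel's `r2d2` feature - the URL handling, pool size, and function names here are placeholders rather than the actual configuration from this report:

```rust
use diesel::pg::PgConnection;
use diesel::r2d2::{ConnectionManager, Pool};

// Placeholder setup: the pool size and error handling are illustrative only.
fn create_pool(database_url: &str) -> Pool<ConnectionManager<PgConnection>> {
    let manager = ConnectionManager::<PgConnection>::new(database_url);
    Pool::builder()
        .max_size(10)
        .build(manager)
        .expect("failed to create connection pool")
}

// Each incoming request checks a connection out of the pool. When the database
// is unreachable, this is the call that fails repeatedly - and, per the report,
// appears to leak one file descriptor per failed attempt.
fn handle_request(pool: &Pool<ConnectionManager<PgConnection>>) {
    match pool.get() {
        Ok(_conn) => { /* run Diesel queries with the connection */ }
        Err(e) => eprintln!("could not get a connection from the pool: {}", e),
    }
}
```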