Closed mxcl closed 3 years ago
Hello @mxcl, I have been hitting similar issues recently, but I am using SQLite.
Can you give more detail about "checking connection"? Do we need to reopen the connection if it is closed?
@tanner0101 bump
using sqlite
Well this makes it seem like a more fundamental issue, how can you lose connections to a local db?
Or it's a coincidence and not the same underlying issue.
Or sqlite connections operate the same.
I believe a solution would be: checking if the connection is still open before using it
@mxcl from what I can tell in Vapor 4 and 3 we already check if a connection is closed before attempting to use it:
Vapor 3, however, caches the connection requested from the pool for the lifetime of the request. Maybe that's a problem? Vapor 4 requests and releases a connection from the pool each time a query is run.
Perhaps we need to look into automatic retry functionality in Fluent if we detect a connection has failed due to networking errors?
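One possible shape for such retry functionality, sketched with SwiftNIO-style futures. This is illustrative only: `withRetry`, `isConnectionError`, and the `attempt` closure are hypothetical names, not existing Fluent API.

```swift
import NIO

// Hypothetical sketch of bounded query auto-retry on transient
// connection failures. Not existing Fluent API.
func withRetry<T>(
    attempts: Int,
    on eventLoop: EventLoop,
    isConnectionError: @escaping (Error) -> Bool,
    _ attempt: @escaping () -> EventLoopFuture<T>
) -> EventLoopFuture<T> {
    return attempt().flatMapError { error in
        // Only retry transient network failures, and only a bounded number of times.
        guard attempts > 1, isConnectionError(error) else {
            return eventLoop.makeFailedFuture(error)
        }
        return withRetry(attempts: attempts - 1, on: eventLoop,
                         isConnectionError: isConnectionError, attempt)
    }
}
```

The key design question would be classifying which errors are safe to retry; blindly re-running a non-idempotent query after a mid-flight connection drop could apply it twice.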
Vapor 3, however, caches the connection requested from the pool for the lifetime of the request. Maybe that's a problem?
From my experience of seeing this (locally where it happens much more often), typically it is happening when I have gone to lunch and come back, and then it happens with the first request I make, I'll observe more thoroughly next time it happens to try and provide more info.
Vapor 4 requests and releases a connection from the pool each time a query is run.
I look forward to trying vapor 4 out, though sadly, since this is a professional project, it may be some time before we can afford the port. Still, sounds like the change in behavior may help.
typically it is happening when I have gone to lunch and come back, and then it happens with the first request I make, I'll observe more thoroughly next time it happens to try and provide more info.
Hmm... that would definitely seem like the connection pool is not correctly noticing the connection is closed then. It could be an issue with how `connection.isClosed` is determined in v3.
The other thing that might support this is that you're getting `Connection reset by peer`. In the Postgres code it's checking `isClosed` before sending Postgres messages over the wire: https://github.com/vapor/postgres-kit/blob/1/Sources/PostgreSQL/Connection/PostgreSQLConnection.swift#L79-L81
If `isClosed` is true, then you should be getting this error: `PostgreSQLError(identifier: "closed", reason: "Connection is closed.")`
`isClosed` is being determined by `channel.closeFuture`: https://github.com/vapor/postgres-kit/blob/1/Sources/PostgreSQL/Connection/PostgreSQLConnection.swift#L42-L47
In v4, we calculate this with `!channel.isActive`. Maybe backporting that to v3 would help?
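A minimal sketch of the difference between the two approaches, assuming a SwiftNIO `Channel` (illustrative only, not the actual postgres-kit code):

```swift
import NIO

// Illustrative only, not the actual postgres-kit implementation.
final class ConnectionStateSketch {
    private let channel: Channel

    // v3-style: a flag flipped when `closeFuture` completes. Because the
    // future fires asynchronously, there is a window where the socket is
    // already dead but the flag has not been updated yet.
    private var closedFlag = false

    init(channel: Channel) {
        self.channel = channel
        channel.closeFuture.whenComplete { _ in self.closedFlag = true }
    }

    // v4-style: ask the channel synchronously at the moment of use,
    // so the check reflects the socket state right before the query.
    var isClosed: Bool {
        return !self.channel.isActive
    }
}
```

The race window in the v3-style flag would explain a dead connection occasionally passing the pool's check and only failing once bytes are written.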
I too am using Vapor 3 in a production environment and running into many of these same errors. We are using MySQL. I will be digging into this further to get more details.
Unrelated to these events, we also recently started experimenting with running a db proxy in front of a cluster of dbs. When introducing the proxy we kept seeing migrations fail due to the statements being closed, not connections. I don't know if this is related but the information may help. I have a screen shot of a packet capture during migrations, in a single packet there is a request to both execute the statement and close the statement. It appears that something within the future is causing the statements to close before the response from the db, and when the db responds vapor crashes.
I feel like these are related, and the original issue here may not be a dropped connection from the db, but instead a closed statement before receiving the response. I hope I'm not too off point in this thread!
@mdavies-solsys is that packet capture between Vapor and the DB proxy, or between the proxy and the DB? I don't see anywhere in code that we're sending a close-statement message. We specifically re-use the "unnamed" statement (`""`), which shouldn't need to be closed.
@tanner0101 This was between vapor and the proxy. I'll let @gazza1davies chime in here as he actually captured the packets. Upon first glance it seemed that perhaps both commands having the same packet number was only honoring the first in MySQL while the proxy honored both. We'd love to be of more assistance if possible! The proxy we're using is ProxySQL.
I experience this issue on a regular basis during development. I have a Postgres DB running on a remote server and the Vapor app running on my Mac. The average case is: I build and run my project from Xcode, leave it for an hour or two, come back, and then every request to the server's routes fails with HTTP status 500 `read(descriptor:pointer:size:) failed: Connection reset by peer`. Routes that do not involve database querying just work.
I need to move this app to production soon (in the next three days).
What should I do to eliminate this issue?
I tried rewriting all database-related code, wrapping every `query`, `save`, and `delete` in `req.withPooledConnection(to: .psql) { con in ... }`, but it did not have any effect.
@MadGeorge Exactly the same in my case. Did you solve this problem?
@mohok Hi. No, I did not solve this problem for the dev environment. But I can confirm the issue does not reproduce in production. I containerised Vapor via Docker Swarm (two linked containers, vapor + postgresql, on the same host) and `Connection reset by peer` does not happen.
On my Mac I still face this every 1-3 hours, by the way.
@mohok, @tanner0101 I was wrong. Today I found many `Connection reset by peer` errors in the production environment. The only reason I thought it was gone is that my server somehow continues to work: it reconnects to the DB and can respond to requests after the error, whereas in my local environment all requests fail once `Connection reset by peer` is raised, until the server restarts.
I've tried to recreate this a couple of times now but I haven't been able to. Here's what I did:
1. Clone vapor/api-template and swap from SQLite to Postgres.
2. Boot up a Postgres DB using this docker-compose file.
3. Insert a Todo. Make ~50 requests to get all todos.
4. Wait 1-3 hours.
5. Make more requests.
I expected to see this failure when making more requests after waiting, but it always seems to work. Any additional tips on what I can do to recreate this?
@tanner0101 thank you for the answer.
3: Insert a Todo. Make ~50 requests to get all todos.
It does not look like the exception depends on the request count. The regular case is very similar to what @mxcl described:
I have gone to lunch and come back, and then it happens with the first request I make.
The time between attempts before a request fails is not exact. Sometimes I need to leave the server without requests for an hour, but yesterday I faced `read(descriptor:pointer:size:) failed: Connection reset by peer (errno: 104)` in a very short interval between successful connections. By feel, it takes leaving the launched server without requests for 30 to 60 minutes, and then the first requests with DB access will raise the error.
My current application is a mobile API backend. The very first client requests are `POST /api/notifications/token` and `GET /api/categories`. The order of these queries is not specified. The first one (../token) fails more often, since it is first, but sometimes they both fail.
`POST /api/notifications/token` is handled via this controller:
final class NotificationsCntrl {
    func createOrUpdate(_ req: Request) throws -> Future<DeviceToken> {
        return try req.content.decode(DeviceTokenReq.self).flatMap { toSave in
            req.withPooledConnection(to: .psql) { con in
                DeviceToken.query(on: con)
                    .filter(\.deviceID, .equal, toSave.deviceID)
                    .first()
                    .flatMap { found in
                        if let found = found {
                            return found.update(with: toSave).save(on: con)
                        }
                        return DeviceToken(req: toSave).save(on: con)
                    }
            }
        }
    }
}
and `GET /api/categories` looks like:
final class CategoriesCntrl {
    func all(_ req: Request) throws -> Future<[Category]> {
        return req.withPooledConnection(to: .psql) { con in
            let productIDs = ProductOption.query(on: con)
                .filter(\.stock, .greaterThan, 0)
                .all()
                .map { $0.map { $0.productID }}
            let categoriesIDs = productIDs.flatMap { ids in
                CategoryProduct.query(on: con)
                    .filter(\.productID ~~ ids)
                    .all()
                    .map { $0.map { $0.categoryID }}
            }
            return categoriesIDs.flatMap { ids in
                Category.query(on: con)
                    .filter(\.id ~~ ids)
                    .all()
            }
        }
    }
}
Before your answer I used `image: postgres` in my compose file, which leads to PostgreSQL v12. Today I migrated to `image: postgres:11` to be sure there is no effect from the PG version difference. Errors still happen.
I can confirm there is no difference whether my requests are sent to the local app (iMac, Xcode 11.4.1) or to the production server (Ubuntu server with Docker Swarm). Both raise a similar error from time to time.
The only thing I have not tried is a non-dockerised Postgres.
I tried wrapping `req.withPooledConnection(to: .psql)` like so:
class Wrap {
    private var count = 0

    func pooledConnection<T>(for req: Request, closure: @escaping (PostgreSQLConnection) throws -> Future<T>) -> Future<T> {
        return req.withPooledConnection(to: .psql) { con in
            do {
                return try closure(con)
            } catch {
                if self.count > 2 {
                    return req.eventLoop.newFailedFuture(error: error)
                }
                (try? req.make(Logger.self))?.info("Error: count: \(self.count): \(error.localizedDescription)")
                (try? req.make(Logger.self))?.report(error: error, request: req, verbose: false)
                self.count += 1
                return self.pooledConnection(for: req, closure: closure)
            }
        }
    }
}
and then use `Wrap().pooledConnection(for: req) { con in ... }`. But it does not look like I can ever intercept the target error. Either I did it wrong, or the error is raised at a deeper level, or it is intercepted under the hood and sent to the response via `newFailedFuture`.
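For what it's worth, a `do`/`catch` in a wrapper like this can only catch errors thrown synchronously while the future chain is being built; a connection that dies mid-query fails the future asynchronously, so it never reaches the `catch` block. A hedged sketch of retrying on the future's failure path instead, assuming Vapor 3's `catchFlatMap` (`retriesLeft` is an illustrative parameter, not part of any existing API):

```swift
// Sketch: retry on the future's failure path rather than in do/catch.
// Assumes Vapor 3's Request/withPooledConnection/catchFlatMap API.
func pooledConnection<T>(
    for req: Request,
    retriesLeft: Int = 2,
    closure: @escaping (PostgreSQLConnection) throws -> Future<T>
) -> Future<T> {
    return req.withPooledConnection(to: .psql) { con in
        try closure(con)
    }.catchFlatMap { error in
        // "Connection reset by peer" fails the future asynchronously,
        // so it is only observable here, never in a catch block.
        guard retriesLeft > 0 else {
            return req.eventLoop.newFailedFuture(error: error)
        }
        return pooledConnection(for: req, retriesLeft: retriesLeft - 1, closure: closure)
    }
}
```

Note that each retry requests a fresh connection from the pool, so a pruned dead connection would not be re-offered.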
If it helps, I am also attaching my error middleware class. The notable thing is that the discussed error falls into the `default` case of the `switch error` statement in ErrorBodyMiddleware, so it is not an AbortDataError or AbortError, and it is not Debuggable, since it does not fall into `case let debuggable` in the Xcode environment.
@MadGeorge thanks for the extra info. I tried again to reproduce this error but I'm still not able to. Here's what I tried:
Databases:
Methods:
pg_terminate_backend(pid)
Errors reported in vapor logs. No errors surfaced to API. New connections were created automatically.
Connections still worked fine after coming back.
Errors reported in logs and surfaced as 500 to API. Once Wifi was re-enabled, new connections were created and all requests began working again.
Errors reported in logs and surfaced as 500 to API after db was shutdown. Once DB was booted back up, new connections were created and all requests began working again.
Queries ran slower but still no API errors.
`req.withPooledConnection(to: .psql)`
Nothing changed when using this.
I'm not sure what else to try here. In none of my tests was I able to recreate the `Connection reset by peer (errno: 104)` error.
Hi,
just wanted to add here: we are also getting these errors rarely but consistently (~10 errors from ~50,000 commits a day).
We are using PostgreSQL 1.5.0 and Vapor 3.3.2. I access the database via `DatabaseConnectionPool` and store/load data via Fluent.
Our app is deployed on Heroku (stack 18), build pack (https://github.com/vapor/heroku-buildpack). Postgres (also hosted on heroku, standard-0 plan) version 10.11 (with SSL connection).
@tanner0101 thanks, Tanner.
On the prod server I just added a cron task which triggers `/api/ping` every minute. That endpoint performs a light select query against the DB. No errors. But requests from mobile apps still face errors from time to time.
I give up. I'm starting to think it is not a Vapor/Postgres or Vapor/Fluent problem. It is starting to look like a Vapor connection problem.
I noticed an interesting case: when I move the mobile application into the background, I can see the same `Connection reset by peer` message in the logs. I will try to ask about it in the Vapor repo.
Thank you one more time.
just wanted to add here - we are also getting those errors rarely but consistently (from ~50.000 commits a day ~ 10 errors).
@schwmi to clarify, after you see this error does your app stop working? The main issue I'm concerned about here is this report by @MadGeorge
build and run my project from xcode, leave it for an hour or two [...] then every request to the server routes failed with http status 500 read(descriptor:pointer:size:) failed: Connection reset by peer
To see "connection reset by peer" sometimes is normal. Connections can fail and that is why we use connection pools that are capable of replenishing themselves automatically. However, if failed connections are not being replenished and your app becomes unresponsive, that is definitely a problem.
To further complicate things, the main issue that started this thread is different (we probably should have separated the threads). @mxcl was concerned about surfacing these connection reset messages. Currently if a connection fails (which is normal) during a query, this error will surface as a 500 to the client. Vapor 4 improves its closed connection checking logic so this should be less of a problem. Still, a way to auto-retry failed queries in Fluent could be a good feature to have if you really want to avoid surfacing these errors.
Hopefully this makes sense.
TL;DR:
"connection reset" errors are normal and connection pools should automatically replace closed connections. If this is not happening, that's a problem.
Thanks for the clarification, app doesn't stop working after those messages. So no bug from our side.
Having ported to Vapor 4, I’d say this problem is now in fact worse.
Frequent connection closed log messages followed by that request punting a 500 back to the client.
I'm sure sometimes this is unavoidable, but I've used other backend solutions to Vapor where I never saw this problem.
@mxcl It can not be... is it true? I am working on the migration to Vapor 4 right now in the hope of fixing this issue. Your news terrifies me.
@mxcl hmm that's surprising. I would have figured the better `conn.isClosed` logic would mean the pool would replace dead connections before trying to use them. Can you share your log messages? Is there any way you could temporarily enable `--log trace` and catch a few of these happening? I'd like to make sure the conn pool logic is working as expected before digging into other solutions like auto retry.
I just did a quick sanity check on the vapor template. To make testing this easy, I used an ELG with only one thread.
First request to create a connection and store it in the pool:
[ codes.vapor.application ] [ INFO ] GET /todos [request-id: 938F39CB-D79C-4DD8-AA66-B7E5544A97AD] (Vapor/Responder/DefaultResponder.swift:42)
[ codes.vapor.application ] [ DEBUG ] No available connections on this event loop, creating a new one [request-id: 938F39CB-D79C-4DD8-AA66-B7E5544A97AD, database-id: psql] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:202)
Second request to ensure the pooled connection is being used:
[ codes.vapor.application ] [ INFO ] GET /todos [request-id: 3B14EA55-36BC-4D30-8A06-5C04546703DE] (Vapor/Responder/DefaultResponder.swift:42)
[ codes.vapor.application ] [ TRACE ] Re-using available connection [request-id: 3B14EA55-36BC-4D30-8A06-5C04546703DE, database-id: psql] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:191)
I then ran this command to kill off all active connections:
SELECT
pg_terminate_backend(pid)
FROM
pg_stat_activity
WHERE
-- don't kill my own connection!
pid <> pg_backend_pid()
-- don't kill the connections to other databases
AND datname = 'vapor_database';
Finally, a third request that will be offered a dead connection:
[ codes.vapor.application ] [ INFO ] GET /todos [request-id: 33B37791-9B76-4080-A75A-DB29187BCA5A] (Vapor/Responder/DefaultResponder.swift:42)
[ codes.vapor.application ] [ DEBUG ] Pruning available connection that has closed [request-id: 33B37791-9B76-4080-A75A-DB29187BCA5A, database-id: psql] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:195)
[ codes.vapor.application ] [ DEBUG ] No available connections on this event loop, creating a new one [request-id: 33B37791-9B76-4080-A75A-DB29187BCA5A, database-id: psql] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:202)
As the logs show, the dead connection is correctly identified and pruned. No 500 thrown.
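The pruning behavior those logs describe can be sketched roughly like this (illustrative only, not the actual AsyncKit implementation):

```swift
// Rough sketch of the pool behavior in the logs above:
// prune closed connections, re-use live ones, create new ones on demand.
// Illustrative only; not the actual AsyncKit EventLoopConnectionPool.
final class ConnectionPoolSketch<Conn> {
    private var available: [Conn] = []
    private let isClosed: (Conn) -> Bool
    private let makeNew: () -> Conn

    init(isClosed: @escaping (Conn) -> Bool, makeNew: @escaping () -> Conn) {
        self.isClosed = isClosed
        self.makeNew = makeNew
    }

    func requestConnection() -> Conn {
        while let conn = available.popLast() {
            if isClosed(conn) {
                // "Pruning available connection that has closed"
                continue
            }
            // "Re-using available connection"
            return conn
        }
        // "No available connections on this event loop, creating a new one"
        return makeNew()
    }

    func release(_ conn: Conn) {
        available.append(conn)
    }
}
```

The important property is that a dead connection is discarded at checkout time, so the caller only ever sees a live (or freshly created) connection.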
I'm facing the same issue. Everything works well on my local machine (MacBook Pro), whereas on the production server I'm getting the same error as the OP quite often.
[ ERROR ] Uncaught error: read(descriptor:pointer:size:): Connection reset by peer (errno: 104) (PostgresNIO/Connection/PostgresConnection+Connect.swift:47)
[ ERROR ] NIOPostgres error: connection closed [request-id: 440FDC63-5D10-46DB-BB72-4CDC0341D3E5] (App/Middleware/ErrorMiddleware.swift:46)
Here is the full log:
[ TRACE ] Decoded HTTP part: head: HTTPRequestHead {SOME POST REQUEST } (Vapor/HTTP/Server/HTTPServerRequestDecoder.swift:34)
[ TRACE ] Decoded HTTP part: body: ByteBuffer { readerIndex: 0, writerIndex: 782, readableBytes: 782, capacity: 782, storageCapacity: 2048, slice: _ByteBufferSlice { 461..<1243 }, storage: 0x00007ff1540bbe20 (2048 bytes) } (Vapor/HTTP/Server/HTTPServerRequestDecoder.swift:34)
[ TRACE ] Decoded HTTP part: end (Vapor/HTTP/Server/HTTPServerRequestDecoder.swift:34)
[ INFO ] POST /api/v1/auth/siwa [request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (Vapor/Responder/DefaultResponder.swift:42)
[ DEBUG ] New data received [request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (Vapor/HTTP/EndpointCache.swift:121)
[ DEBUG ] query read auth_authorization_credential filters=[auth_authorization_credential[apple_user_identifier] = Optional("SOMEID")] limits=[count(1)] [database-id: psql, request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (FluentKit/Query/Builder/QueryBuilder.swift:283)
[ TRACE ] Re-using available connection [database-id: psql, request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:191)
[ DEBUG ] SELECT <<SOME QUERY>> [database-id: psql, request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (PostgresNIO/PostgresDatabase+Query.swift:124)
[ TRACE ] Encoded: PostgresMessage (P) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (D) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (B) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (E) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (S) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ ERROR ] Uncaught error: read(descriptor:pointer:size:): Connection reset by peer (errno: 104) (PostgresNIO/Connection/PostgresConnection+Connect.swift:47)
[ TRACE ] Encoded: PostgresMessage (X) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Releasing connection [database-id: psql, request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:262)
[ ERROR ] NIOPostgres error: connection closed [request-id: 0AFC64B0-DB47-48B8-B2BB-A2376FFE5437] (App/Middleware/ErrorMiddleware.swift:46)
[ TRACE ] Unhandled user event: outputClosed (Vapor/HTTP/Server/HTTPServerRequestDecoder.swift:212)
On production server I use Ubuntu 18 and docker. Postgres inside a docker container as well.
Thanks
Hello @MadGeorge, @tanner0101 and others, have you found a solution for this problem? I'm experiencing the same on a production Linux server. The app and Postgres DB are dockerised in Swarm mode. Idle connections (~5 min is enough) are being reset by peer. I have had no luck capturing any errors within `flatMapError`. Trace example:
[ DEBUG ] <<SOME SELECT>> (PostgresNIO/PostgresDatabase+Query.swift:124)
[ TRACE ] Encoded: PostgresMessage (P) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (D) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (B) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (E) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Encoded: PostgresMessage (S) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ ERROR ] Uncaught error: read(descriptor:pointer:size:): Connection reset by peer (errno: 104) (PostgresNIO/Connection/PostgresConnection+Connect.swift:47)
[ TRACE ] Encoded: PostgresMessage (X) (PostgresNIO/Message/PostgresMessageEncoder.swift:34)
[ TRACE ] Releasing connection [database-id: dbMaster, request-id: CAE06649-6070-453D-ACE7-C6A282573160] (AsyncKit/ConnectionPool/EventLoopConnectionPool.swift:262)
[ ERROR ] NIOPostgres error: connection closed (App/Middleware/EventLimitMiddleware.swift:23)
@HashedViking Hi. My solution was... switching to IBM Kitura. No reactiveness in the code, but no connection errors either. The side effect was that I lost money rewriting the entire project. We are dealing with open source, by the way, so we all take the risk consciously.
Sorry, it's sad to hear. In my case it's a side-project mobile app that I have been making for more than a year. On the local machine (Mac Pro) everything is OK, but in the cloud it hits me with that nasty bug. The last thing I can think of before switching my entire backend 🤕 is to dig into postgres-kit or postgres-nio and try to find that bug.
On the local machine (mac pro) everything is ok, but on the cloud it hits me with that nasty bug.
Exactly the same, but I was able to hit the same error a couple of times locally after leaving the project running for a long time and then performing a request with an HTTP client. I was never able to reproduce it as a stable case, though.
And I also used Vapor as a mobile backend, JFYI.
cc @gwynne
OK, "6 hours later" I have an ugly solution. I tried switching the Docker image to ubuntu:16.04 (Xenial): no luck. Then I focused solely on Postgres connections. In my case the problem arises when some connections stay idle for 6+ minutes. For some reason Vapor either doesn't prune them, or prunes them only right after the `Connection reset by peer` error is thrown and a 500 error is returned to the client.
What I did: connected with `psql -U username -d dbname`, then ran
SELECT pid, state, state_change FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND datname = 'dbname';
with `\watch 1` to re-run it every second and monitor the connections.
On my 2-core Ubuntu Bionic (18.04) server with dockerised Nginx, Vapor 4, and Postgres 12, the idle connection count is capped at 8:
Tue 01 Dec 2020 07:04:01 PM UTC (every 1s)
pid | state | state_change
-----+-------+-------------------------------
156 | idle | 2020-12-01 18:58:34.724857+00
159 | idle | 2020-12-01 18:59:10.786254+00
162 | idle | 2020-12-01 18:59:13.808146+00
163 | idle | 2020-12-01 18:59:13.824784+00
164 | idle | 2020-12-01 18:59:25.101309+00
165 | idle | 2020-12-01 18:59:25.198705+00
116 | idle | 2020-12-01 19:01:13.157704+00
123 | idle | 2020-12-01 19:01:15.860991+00
(8 rows)
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND datname = 'vapor';
`configure.swift`:
app.eventLoopGroup.next().scheduleRepeatedTask(initialDelay: .minutes(4), delay: .minutes(3)) { _ in
    app.logger.info("Releasing old connections to master")
    _ = app.db(.master).withConnection { conn -> EventLoopFuture<Void> in
        return conn.raw("WITH inactive_connections AS ( SELECT pid FROM pg_stat_activity WHERE pid <> pg_backend_pid( ) AND application_name !~ '(?:psql)|(?:pgAdmin.+)' AND datname = current_database() AND usename = current_user AND state in ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled') AND current_timestamp - state_change > interval '3 minutes' ) SELECT pg_terminate_backend(pid) FROM inactive_connections;").run()
    }
}
The SQL request that terminates all connections belonging to the current user and current DB that have been idle for more than 3 minutes:
WITH inactive_connections AS (
SELECT
pid
FROM
pg_stat_activity
WHERE
-- Exclude the thread owned connection (ie no auto-kill)
pid <> pg_backend_pid( )
AND
-- Exclude known applications connections
application_name !~ '(?:psql)|(?:pgAdmin.+)'
AND
-- Include connections to the same database the thread is connected to
datname = current_database()
AND
-- Include connections using the same thread username connection
usename = current_user
AND
-- Include inactive connections only
state in ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled')
AND
-- Include old connections (found with the state_change field)
current_timestamp - state_change > interval '3 minutes'
)
SELECT
pg_terminate_backend(pid)
FROM
inactive_connections
TL;DR
Add this to `configure.swift`:
app.eventLoopGroup.next().scheduleRepeatedTask(initialDelay: .minutes(4), delay: .minutes(3)) { _ in
    app.logger.info("Releasing old connections to master")
    _ = app.db(.master).withConnection { conn -> EventLoopFuture<Void> in
        return conn.raw("WITH inactive_connections AS ( SELECT pid FROM pg_stat_activity WHERE pid <> pg_backend_pid( ) AND application_name !~ '(?:psql)|(?:pgAdmin.+)' AND datname = current_database() AND usename = current_user AND state in ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled') AND current_timestamp - state_change > interval '3 minutes' ) SELECT pg_terminate_backend(pid) FROM inactive_connections;").run()
    }
}
@mxcl @tikhop @tanner0101 @MadGeorge @schwmi After some digging I'm sure it's a Docker Swarm issue related to Linux IPVS load balancer which removes idle connections after 15 minutes. Here are some related links:
Haven't tried it yet, but my solution is similar to one of the suggestions by Mirantis: reconfigure or modify affected TCP client applications to close idle TCP sessions prior to the 15-minute timeout.
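From the application side, one way in that direction would be enabling TCP keepalive on the client socket. A sketch only, assuming SwiftNIO's typed socket options; whether this actually defeats the 15-minute IPVS timeout depends on the kernel's keepalive interval (`net.ipv4.tcp_keepalive_time`, which defaults to 2 hours), so the sysctls would still need tuning:

```swift
import NIO

// Sketch: enable SO_KEEPALIVE on outbound connections so the kernel
// sends keepalive probes on idle sockets. The probe timing itself is
// governed by the net.ipv4.tcp_keepalive_* sysctls, not by this option.
let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let bootstrap = ClientBootstrap(group: group)
    .channelOption(ChannelOptions.socketOption(.so_keepalive), value: 1)
```

This only keeps the TCP session alive; it would not help if the database itself drops idle sessions.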
EDIT: See the below comment from @rnantes for a much better solution specific to the source of the problem.
~Interesting explanation from Mirantis! That does sound like the cause of this issue.~
~At the moment there is no support in Fluent or its drivers for timing out idle database connections; for MySQL and Postgres - and Mongo as well, I believe - this can be configured on the database's side and is better done there. (SQLite connections do not represent network channels and thus do not time out in any event.)~
~Configuring MySQL's connection idle timeout globally:~
# Add to your server's `my.cnf`:
wait_timeout = 840 # 14 minutes
~Configuring MySQL's connection idle timeout on a per-connection basis:~
-- Run this query each time you request a `Database` from Fluent
SET @@session.wait_timeout = 840
~Configuring PostgreSQL's connection idle timeout and keepalive interval globally (the idle timeout will only apply to sessions currently in transactions):~
# Add this to your `postgresql.conf`:
idle_in_transaction_session_timeout = 14m
tcp_keepalives_interval = 14m
~Configuring PostgreSQL's per-connection idle timeout:~
# Add these lines to a `~/.psqlrc`:
idle_in_transaction_session_timeout = 14m
tcp_keepalives_interval = 14m
Hi All,
I was running into this using Docker Swarm and I found discussion on the topic here: https://github.com/moby/moby/issues/31208
Apparently the problem occurs because Docker Swarm defaults to closing inactive connections after 15 minutes.
I found a solution to the problem from a linked PR to another project here: https://github.com/ITISFoundation/osparc-simcore/pull/1284
The solution that worked for me was to add a sysctls section to the postgres service section of docker-compose.yml file.
db:
  image: postgres:12.4
  sysctls:
    # NOTES: these values are needed here because docker swarm kills long running idle
    # connections by default after 15 minutes see https://github.com/moby/moby/issues/31208
    # info about these values are here https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
    - net.ipv4.tcp_keepalive_intvl=600
    - net.ipv4.tcp_keepalive_probes=9
    - net.ipv4.tcp_keepalive_time=600
Perfect, I confirm that it works. It should be added to the Docker Swarm deploy section of the Vapor docs as a warning or a tip.
Does anyone on this thread still have this issue or can we now close it?
I'm good, and I can confirm that it was a Docker-related issue: the same machine without Docker works like a charm.
Ok I'm going to close this. If anyone hits this or is still experiencing this we can reopen
Just got this in prod. It has been a while since I last saw it, so hopefully that means it is mostly fixed.
Vapor 4.41.4 PostgresFluentDriver 2.1.2 PostgresNIO 1.5.1 PostgresKit 2.3.0
Not using Docker, running raw on Ubuntu 20.04 with Swift 5.3 app compiled in release configuration.
I acknowledge that sometimes this is probably unavoidable, though I dislike thinking it can happen at all and thus would prefer as much mitigation as possible. If this happens enough and I have time I will certainly be debugging it.
Just to add another data point to this issue: we encountered this exact problem and error message when switching from docker-compose to Docker Swarm, and setting the `net.ipv4` parameters fixed it.
Sorry if wrong repo.
I recently engaged in some conversation about this issue on Discord. We get a few customers a day that experience this issue (where Postgres has dropped the connection, but Fluent doesn’t know and then the query fails).
I would like to fix it, otherwise I can just see one of our investors using our product and freaking out about the “Something went wrong” error it generates in our React front end.
I believe a solution would be: checking if the connection is still open before using it, and if so I can begin exploring that and submit a PR. LMK if I'm in the wrong layer or heading in the wrong direction. Thanks.