Open knizhnik opened 11 months ago
Just want to note that the original problem is not solved. @ars fixed the behaviour that caused constant reactivation of a suspended publisher by prohibiting its suspension. But the original concern (compute is not scaled down to zero if a logical replication subscription exists) was not addressed.
As I mentioned in Slack, there are two possible solutions to the problem:
Both approaches would have to be implemented at the proxy.
There are two issues that sound similar.
Which should we keep?
This one
@neondatabase/proxy WDYT?
It's the first time I've heard of this issue. I will need to research the problem and the proposals.
I think doing 1 is fine. I suspect it's time to start sniffing the packets as we send them. As I understand it, all we need to detect is a simple query with the text START_REPLICATION SLOT _ LOGICAL; at that point, proxy hijacks the connection and does the magic.
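For illustration, here is a minimal sketch (not the actual proxy code) of what that detection could look like, assuming we already have the raw bytes of one frontend message after the startup phase. The layout used here (tag byte `Q`, 4-byte big-endian length, NUL-terminated query text) is the standard simple Query message of the PostgreSQL wire protocol:

```rust
/// Returns true if `msg` is a frontend simple Query message whose text looks
/// like a logical-replication start command (START_REPLICATION ... LOGICAL).
fn is_logical_replication_start(msg: &[u8]) -> bool {
    // Need at least the 'Q' tag byte plus the 4-byte length field.
    if msg.len() < 5 || msg[0] != b'Q' {
        return false;
    }
    // The query text is everything after the 5-byte header up to the NUL terminator.
    let body = &msg[5..];
    let end = body.iter().position(|&b| b == 0).unwrap_or(body.len());
    let query = String::from_utf8_lossy(&body[..end]).to_ascii_uppercase();
    query.contains("START_REPLICATION") && query.contains("LOGICAL")
}

fn main() {
    // Build a fake Query message: 'Q' + length + query text + NUL.
    let query = b"START_REPLICATION SLOT sub1 LOGICAL 0/0\0";
    let mut msg = vec![b'Q'];
    msg.extend_from_slice(&((query.len() as u32 + 4).to_be_bytes()));
    msg.extend_from_slice(query);
    assert!(is_logical_replication_start(&msg));
}
```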
While there is a publisher active, proxy initiates its own logical replication connection to the publisher and passes all bytes on to the subscriber. When the publisher detaches, proxy holds the connection to the subscriber open and asks CPlane for a notification when the publisher is back online. At that point, proxy recreates the logrep connection and continues passing data to the subscriber.
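As a rough, non-authoritative sketch of that relay idea (assuming tokio; the plain TCP dial below stands in for wake_compute / the CPlane "publisher is back" notification, and a real implementation would also have to replay the replication handshake from the last confirmed LSN after each reconnect):

```rust
use tokio::io::copy_bidirectional;
use tokio::net::TcpStream;

/// Keep the subscriber attached while the publisher comes and goes.
async fn relay(mut subscriber: TcpStream, publisher_addr: &str) -> std::io::Result<()> {
    loop {
        // Placeholder for wake_compute / waiting on a cplane notification that
        // the publisher endpoint is online again.
        let mut publisher = TcpStream::connect(publisher_addr).await?;

        // Forward bytes in both directions until one side goes away.
        match copy_bidirectional(&mut subscriber, &mut publisher).await {
            // Treat an error as "publisher detached": keep the subscriber
            // connection open and loop to re-establish the publisher side.
            Err(_) => continue,
            // Clean EOF from either side: we are done with this subscriber.
            Ok(_) => return Ok(()),
        }
    }
}
```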
Regarding port exhaustion, this is already a concern for all proxy connections 😅 At some point we need to implement idle connection GC, proxy scaling and NLB availability checks so we can actually do something about that.
- Prohibit replication connections if subscriber is suspended
I don't think this is possible without some trickery or a latency cost: proxy does not yet know exactly what connection parameters a database will respond with, so proxy must wake_compute beforehand. We would need some way for cplane to know the params ahead of time, and for cplane to send them to proxy without adding additional latency.
I think maybe it's better to have a dedicated endpoint/proxy instance for replication so we can ignore startup latency for those connections. This would require disallowing logrep on regular proxy (which is a breaking change)
> I think maybe it's better to have a dedicated endpoint/proxy instance for replication so we can ignore startup latency for those connections. This would require disallowing logrep on regular proxy (which is a breaking change)
Hmmm... I do not understand your proposal. The problem is that the endpoint is not suspended, and you are suggesting launching one more endpoint. Introducing yet another specialised proxy just for logical replication is possible... but IMHO it is awful.
I do not see much trouble in complicating the proxy code. Actually, IMHO it is not good that we now have two proxies: our proxy and pgbouncer. I think that eventually we should implement connection pooling inside the proxy.
Another "must have" features is permanent connections - suspending of compute should not break connection with client: it should be transparent to client. And it should be also done at proxy level.
Please notice that all these things need to be handled only at connect time - there is no overhead during normal operation (forwarding messages from client to compute and back).
> I do not see much trouble in complicating the proxy code. Actually, IMHO it is not good that we now have two proxies: our proxy and pgbouncer. I think that eventually we should implement connection pooling inside the proxy.
I'm with you on that one. My suggestion for a possible second proxy would not be in front of the current proxy, but separate entirely. So no double proxying.
> Please notice that all these things need to be handled only at connect time
This is true. I guess connection latencies shouldn't matter much except for HTTP/WS workflows. For regular TCP flows, I think we would be OK with adding a little overhead on startup.
There is no need to look for some particular query, i.e. START_REPLICATION SLOT _ LOGICAL. There should be a replication parameter in the connection string.
I did see "replication": "database"
in the startup parameters, but I can't find it documented as being required. Do you have a doc that confirms it must be given by a replication subscriber?
Ah, found it: https://www.postgresql.org/docs/16/protocol-replication.html#PROTOCOL-REPLICATION
> Passing database as the value for the replication parameter instructs the backend to go into logical replication walsender mode, connecting to the database specified in the dbname parameter. In logical replication walsender mode, the replication commands shown below as well as normal SQL commands can be issued.
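Based on that, a hedged sketch of how the proxy could classify a connection from its startup parameters alone (illustrative only; the parameter map here stands for whatever the proxy parsed out of the client's StartupMessage):

```rust
use std::collections::HashMap;

/// True if the startup parameters request logical-replication (walsender) mode.
/// Per the protocol docs quoted above, `replication=database` selects it.
fn is_logical_replication(params: &HashMap<String, String>) -> bool {
    params.get("replication").map(String::as_str) == Some("database")
}

fn main() {
    let params: HashMap<String, String> = [
        ("user", "subscriber"),
        ("database", "neondb"),
        ("replication", "database"),
    ]
    .into_iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect();

    assert!(is_logical_replication(&params));
}
```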
One idea on how to implement this:
When the compute shuts down, we keep the subscriber client -> proxy connection open. When the compute wakes up again (because of some other connection), we kill the client -> proxy connection. The subscriber should then reconnect.
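A minimal sketch of that "hold the connection, then drop it" behaviour, assuming tokio and a hypothetical `compute_awake` watch channel that is flipped to `true` when the compute comes back up (none of these names exist in the proxy today):

```rust
use tokio::net::TcpStream;
use tokio::sync::watch;

/// Park an idle subscriber connection while the compute is suspended, then
/// close it once the compute is awake so the subscriber reconnects.
async fn park_subscriber(subscriber: TcpStream, mut compute_awake: watch::Receiver<bool>) {
    // Keep the client -> proxy socket open but idle while the compute is down.
    while !*compute_awake.borrow() {
        if compute_awake.changed().await.is_err() {
            break; // notifier dropped; give up and close the connection
        }
    }
    // Dropping the stream closes it; the subscriber will retry and get routed
    // to the freshly started compute.
    drop(subscriber);
}
```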
> One idea on how to implement this:
> When the compute shuts down, we keep the subscriber client -> proxy connection open. When the compute wakes up again (because of some other connection), we kill the client -> proxy connection. The subscriber should then reconnect.
It can really be done in this way. And it seems to be the simplest way.
One alternative is just to prohibit "replication" connections from waking up the compute. A replication connection can easily be recognised by the database URL. The potential problem with this approach is that the subscriber may immediately try to reconnect. To prevent overloading the network and server with such unsuccessful connection attempts, we could send the error with some delay. But if the subscriber has some limit on the maximum number of reconnect attempts after which it gives up, then users will not like such behaviour.
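To illustrate the delayed-error idea (a sketch under assumptions, not a proposal for the exact error code or wording), the proxy could hold the connection for a while and then send a standard ErrorResponse message before closing:

```rust
use std::time::Duration;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

/// Wait, then reject the connection with a Postgres ErrorResponse ('E' message:
/// 4-byte length, then (field-type, C-string) pairs, then a terminating zero byte).
async fn reject_with_delay(mut client: TcpStream, delay: Duration) -> std::io::Result<()> {
    // Throttle aggressive reconnect loops from the subscriber.
    tokio::time::sleep(delay).await;

    // Severity, SQLSTATE and message text are illustrative choices.
    let fields: &[(u8, &str)] = &[
        (b'S', "ERROR"),
        (b'C', "57P03"), // cannot_connect_now
        (b'M', "publisher endpoint is suspended"),
    ];
    let mut body = Vec::new();
    for (tag, text) in fields {
        body.push(*tag);
        body.extend_from_slice(text.as_bytes());
        body.push(0);
    }
    body.push(0); // end of fields

    let mut msg = vec![b'E'];
    msg.extend_from_slice(&((body.len() as u32 + 4).to_be_bytes()));
    msg.extend_from_slice(&body);
    client.write_all(&msg).await
}
```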
The ideal solution IMHO is to provide transparent connections: if the compute is restarted, then the proxy should just re-establish the connection with the compute and continue forwarding traffic between client and compute as if nothing happened.
Discussion result from the compute meeting: not being able to scale down to zero should not prevent us from releasing logical replication.
Steps to reproduce
We are now suspending computes with active connections if there were no requests for some time. So if we have an idle master with a logical replication subscription, it will indeed be suspended after some time. But then (it depends on the particular subscriber), this subscriber will most likely detect the connection failure and try to re-establish it. At least that is what happens with a standard postgres-to-postgres logical subscription. As a result the master will be immediately reactivated. So we never scale to zero and at the same time do the pointless job of restarting the compute.

The only possible solution I see at this moment is to keep the connection with the replica open and establish the connection with the compute on demand. It can be useful not only for logical replication: it would allow permanent connections with Neon, even when the compute is suspended. Certainly there is a large number of problems which have to be solved:

- How to prevent overflow of the available port space
- How to store the authentication information needed to re-establish the connection with the compute
- In case of logical replication, the connection has to be reactivated from the other end (from compute to proxy)
Expected result
Compute is suspended when there is no activity for some time.
Actual result
Compute is never scaled to zero. Hypothesis: it is reactivated by the logical replication subscriber when it detects a connection failure with the suspended compute.