vertica / vertica-kubernetes

Operator, container and Helm chart to deploy Vertica in Kubernetes
Apache License 2.0

Add a knob to control session transfer #941

Closed. roypaulin closed this 3 weeks ago

roypaulin commented 3 weeks ago

This adds vertica.com/online-upgrade-ignore-session-transfer, which, when true, causes the new session transfer drain logic to be ignored.
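As a rough illustration of how such a knob could be read, here is a minimal Go sketch. It assumes the key is looked up in a resource's annotation map and that a missing or unparsable value means "false"; the helper name `ignoreSessionTransfer` is hypothetical and is not taken from the operator's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// Key name taken from the PR description.
const ignoreSessionTransferAnnotation = "vertica.com/online-upgrade-ignore-session-transfer"

// ignoreSessionTransfer is a hypothetical helper: it reports whether the
// knob is set to a true value in the given annotation map.
// Assumption: a missing or malformed value is treated as false.
func ignoreSessionTransfer(annotations map[string]string) bool {
	val, ok := annotations[ignoreSessionTransferAnnotation]
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(val)
	if err != nil {
		return false
	}
	return enabled
}

func main() {
	withKnob := map[string]string{ignoreSessionTransferAnnotation: "true"}
	fmt.Println(ignoreSessionTransfer(withKnob))
	fmt.Println(ignoreSessionTransfer(nil))
}
```

Defaulting to false keeps the session transfer drain logic active unless someone explicitly opts out.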

fenic-fawkes commented 3 weeks ago

why not just disable online upgrade entirely? without session transfer, online upgrade is just worse than read-only upgrade, right? instead of at least having read-only access to the database for a while, clients are locked out entirely, or worse, they're accidentally let in and write data that may or may not be replicated. if we do go ahead this way, we also need to modify the service routing config earlier, during "pause", before we kill all connections.

cchen-vertica commented 3 weeks ago

24.3.0-4 and 24.4.0-1 don't have a release schedule yet, so for a long time after we release the new operator, people would not be able to try online upgrade. I assume no customers have used online upgrade yet, since our online-upgrade documentation hasn't been published. I want customers to try online upgrade and give us feedback. I think we need this knob, and I agree that when it is on we need to block all user connections before database replication (our old behavior). The knob shouldn't be set by the user; it should be inferred from the Vertica version. If the version is 24.3.0-4 or later on the 24.3 line, or 24.4.0-1 or later, we turn the knob off; if the version is 24.3.0-2, 24.3.0-3 or 24.4.0-0, we turn the knob on.
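The version rule proposed in this comment could be sketched roughly as follows. This is an illustration only: the `version` type, `parseVersion`, and `ignoreSessionTransferFor` are hypothetical names, the `major.minor.patch-hotfix` string format is assumed from the version numbers quoted above, and versions the comment does not mention are assumed to leave the knob off.

```go
package main

import "fmt"

// version is a hypothetical representation of major.minor.patch-hotfix.
type version struct{ major, minor, patch, hotfix int }

// parseVersion parses strings such as "24.3.0-4" (assumed format).
func parseVersion(s string) (version, error) {
	var v version
	_, err := fmt.Sscanf(s, "%d.%d.%d-%d", &v.major, &v.minor, &v.patch, &v.hotfix)
	return v, err
}

// less reports whether v sorts strictly before o.
func (v version) less(o version) bool {
	a := [4]int{v.major, v.minor, v.patch, v.hotfix}
	b := [4]int{o.major, o.minor, o.patch, o.hotfix}
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// ignoreSessionTransferFor returns true for the versions the comment says
// should turn the knob on: 24.3.0-2, 24.3.0-3 and 24.4.0-0.
// Assumption: any version outside those ranges leaves the knob off.
func ignoreSessionTransferFor(v version) bool {
	switch {
	case v.major == 24 && v.minor == 3:
		return !v.less(version{24, 3, 0, 2}) && v.less(version{24, 3, 0, 4})
	case v.major == 24 && v.minor == 4:
		return v.less(version{24, 4, 0, 1})
	}
	return false
}

func main() {
	for _, s := range []string{"24.3.0-3", "24.3.0-4", "24.4.0-0", "24.4.0-1"} {
		v, err := parseVersion(s)
		if err != nil {
			fmt.Println("cannot parse", s, err)
			continue
		}
		fmt.Printf("%s ignore session transfer: %v\n", s, ignoreSessionTransferFor(v))
	}
}
```

Deriving the value from the version keeps the decision out of the user's hands, which is the point of the proposal.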

fenic-fawkes commented 3 weeks ago

since we're going to have to document the limitations anyway, why not just document that pause timeout doesn't work but still do the rest of session transfer? the server supports session transfer; the issue is just that upgrade can stay paused forever if there are sessions not sending queries. we could document this limitation and say that manual intervention might be needed. that reads better to me than the current solution, which is documenting that using "online" upgrade will in fact kill all sessions connected to the database and prevent new connections until the sandbox is ready.

if we really want to avoid any manual intervention from customers at all costs, we could have the pause timeout just indiscriminately kill all sessions. it's worse than the solution for 24.3-4 and 24.4-1, but better than this, imo. again, though, it's a limitation that would have to be documented. and since this is a beta feature (or should be, idk how we doc that stuff), customers using it should read the docs.

cchen-vertica commented 3 weeks ago

Closed this one since the work is done in #945.