Open timvaillancourt opened 1 month ago
The feature request of avoiding promotion seems reasonable.
The implementation as outlined will be quite complex and have to touch a lot of places, and it will violate some core assumptions, so I don't think it is workable. vtgate strictly routes queries by tablet type. Once you change the tablet type, it is not allowed to route queries that are intended for REPLICA type to BACKUP type. Allowing this is not a good idea, we don't know what side effects it might have. One of the complications at the code level is that query target always includes a tablet type and that is plumbed through various call paths.
Having said that, if you still want to try to PR this, we can see exactly how ugly and complicated it gets with no guarantee that the PR will be accepted.
Have you explored it in the other direction? Temporarily changing the promotion rule? Or is that equally hard-coded and complicated?
I'm also somewhat unclear on exactly where you are seeing serving: true
, and what you mean by carrying that throughout the backup. If you can point me to the struct / field / code block, I'll be able to understand that better.
@deepthi thanks for this useful feedback! I don't think we'd want to create a mess supporting this and based on your gauging of effort I agree that exploring a different direction first would be best π
Temporarily changing the promotion rule? Or is that equally hard-coded and complicated?
In a default semi_sync
/cross_cell
setup today there is no way to change promotion rules, other than changing tablet types that infer a different promotion rule
In https://github.com/vitessio/vitess/issues/16377 (implementation PR https://github.com/vitessio/vitess/pull/16344) I proposed a way to do this with a flag per-tablet, but this approach has the drawback of requiring a tablet restart, which makes a temporary change difficult still. If there was a way to dynamically change the promotion rule of a tablet, that would be awesome, but Durability Polices are passed a single/pair-of topodatapb.Tablet
that (to my knowledge) is only written by vttablet
. A new vttablet
RPC wrapped by a vtctld
RPC could allow this to be changed, I suppose. And the backup engine could optionally call this perhaps π€
EDIT: what might work is adding a topodatapb.PromotionRule
type + PromotionRule
field to topodata.Tablet
as I did in https://github.com/vitessio/vitess/pull/16344, and support backup engine changing that at backup time. Also like #16344, PRS/ERS can respect this as a per-tablet override. It would need to be very sure it set it back to the original value after, however
I'll continue to think and appreciate any ideas people have π
@deepthi if we focus on promotion rule only, is it safe to say a REPLICA
running any backup-engine could benefit from a prefer_not
rule vs the default neutral
that semi_sync
/cross_cell
returns now?
If we could automate the setting of prefer_not
in the backup engine, this would address the use case π. If we know all engines would benefit from prefer_not
(when also REPLICA
), that logic can be a bit simpler
We do keep track in tablet manager whether we are currently in the process of taking a backup. @GuptaManan100 is going to look at the code and propose how that can be used to meet this feature request.
We do keep track in tablet manager whether we are currently in the process of taking a backup.
That's great. The methods of the Durabler
interface only get topodata.Tablet
today so they can't access that, but maybe this could be checked in reparentutil.PromotionRule
or by extending the Durabler
interface π€
@GuptaManan100 is going to look at the code and propose how that can be used to meet this feature request.
@deepthi / @GuptaManan100 thanks!
For PRS, we use ReplicationStatus
RPC to gather information about all the replicas like their position and lag. For ERS, we use StopReplicationAndGetStatus
. In both of those RPCs, we can send a new field called runningBackup
which is populated using tm._isBackupRunning
. Then we can use this field to remove these tablets from being considered valid for promotion. This change should also be backward compatible and can be done in one single PR, because for old vtctld, the new field in toutput would be ignored, and for old vttablets, we would always default to assuming that backup isn't running since the field would be false.
if we focus on promotion rule only, is it safe to say a REPLICA running any backup-engine could benefit from a prefer_not rule vs the default neutral that semi_sync/cross_cell returns now?
Technically, at least for now, this doesn't really matter because the builtin backup engine changes the tablet type to BACKUP and those tablets are not eligible for promotion anyway. What Manan is proposing will have this effect for any future online backup methods too.
@GuptaManan100 great idea, thanks! I'm happy to implement this in the near future
Then we can use this field to remove these tablets from being considered valid for promotion
Do you have a preference where we would add the ignore logic? I think both VTCtld and VTOrc should respect this consistently so perhaps reparentutil.PromotionRule()
is a good place (although it probably needs more state passed-in)?
I don't think we should put it into PromotionRule
because that only uses the tablet record, and I don't think we should store this field there. If we were going to use the promotion rule, it would just be better to change the tablet type to backup.
I was thinking more in filterValidCandidates
for ERS. This field would remove the tablet from being a valid candidate for promotion. Similarily we remove the candidate from the valid list in PRS in ElectNewPrimary
I was thinking more in filterValidCandidates for ERS. This field would remove the tablet from being a valid candidate for promotion. Similarily we remove the candidate from the valid list in PRS in ElectNewPrimary
@GuptaManan100 makes sense, thanks. I'll make a draft PR and see what you both think π
Feature Description
As a user of the
xtrabackup
backup-engine (a non-blocking/"online" backup method), I would like to have the ability to continue to serve query traffic while ensuring the tablet performing the backup is avoided forPRIMARY
promotion while the backup occursOn the topic of avoiding
PRIMARY
promotion, thextrabackup
engine hardcodesfalse
in theShouldDrainForBackup()
function meaning it can remain aREPLICA
at backup-time (instead of "draining" toBACKUP
, as I understand it). Continuing to be aREPLICA
makes sense becausextrabackup
is non-blocking, however under thesemi_sync
andcross_cell
Durability Policies, being aREPLICA
causes a tablet to have aneutral
Promotion Rule, meaning the backing-up tablet could become a candidate forPRIMARY
, which isn't ideal.xtrabackup
is cheap but not "free", which is why it's safer to run on a non-PRIMARY
An option to address this is to make the
xtrabackup
engine'sShouldDrainForBackup()
function returntrue
, causing the tablet to becomeBACKUP
during the backup operation, but this will cause a loss of query-serving capacity that could also be avoided, asxtrabackup
-ing nodes don't really need to be taken out of service, at least in our experience in production. Adding additional capacity just to allow a backup to occur incurs delays and expenses in our deployment.This issue is a request for comment for a solution that will allow users of hot/online/non-blocking style backup engines to both:
PRIMARY
One solution is for the
xtrabackup
backup-engine (and other online-style engines) to trigger a transition toBACKUP
(by returningtrue
inShouldDrainForBackup()
) AND also continue to serve replica reads somehow, ie:serving: true
during backup-time + I'll assumevtgate
ignores theBACKUP
type too and may need some tweaks. It may be wise to carry aserving: true
state throughout the backup only if the tablet wasserving: true
at the start. This kind of behaviour change is somewhat breaking so it may need to go behind a flags (dare I say it):--xtrabackup-drain
(to triggerShouldDrainForBackup()
==true
unless we want this to be breaking)--xtrabackup-drain-keep-serving
/--xtrabackup-drain-preserve-serving
or similar, to indicate the tablet should keep serving trafficPerhaps this could be passed in the backup request too. cc @rvrangel
Your ideas are appreciated :bow:
Use Case(s)
A user of a "hot"/online/non-blocking Vitess backup engine that:
PRIMARY