vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.44k stars 2.08k forks source link

RFC: allow `REPLICA` using hot-backup engine to serve traffic + avoid promotion #16558

Open timvaillancourt opened 1 month ago

timvaillancourt commented 1 month ago

Feature Description

As a user of the xtrabackup backup-engine (a non-blocking/"online" backup method), I would like to have the ability to continue to serve query traffic while ensuring the tablet performing the backup is avoided for PRIMARY promotion while the backup occurs

On the topic of avoiding PRIMARY promotion, the xtrabackup engine hardcodes false in the ShouldDrainForBackup() function meaning it can remain a REPLICA at backup-time (instead of "draining" to BACKUP, as I understand it). Continuing to be a REPLICA makes sense because xtrabackup is non-blocking, however under the semi_sync and cross_cell Durability Policies, being a REPLICA causes a tablet to have a neutral Promotion Rule, meaning the backing-up tablet could become a candidate for PRIMARY, which isn't ideal. xtrabackup is cheap but not "free", which is why it's safer to run on a non-PRIMARY

An option to address this is to make the xtrabackup engine's ShouldDrainForBackup() function return true, causing the tablet to become BACKUP during the backup operation, but this will cause a loss of query-serving capacity that could also be avoided, as xtrabackup-ing nodes don't really need to be taken out of service, at least in our experience in production. Adding additional capacity just to allow a backup to occur incurs delays and expenses in our deployment.

This issue is a request for comment for a solution that will allow users of hot/online/non-blocking style backup engines to both:

  1. Prefer tablets performing backup do not become PRIMARY
  2. Continue to serve traffic, if the tablet was serving at backup start-time

One solution is for the xtrabackup backup-engine (and other online-style engines) to trigger a transition to BACKUP (by returning true in ShouldDrainForBackup()) AND also continue to serve replica reads somehow, ie: serving: true during backup-time + I'll assume vtgate ignores the BACKUP type too and may need some tweaks. It may be wise to carry a serving: true state throughout the backup only if the tablet was serving: true at the start. This kind of behaviour change is somewhat breaking so it may need to go behind a flags (dare I say it):

  1. --xtrabackup-drain (to trigger ShouldDrainForBackup() == true unless we want this to be breaking)
  2. --xtrabackup-drain-keep-serving/--xtrabackup-drain-preserve-serving or similar, to indicate the tablet should keep serving traffic

Perhaps this could be passed in the backup request too. cc @rvrangel

Your ideas are appreciated :bow:

Use Case(s)

A user of a "hot"/online/non-blocking Vitess backup engine that:

  1. Would prefer a tablet running a backup to not become PRIMARY
  2. Would like to continue to serve traffic from that tablet (vs provision additional capacity)
deepthi commented 1 month ago

The feature request of avoiding promotion seems reasonable.

The implementation as outlined will be quite complex and have to touch a lot of places, and it will violate some core assumptions, so I don't think it is workable. vtgate strictly routes queries by tablet type. Once you change the tablet type, it is not allowed to route queries that are intended for REPLICA type to BACKUP type. Allowing this is not a good idea, we don't know what side effects it might have. One of the complications at the code level is that query target always includes a tablet type and that is plumbed through various call paths.

Having said that, if you still want to try to PR this, we can see exactly how ugly and complicated it gets with no guarantee that the PR will be accepted.

Have you explored it in the other direction? Temporarily changing the promotion rule? Or is that equally hard-coded and complicated?

I'm also somewhat unclear on exactly where you are seeing serving: true, and what you mean by carrying that throughout the backup. If you can point me to the struct / field / code block, I'll be able to understand that better.

timvaillancourt commented 1 month ago

@deepthi thanks for this useful feedback! I don't think we'd want to create a mess supporting this and based on your gauging of effort I agree that exploring a different direction first would be best πŸ‘

Temporarily changing the promotion rule? Or is that equally hard-coded and complicated?

In a default semi_sync/cross_cell setup today there is no way to change promotion rules, other than changing tablet types that infer a different promotion rule

In https://github.com/vitessio/vitess/issues/16377 (implementation PR https://github.com/vitessio/vitess/pull/16344) I proposed a way to do this with a flag per-tablet, but this approach has the drawback of requiring a tablet restart, which makes a temporary change difficult still. If there was a way to dynamically change the promotion rule of a tablet, that would be awesome, but Durability Polices are passed a single/pair-of topodatapb.Tablet that (to my knowledge) is only written by vttablet. A new vttablet RPC wrapped by a vtctld RPC could allow this to be changed, I suppose. And the backup engine could optionally call this perhaps πŸ€”

EDIT: what might work is adding a topodatapb.PromotionRule type + PromotionRule field to topodata.Tablet as I did in https://github.com/vitessio/vitess/pull/16344, and support backup engine changing that at backup time. Also like #16344, PRS/ERS can respect this as a per-tablet override. It would need to be very sure it set it back to the original value after, however

I'll continue to think and appreciate any ideas people have πŸ™‡

timvaillancourt commented 1 month ago

@deepthi if we focus on promotion rule only, is it safe to say a REPLICA running any backup-engine could benefit from a prefer_not rule vs the default neutral that semi_sync/cross_cell returns now?

If we could automate the setting of prefer_not in the backup engine, this would address the use case πŸ‘. If we know all engines would benefit from prefer_not (when also REPLICA), that logic can be a bit simpler

deepthi commented 1 month ago

We do keep track in tablet manager whether we are currently in the process of taking a backup. @GuptaManan100 is going to look at the code and propose how that can be used to meet this feature request.

timvaillancourt commented 1 month ago

We do keep track in tablet manager whether we are currently in the process of taking a backup.

That's great. The methods of the Durabler interface only get topodata.Tablet today so they can't access that, but maybe this could be checked in reparentutil.PromotionRule or by extending the Durabler interface πŸ€”

@GuptaManan100 is going to look at the code and propose how that can be used to meet this feature request.

@deepthi / @GuptaManan100 thanks!

GuptaManan100 commented 1 month ago

For PRS, we use ReplicationStatus RPC to gather information about all the replicas like their position and lag. For ERS, we use StopReplicationAndGetStatus. In both of those RPCs, we can send a new field called runningBackup which is populated using tm._isBackupRunning. Then we can use this field to remove these tablets from being considered valid for promotion. This change should also be backward compatible and can be done in one single PR, because for old vtctld, the new field in toutput would be ignored, and for old vttablets, we would always default to assuming that backup isn't running since the field would be false.

deepthi commented 1 month ago

if we focus on promotion rule only, is it safe to say a REPLICA running any backup-engine could benefit from a prefer_not rule vs the default neutral that semi_sync/cross_cell returns now?

Technically, at least for now, this doesn't really matter because the builtin backup engine changes the tablet type to BACKUP and those tablets are not eligible for promotion anyway. What Manan is proposing will have this effect for any future online backup methods too.

timvaillancourt commented 1 month ago

@GuptaManan100 great idea, thanks! I'm happy to implement this in the near future

Then we can use this field to remove these tablets from being considered valid for promotion

Do you have a preference where we would add the ignore logic? I think both VTCtld and VTOrc should respect this consistently so perhaps reparentutil.PromotionRule() is a good place (although it probably needs more state passed-in)?

GuptaManan100 commented 1 month ago

I don't think we should put it into PromotionRule because that only uses the tablet record, and I don't think we should store this field there. If we were going to use the promotion rule, it would just be better to change the tablet type to backup. I was thinking more in filterValidCandidates for ERS. This field would remove the tablet from being a valid candidate for promotion. Similarily we remove the candidate from the valid list in PRS in ElectNewPrimary

timvaillancourt commented 1 month ago

I was thinking more in filterValidCandidates for ERS. This field would remove the tablet from being a valid candidate for promotion. Similarily we remove the candidate from the valid list in PRS in ElectNewPrimary

@GuptaManan100 makes sense, thanks. I'll make a draft PR and see what you both think πŸ‘