Closed hellracer closed 5 years ago
Just to make clear, the GTID errant transaction range is constantly changing; the only way for me to fix this is to completely stop all DB traffic from the application and click the "FIX" button.
Can this be improved? Or, if this is the intended behaviour, there should be a way to inform the user that the fix button only works when there is no traffic in the whole system, just to remove confusion.
Anyway, it's just my $0.02.
> Just to make clear the GTID errant transaction range is constantly changing
That's not expected behavior. It would only happen if the old-master is still taking writes. According to your screenshot it's `read_only`. But is it also `super_read_only`? Perhaps there's a user with `SUPER` privileges still writing? A `pt-heartbeat`, perhaps? An archiving job?
Your current situation is that the old-master is invalid. You should investigate what those errant GTIDs are. You can use `orchestrator-client -c which-gtid-errant -i <old.master>` and `orchestrator-client -c locate-gtid-errant -i <old.master>`.
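For readers unfamiliar with the term: an errant GTID is a transaction that was executed on a replica (here, the demoted old master) but never on the current master. The following is a minimal Python sketch of that set difference, not orchestrator's own implementation. The short UUIDs `aaaa` and `bbbb` and the function names are made up for illustration, and real GTID sets should be compared without expanding every interval (MySQL itself offers `GTID_SUBTRACT()` for this).

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}.

    Illustrative only: intervals are expanded into sets, which is fine for
    small examples but wasteful for real, very large GTID ranges.
    """
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def errant_gtids(replica_executed, master_executed):
    """Return transactions present on the replica but missing on the master."""
    replica = parse_gtid_set(replica_executed)
    master = parse_gtid_set(master_executed)
    errant = {}
    for uuid, numbers in replica.items():
        extra = numbers - master.get(uuid, set())
        if extra:
            errant[uuid] = sorted(extra)
    return errant


# Hypothetical values: 'aaaa' is the old master's UUID, 'bbbb' the new master's.
new_master = "aaaa:1-100,bbbb:1-20"
old_master_now = "aaaa:1-105,bbbb:1-20"    # five local writes after demotion
old_master_later = "aaaa:1-108,bbbb:1-20"  # still being written to

first = errant_gtids(old_master_now, new_master)
second = errant_gtids(old_master_later, new_master)
```

In this made-up scenario `first` covers transactions 101 to 105 and `second` covers 101 to 108: as long as something keeps writing to the old master, any fix computed from an earlier snapshot is already stale, which matches the "constantly changing" range described above.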
@shlomi-noach
Indeed, pt-heartbeat is the culprit. When the previous master was demoted, pt-heartbeat still pointed to the old master and kept writing to the percona.heartbeat table. This wouldn't have happened if the user running pt-heartbeat didn't have the SUPER privilege, as you predicted.
That explains it :)
Again, please close this, as it is not an orchestrator issue but rather a user error :)
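The reason a SUPER user can still write to a read_only server can be summarised as a small truth table. This is a simplified Python sketch of the documented MySQL behaviour, not code from MySQL or orchestrator; the function name is hypothetical, and it ignores replication threads (which are exempt from read_only) and the MySQL 8.0 `CONNECTION_ADMIN` privilege, which behaves like SUPER here.

```python
def client_write_allowed(read_only, super_read_only, has_super):
    """Simplified model of whether a client connection may write.

    Assumptions: replication threads and privileges other than SUPER
    are not modelled.
    """
    if super_read_only:
        # super_read_only blocks writes even from SUPER users.
        return False
    if read_only:
        # read_only alone only blocks users without SUPER.
        return has_super
    return True


# pt-heartbeat connecting as a SUPER user to a demoted master:
heartbeat_with_read_only = client_write_allowed(True, False, True)
heartbeat_with_super_read_only = client_write_allowed(True, True, True)
```

Under this model the first call allows the write and the second does not, so either enabling super_read_only on the demoted master or running pt-heartbeat as a user without SUPER would have prevented the errant writes.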
Hi,
After a successful graceful failover everything works as expected, and the newly demoted master that became a slave can keep up with the new master. However, the errant GTID is hard to fix because it moves fast, especially on a busy system, which makes the "FIX" button in the GUI difficult to use or even useless in this situation.
This issue supersedes #885. I decided not to re-open it because I hadn't clearly understood the issue then; it has become clearer here.