@belvedere-trading Thanks for this excellent report. I know we had a significant multi-master fix that came through in either 2014.7.1 or 2014.7.2, dealing with some race conditions. I can't say for sure whether it will fix this issue, but it might. It might be worthwhile to investigate this on 2014.7.2 to see if it clears up. Otherwise, this should be easy to reproduce using the steps you've given.
I'll try to recreate it with 2014.7.2 sometime today. If there are other questions let me know (or I'm 'Deevolution' in IRC).
Well... 2014.7.2 shows the same issue even when both masters are up. If needed I can post logs, but they look VERY similar. The only difference is that if I run the same ping command on the other (up) master while the first one is hanging, they both immediately return results.
@belvedere-trading It looks like I misspoke before. My apologies! I thought this fix had been backported to the 2014.7 branch, but it looks like it was not. If you're willing to try again, see if this is fixed at the HEAD of the 2015.2 branch. The fix that I am thinking of addresses a race condition we were seeing with the event handling in multi-master, which was backported to 2015.2 in #20964.
I'll let you know the results once I can do that. Thanks.
This has been fixed in 2015.8.3, and possibly before then as well. There were many fixes that went into the standard multi-master setup for the 2015.8.3 release, and this is definitely resolved after retesting just now.
Since this is fixed in 2015.8.3, I'll go ahead and close this. If you bump into this issue again, please leave a comment and we'll be glad to take another look.
I've been testing multi-master with a simple Vagrant setup. I'm seeing commands take extra time to execute on minions when one of the masters is down (logs detailing this are below). With both masters up, it takes less than 3 seconds for either master to run test.ping against all minions. With one master down, it takes 56.5 seconds (I've seen this go as high as 75 seconds). Details below:
Two minions:
Two masters:
Minion configs (note: the id: field is set correctly to each minion's name):
Master configs:
Good Case; both masters up
saltmaster1 logs (good case):
saltmaster2 logs (good case)
saltminion1 logs (good case)
saltminion2 logs (good case)
Bad Case; one master down
saltmaster1 logs (bad case)
saltmaster2 no logs (bad case)
saltminion1 logs (bad case)
saltminion2 logs (bad case)
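For anyone retesting this, the timing comparison from the report (test.ping with both masters up versus one master down) can be measured with a small script. This is only a rough sketch, not part of the original report: the helper name `time_command` and the 120-second timeout are made up for illustration, and it assumes the `salt` CLI is on the PATH of the master where it is run.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv and return (elapsed_seconds, return_code).

    Hypothetical helper for illustration; the timeout value is arbitrary.
    """
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode


if __name__ == "__main__":
    # Assumes this runs on a master with the salt CLI installed.
    elapsed, code = time_command(["salt", "*", "test.ping"])
    print(f"test.ping finished in {elapsed:.1f}s (exit code {code})")
```

Running it once with both masters up and once with one master stopped should give two numbers to compare against the figures in the report.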