svlsResearch / ha-mikrotik

High availability code for Mikrotik routers
HA has been working properly for 6 months and suddenly stopped working #20

Closed nuriatehas closed 4 years ago

nuriatehas commented 4 years ago

log_HA.txt

Hi,

I don't know how to debug this problem. I'm running HA with RB4011 and firmware version 6.45.6. We ran HA tests at first and it worked correctly. Suddenly, it seems that Mikrotik B began to function as master. The Mikrotik had Internet access, but users did not.

I attach the Mikrotik log to see if we can see what has happened.

Thanks.

nathanfaber commented 4 years ago

Which router is this log from? It appears to be A. Is that right or is this actually B? Can you provide any additional detail as to the events and what happened?

nuriatehas commented 4 years ago

The log is from Mikrotik A, from when it changes to standby.

It seems that A stops working and B becomes master. After restarting B, A works correctly again. I don't know why A stops working, since it is connected to a UPS line. We only notice when users complain that they don't have Internet access.

nathanfaber commented 4 years ago

may/27 12:05:04 script,info ha_startup: ha_report_startup debug version=6.45.6 (stable) firmware=6.45.4 badC=0 goodC=1 delay1C=10 delay2C=0 uptime=00:01:12 isMaster=false haPreferMac= haInitTries=1 haStartupHasRun=00:00:07 haStartupHAVersion=0.7test15 - de3cd22b6c4882782ab0c7bf2903cd

Are you saying A became standby while it was running, without a reboot of some sort? According to this log, this router was up for 1m12s.
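For anyone reading a similar log, here is a rough Python sketch of how the `key=value` fields in the `ha_report_startup` line above can be pulled apart to check `uptime` and `isMaster`. It is not part of ha-mikrotik; the helper names `parse_fields` and `hms_to_seconds` are made up for illustration, and it assumes the uptime is in plain `HH:MM:SS` form, as it is in this line.

```python
import re

# Log line copied verbatim from the comment above.
LINE = (
    "may/27 12:05:04 script,info ha_startup: ha_report_startup debug "
    "version=6.45.6 (stable) firmware=6.45.4 badC=0 goodC=1 delay1C=10 "
    "delay2C=0 uptime=00:01:12 isMaster=false haPreferMac= haInitTries=1 "
    "haStartupHasRun=00:00:07 haStartupHAVersion=0.7test15 - "
    "de3cd22b6c4882782ab0c7bf2903cd"
)


def parse_fields(line):
    """Hypothetical helper: collect key=value pairs from a log line.

    The value is whatever non-space text follows '=' and may be empty
    (for example haPreferMac= in the line above).
    """
    return dict(re.findall(r"(\w+)=(\S*)", line))


def hms_to_seconds(text):
    """Hypothetical helper: convert 'HH:MM:SS' to seconds.

    Assumption: the value has exactly three colon-separated numbers.
    Longer uptimes may be printed in another form, which this does not handle.
    """
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


fields = parse_fields(LINE)
print("uptime:", fields["uptime"], "=", hms_to_seconds(fields["uptime"]), "seconds")
print("isMaster:", fields["isMaster"])
```

On this line the uptime comes out as 72 seconds, which matches the 1m12s reading: the router had only just booted when it wrote the entry.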

I still don’t quite understand the sequence of events so I’m not sure what happened here.

nuriatehas commented 4 years ago

12:05 is when I rebooted B and A became Master.

nathanfaber commented 4 years ago

This log seems to show A booting up and then transitioning to MASTER at 12:06:13. There aren't really any interesting logs here in terms of what happened before.

If I understand what you are saying...it sounds like B is not working correctly when it becomes primary but A functions fine? Have you tested failovers recently? Are you sure there isn't some change that has prevented B from working vs. when you tested it 6 months ago?

nathanfaber commented 4 years ago

Reopen if you are able to reproduce.