Closed. hellracer closed this issue 5 years ago.
on the 5 node setup you can lose at least two nodes
At most two (with 5 nodes, raft needs a quorum of 3, so it tolerates 2 failures).
now the question is why the cluster is unavailable in the GUI after one of the nodes has been elected as the new leader?
How do you access the GUI? What address do you point to? If you type the address of the leader, and the leader is dead, then it makes sense. Or are you using a proxy? (You should.)
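A proxy can simply route traffic to whichever node currently answers the leader check. A minimal probe sketch, assuming the /api/leader-check endpoint and the :8000 listen port (add basic-auth credentials if you have them enabled):

# only the current raft leader is expected to answer 200 here
for node in node1 node2 node3; do   # your three orchestrator nodes
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://$node:8000/api/leader-check")
  echo "$node -> HTTP $code"
done

A proxy (HAProxy, nginx, ...) would use the same check to keep only the leader in its backend pool.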
I access the IP address of the new raft leader and the cluster is not available there. I haven't deployed orchestrator behind a reverse proxy yet, but I surely will. Anyway, please don't close this yet; I will resume testing this week and will update this if I can really reproduce the issue. Thanks @shlomi
The GUI should be available on the leader node.
Can you paste the output of http://<your-leader-node:3000>/api/status?
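Something along these lines should do; the jq usage is just for readability, and you may need -u user:password if basic HTTP auth is enabled:

# fetch the status and pull out the raft-related fields (adjust the port to your ListenAddress)
curl -s http://<your-leader-node>:3000/api/status | jq '.Details | {RaftLeader, IsRaftLeader, RaftHealthyMembers}'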
As per your instruction, please see the status below. This is the status of the new raft leader after I killed the previous master:
{
  "Code": "OK",
  "Message": "Application node is healthy",
  "Details": {
    "Healthy": true,
    "Hostname": "xxxx",
    "Token": "91a16dc1ff2392ec9a741b9e68bffe1b5888d20bc7aa9e5d12a03e199a0baca1",
    "IsActiveNode": true,
    "ActiveNode": {
      "Hostname": "192.168.151.228:10008",
      "Token": "",
      "AppVersion": "",
      "FirstSeenActive": "",
      "LastSeenActive": "",
      "ExtraInfo": "",
      "Command": "",
      "DBBackend": "",
      "LastReported": "0001-01-01T00:00:00Z"
    },
    "Error": null,
    "AvailableNodes": [{
      "Hostname": "xxxx",
      "Token": "91a16dc1ff2392ec9a741b9e68bffe1b5888d20bc7aa9e5d12a03e199a0baca1",
      "AppVersion": "3.0.14",
      "FirstSeenActive": "2019-05-20T07:45:12Z",
      "LastSeenActive": "2019-05-20T07:49:02Z",
      "ExtraInfo": "",
      "Command": "",
      "DBBackend": "/var/lib/orchestrator/data/orchestrator.sqlite3",
      "LastReported": "0001-01-01T00:00:00Z"
    }],
    "RaftLeader": "192.168.151.228:10008",
    "IsRaftLeader": true,
    "RaftLeaderURI": "http://192.168.151.228:8000",
    "RaftAdvertise": "192.168.151.228",
    "RaftHealthyMembers": ["192.168.131.22", "192.168.151.228"]
  }
}
Why is "DBBackend" empty on ActiveNode, whilst AvailableNodes has a correct value for the DBBackend attribute?
I can attest that all of them use the sqlite3 backend and have the same configuration. Do I need to share the sqlite3 flat DB file across the 3 orc nodes via NFS? I don't think that should be the case, should it?
Sample config:
{
"Debug": true,
"EnableSyslog": false,
"ListenAddress": ":8000",
"RaftEnabled": true,
"RaftDataDir": "/var/lib/orchestrator",
"RaftBind": "192.168.151.228",
"RaftNodes": ["192.168.141.50", "192.168.131.22", "192.168.151.228"] ,
"BackendDB": "sqlite",
"SQLite3DataFile": "/var/lib/orchestrator/data/orchestrator.sqlite3",
"MySQLTopologyUser": "orchestrator",
"MySQLTopologyPassword": "supersikret",
"MySQLTopologyCredentialsConfigFile": "",
"MySQLTopologySSLPrivateKeyFile": "",
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "",
"MySQLTopologySSLSkipVerify": true,
"MySQLTopologyUseMutualTLS": false,
"MySQLOrchestratorCredentialsConfigFile": "",
"MySQLOrchestratorSSLPrivateKeyFile": "",
"MySQLOrchestratorSSLCertFile": "",
"MySQLOrchestratorSSLCAFile": "",
"MySQLOrchestratorSSLSkipVerify": true,
"MySQLOrchestratorUseMutualTLS": false,
"MySQLConnectTimeoutSeconds": 1,
"DefaultInstancePort": 3306,
"DiscoverByShowSlaveHosts": false,
"InstancePollSeconds": 5,
"ReadLongRunningQueries": true,
"UnseenInstanceForgetHours": 240,
"SnapshotTopologiesIntervalHours": 0,
"InstanceBulkOperationsWaitTimeoutSeconds": 10,
"HostnameResolveMethod": "default",
"MySQLHostnameResolveMethod": "@@hostname",
"SkipBinlogServerUnresolveCheck": true,
"ExpiryHostnameResolvesMinutes": 60,
"RejectHostnameResolvePattern": "",
"ReasonableReplicationLagSeconds": 10,
"ProblemIgnoreHostnameFilters": [],
"VerifyReplicationFilters": false,
"ReasonableMaintenanceReplicationLagSeconds": 20,
"CandidateInstanceExpireMinutes": 86400,
"ReplicationCredentialsQuery": "SELECT User_name, User_password from mysql.slave_master_info",
"AuditLogFile": "",
"AuditToSyslog": false,
"RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
"ReadOnly": false,
"AuthenticationMethod": "basic",
"HTTPAuthUser": "admin",
"HTTPAuthPassword": "supersikret",
"AuthUserHeader": "",
"PowerAuthUsers": [
"*"
],
"ClusterNameToAlias": {
"127.0.0.1": "test suite"
},
"SlaveLagQuery": "",
"DetectClusterAliasQuery": "select ifnull(max(cluster_name), '') as cluster_alias from meta.cluster where anchor=1",
"DetectClusterDomainQuery": "select ifnull(max(cluster_domain), '') as cluster_domain from meta.cluster where anchor=1",
"DetectInstanceAliasQuery": "",
"DetectPromotionRuleQuery": "",
"DetectDataCenterQuery": "select datacenter_name from meta.dc where anchor=1",
"DataCenterPattern": "",
"PhysicalEnvironmentPattern": "",
"PromotionIgnoreHostnameFilters": ["DB04"],
"DetectSemiSyncEnforcedQuery": "SELECT @@global.rpl_semi_sync_master_wait_no_slave AND @@global.rpl_semi_sync_master_timeout >= 30000",
"ServeAgentsHttp": false,
"AgentsServerPort": ":3001",
"AgentsUseSSL": false,
"AgentsUseMutualTLS": false,
"AgentSSLSkipVerify": false,
"AgentSSLPrivateKeyFile": "",
"AgentSSLCertFile": "",
"AgentSSLCAFile": "",
"AgentSSLValidOUs": [],
"UseSSL": false,
"UseMutualTLS": false,
"SSLSkipVerify": false,
"SSLPrivateKeyFile": "",
"SSLCertFile": "",
"SSLCAFile": "",
"SSLValidOUs": [],
"URLPrefix": "",
"StatusEndpoint": "/api/status",
"StatusSimpleHealth": true,
"SkipMaxScaleCheck": true,
"StatusOUVerify": false,
"AgentPollMinutes": 60,
"UnseenAgentForgetHours": 6,
"StaleSeedFailMinutes": 60,
"SeedAcceptableBytesDiff": 8192,
"PseudoGTIDPattern": "",
"PseudoGTIDPatternIsFixedSubstring": false,
"PseudoGTIDMonotonicHint": "asc:",
"DetectPseudoGTIDQuery": "",
"BinlogEventsChunkSize": 10000,
"SkipBinlogEventsContaining": [],
"ReduceReplicationAnalysisCount": true,
"FailureDetectionPeriodBlockMinutes": 60,
"RecoveryPollSeconds": 10,
"RecoveryPeriodBlockMinutes": 1,
"RecoveryPeriodBlockSeconds": 60,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": [
"*"
],
"RecoverIntermediateMasterClusterFilters": [
"*"
],
"OnFailureDetectionProcesses": [],
"PreGracefulTakeoverProcesses": [
"/opt/orchestrator/scripts/prefailover"
],
"PreFailoverProcesses": [],
"PostFailoverProcesses": [],
"PostUnsuccessfulFailoverProcesses": [],
"PostMasterFailoverProcesses": [],
"PostIntermediateMasterFailoverProcesses": [],
"PostGracefulTakeoverProcesses": [
"/opt/orchestrator/scripts/postfailover"
],
"CoMasterRecoveryMustPromoteOtherCoMaster": true,
"DetachLostSlavesAfterMasterFailover": true,
"ApplyMySQLPromotionAfterMasterFailover": true,
"PreventCrossDataCenterMasterFailover": false,
"MasterFailoverDetachSlaveMasterHost": false,
"MasterFailoverLostInstancesDowntimeMinutes": 0,
"PostponeSlaveRecoveryOnLagMinutes": 0,
"OSCIgnoreHostnameFilters": [],
"GraphiteAddr": "",
"GraphitePath": "",
"GraphiteConvertHostnameDotsToUnderscores": true,
"ConsulAddress": "",
"ConsulAclToken": ""
}
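The three nodes run this same config; only RaftBind (and RaftAdvertise, if set) should differ per node. A quick way to double-check that, assuming jq is installed, ssh access to the nodes, and the /etc/orchestrator.conf.json path:

# identical checksums mean the configs match apart from the per-node RaftBind
for node in 192.168.141.50 192.168.131.22 192.168.151.228; do
  echo -n "$node: "
  ssh root@$node "jq 'del(.RaftBind)' /etc/orchestrator.conf.json" | md5sum
done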
Do I need to share the sqlite3 flat DB file across the 3 orc nodes via NFS?
No, each node uses its own SQLite DB.
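Writes go through the raft log and each node applies them to its own local SQLite file, so the data is expected to converge on all of them. A rough spot-check, with the SQLite path taken from your config and ssh access assumed:

# count topology rows in each node's local backend; the counts should be roughly the same
for node in 192.168.141.50 192.168.131.22 192.168.151.228; do
  echo -n "$node: "
  ssh root@$node "sqlite3 /var/lib/orchestrator/data/orchestrator.sqlite3 'select count(*) from database_instance;'"
done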
Why is "DBBackend" empty on ActiveNode, whilst AvailableNodes has a correct value for the DBBackend attribute?
Let's take that someplace else; it is "fine" for our purposes.
OK, so the status seems legit.
And you say the new leader knows nothing about the cluster? Is there nothing under the "clusters" menu?
And you say the new leader knows nothing about the cluster? Is there nothing under the "clusters" menu?
Yes that's correct!
That is... strange indeed. Was the cluster available before the failover? Let's skip failovers for now, I think we're mixing two unrelated things here.
Let's start fresh.
That is... strange indeed. Was the cluster available before the failover? Let's skip failovers for now, I think we're mixing two unrelated things here.
Yes, the cluster was available before the failover.
There's a leader, and it finds out about some cluster. So far so good?
Yes, that's correct before the failover. If a re-election occurs and one of the followers is nominated and elected as the new leader, then the cluster is not found in the GUI of the new leader, and the same goes for the followers' GUI.
What happens if you open the web interface on one of the followers?
You will see the cluster on any of the followers before a failover occurs, no problem there; the problem occurs after a failover.
What happens if 60 seconds later (I'm just giving some time to breathe here) you open the web interface on one of the followers? Do you see the cluster?
The cluster is not found no matter how many minutes I wait after the failover occurs, even though a new master has been nominated and elected.
Assuming all went well, turn off the leader service; let another node grab leadership. What do you see on its web interface? Do you see the cluster?
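In shell terms the test looks roughly like this; the service name, the :8000 port and the /api/clusters endpoint are assumptions about a typical deployment:

# on the current leader: stop the service so the remaining nodes elect a new leader
sudo systemctl stop orchestrator
# a minute or so later, on a surviving node: who leads now, and which clusters does it know?
curl -s http://192.168.131.22:8000/api/status | jq '.Details.RaftLeader'
curl -s http://192.168.131.22:8000/api/clusters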
When a follower node has been elected as the new raft master, the cluster is not found.
I'm... dumbfounded. I'm sorry, I don't have an insight right now. Never seen this. Can you see anything in the logs? Please run with --debug and look for errors.
I'm... dumbfounded. I'm sorry, I don't have an insight right now. Never seen this. Can you see anything in the logs? Please run with --debug and look for errors.
That's fine, and thanks for your time; I know this is very strange indeed!
For what it's worth, when I kill all three orc daemons and start everything from scratch, "before any failover", it works fine: the cluster is there and it matches the API status.
In debug mode there's no obvious reason why the cluster is not found after a failover occurs:
2019-05-21 06:35:08 DEBUG raft leader is 192.168.151.228:10008 (this host); state: Leader
2019/05/21 06:35:08 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m28.269493685s
2019-05-21 06:35:08 DEBUG orchestrator/raft: applying command 47468: request-health-report
[martini] Started GET /api/raft-follower-health-report/3c385f8f/192.168.151.228/192.168.151.228 for 192.168.151.228:35050
[martini] Completed 200 OK in 930.597µs
[martini] Started GET /api/raft-follower-health-report/3c385f8f/192.168.131.22/192.168.131.22 for 192.168.131.22:36188
[martini] Completed 200 OK in 896.624µs
2019/05/21 06:35:08 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m28.743079262s
2019/05/21 06:35:09 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m29.176000887s
2019/05/21 06:35:09 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m29.643791753s
2019/05/21 06:35:10 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m30.138442101s
2019/05/21 06:35:10 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m30.583563484s
2019/05/21 06:35:11 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m31.081037222s
2019/05/21 06:35:11 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m31.544444427s
2019/05/21 06:35:12 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m32.001210637s
2019/05/21 06:35:12 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m32.498258384s
2019/05/21 06:35:12 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m32.960400433s
2019-05-21 06:35:13 DEBUG raft leader is 192.168.151.228:10008 (this host); state: Leader
2019/05/21 06:35:13 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m33.422171734s
2019/05/21 06:35:13 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m33.83278247s
2019/05/21 06:35:14 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m34.274538092s
2019/05/21 06:35:14 [ERR] raft: Failed to AppendEntries to 192.168.141.50:10008: dial tcp 192.168.141.50:10008: connect: connection refused
2019/05/21 06:35:14 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m34.745457205s
2019/05/21 06:35:15 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m35.177929576s
2019/05/21 06:35:15 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m35.644115985s
2019/05/21 06:35:16 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m36.087341796s
2019/05/21 06:35:16 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m36.54869029s
2019/05/21 06:35:17 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m36.989197815s
2019/05/21 06:35:17 [ERR] raft: Failed to heartbeat to 192.168.141.50:10008: dial tcp 192.168.141.50:10008: connect: connection refused
2019/05/21 06:35:17 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m37.447251021s
2019/05/21 06:35:17 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m37.925381626s
2019-05-21 06:35:18 DEBUG raft leader is 192.168.151.228:10008 (this host); state: Leader
2019-05-21 06:35:18 DEBUG orchestrator/raft: applying command 47469: request-health-report
[martini] Started GET /api/raft-follower-health-report/3201b2a2/192.168.151.228/192.168.151.228 for 192.168.151.228:35050
[martini] Completed 200 OK in 724.482µs
[martini] Started GET /api/raft-follower-health-report/3201b2a2/192.168.131.22/192.168.131.22 for 192.168.131.22:36188
[martini] Completed 200 OK in 872.63µs
2019/05/21 06:35:18 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m38.373006653s
2019/05/21 06:35:18 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m38.850190972s
2019/05/21 06:35:19 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m39.298116856s
2019/05/21 06:35:19 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m39.75620411s
2019/05/21 06:35:20 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m40.212382274s
2019/05/21 06:35:20 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m40.677159265s
2019/05/21 06:35:21 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m41.127393386s
2019/05/21 06:35:21 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m41.624493508s
2019/05/21 06:35:22 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m42.113499953s
2019/05/21 06:35:22 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m42.569192507s
2019/05/21 06:35:23 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m43.030674484s
2019-05-21 06:35:23 DEBUG raft leader is 192.168.151.228:10008 (this host); state: Leader
2019/05/21 06:35:23 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m43.473255932s
2019/05/21 06:35:23 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m43.912206887s
2019/05/21 06:35:24 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m44.398907974s
2019/05/21 06:35:24 [ERR] raft: Failed to AppendEntries to 192.168.141.50:10008: dial tcp 192.168.141.50:10008: connect: connection refused
2019/05/21 06:35:24 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m44.827295594s
2019/05/21 06:35:25 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m45.287224053s
2019/05/21 06:35:25 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m45.784114874s
2019/05/21 06:35:26 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m46.237884572s
2019/05/21 06:35:26 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m46.712580539s
2019/05/21 06:35:27 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m47.194846896s
2019/05/21 06:35:27 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m47.683998662s
2019/05/21 06:35:27 [ERR] raft: Failed to heartbeat to 192.168.141.50:10008: dial tcp 192.168.141.50:10008: connect: connection refused
2019/05/21 06:35:28 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m48.129312448s
2019-05-21 06:35:28 DEBUG raft leader is 192.168.151.228:10008 (this host); state: Leader
2019-05-21 06:35:28 DEBUG orchestrator/raft: applying command 47470: request-health-report
[martini] Started GET /api/raft-follower-health-report/27161245/192.168.151.228/192.168.151.228 for 192.168.151.228:35050
[martini] Completed 200 OK in 746.631µs
[martini] Started GET /api/raft-follower-health-report/27161245/192.168.131.22/192.168.131.22 for 192.168.131.22:36188
[martini] Completed 200 OK in 2.344773ms
2019/05/21 06:35:28 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m48.599153437s
2019/05/21 06:35:29 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m49.060463359s
2019/05/21 06:35:29 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m49.525100293s
2019/05/21 06:35:30 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m49.979100576s
2019/05/21 06:35:30 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m50.474263975s
2019/05/21 06:35:30 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m50.951412858s
2019/05/21 06:35:31 [DEBUG] raft: Failed to contact 192.168.141.50:10008 in 2m51.425738384s
OK, this is what I found, and it's very hard to believe that this might be the cause: whenever the 192.168.141.50 node is the leader (even after a re-election), the cluster is always there; but if any other node wins the election, during bootstrap/initial setup or even after a re-election, it's always "cluster not found" and therefore not working.
node01: 192.168.141.50
node02: 192.168.131.22
node03: 192.168.151.228
orchestrator 3.0.14 (commit f8d8ea9db20c6106dc697042fa37be3f3cb612e8), Go 1.10.8 (go1.10.8.linux-amd64.tar.gz), Ubuntu 16.04
OK, found the issue: the SQLite databases on the 2 other nodes each have the table definitions, but no entries at all in the database_instance table.
The question is why orchestrator didn't bail out when it hit an error trying to write to that table... the problem could stem from AppArmor forbidding writes to the flat DB file, or something similar.
root@node02:/opt/orchestrator/data# sqlite3 orchestrator.sqlite3
SQLite version 3.11.0 2016-02-15 17:29:24
Enter ".help" for usage hints.
sqlite> select * from database_instance;
sqlite>
Anyway, as a temporary fix on my end, I replaced the two VMs that were exhibiting the odd behaviour and removed AppArmor on both, and voilà... it's working now as advertised.
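For anyone hitting similar symptoms, one way to check whether AppArmor really is denying writes to the SQLite file (stock Ubuntu tooling; the relevant profile name is unknown, so this is only a sketch):

# list loaded/enforced AppArmor profiles
sudo aa-status
# look for denials in the kernel log that touch the orchestrator data directory
sudo dmesg | grep -i 'apparmor.*denied'
sudo journalctl -k | grep -i 'denied' | grep -i 'orchestrator'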
@shlomi-noach please close this if you don't feel it needs further investigation.
@hellracer well that's also strange, because orchestrator does bail out if unable to write to its own backend DB for some 30sec (or was it 60sec? Whichever).
@shlomi-noach yeah, I know; at first I found it hard to believe that this was the case, because orchestrator was able to create all of its table definitions in the SQLite DB on both nodes, so orchestrator clearly can write to the filesystem.
What's not clear is why orchestrator, for some reason, didn't put any entries in the database_instance table, which results in the "no cluster found" issue on both follower nodes. That's why I even asked earlier whether they need to be shared via NFS, heheh.
Anyway, I'm going to close this for now, because some users might think that this feature isn't working, when in fact this strange behaviour was only observed in my setup.
In a three-node orchestrator setup in raft mode, it's documented that you can lose one of the nodes and still be OK, whilst on a 5-node setup you can lose at least two nodes and the setup will still be okay.
In a three-node setup, if you lose either of the two follower nodes, the cluster is still accessible in the GUI on any of the surviving nodes.
If you lose the leader, an election process takes place and one of the two surviving nodes is elected as the new raft leader; now the question is why the cluster is unavailable in the GUI after one of the nodes has been elected as the new leader.
Is this the right behaviour?