openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0

raft: Failed to contact 127.0.1.1:10008 #871

Closed: git001 closed this issue 5 years ago

git001 commented 5 years ago

I'm running orchestrator version 3.0.14 (git commit f4c69ad05010518da784ce61865e65f0d9e0081c, started 2019-05-03 08:15:27) with the following config file:

{
    "Debug": true,
    "EnableSyslog": false,
    "ListenAddress": ":3000",
    "AutoPseudoGTID": true,
    "RaftEnabled": true,
    "RaftDataDir": "/var/lib/orchestrator",
    "RaftBind": "165.22.75.137",
    "RaftNodes": ["mysql-001.livesystem.at", "mysql-002.livesystem.at", "mysql-003.livesystem.at"] ,
    "BackendDB": "sqlite",
    "SQLite3DataFile": "/var/lib/orchestrator/data/orchestrator.sqlite3",
    "MySQLTopologyCredentialsConfigFile": "/var/lib/orchestrator/orchestrator-topology.cnf",
    "InstancePollSeconds": 5,
    "DiscoverByShowSlaveHosts": false,
    "FailureDetectionPeriodBlockMinutes": 60,
    "UseSSL": true,
    "SSLPrivateKeyFile": "/var/lib/orchestrator/pki/mysql-001.livesystem.at_privatekey.pem",
    "SSLCertFile": "/var/lib/orchestrator/pki/mysql-001.livesystem.at_cert.pem",
    "SSLCAFile": "/var/lib/orchestrator/pki/ca_cert.pem"
  }

When I start orchestrator, the address 127.0.1.1:10008 appears in the raft peers, but I haven't set up this IP in any config.

Where is this IP defined?

root@mysql-001:/usr/local/orchestrator# cd /usr/local/orchestrator && orchestrator --debug --config=/var/lib/orchestrator/orchestrator-sqlite.conf.json --stack http
2019-05-03 08:39:31 INFO starting orchestrator, version: 3.0.14, git commit: f4c69ad05010518da784ce61865e65f0d9e0081c
2019-05-03 08:39:31 INFO Read config: /var/lib/orchestrator/orchestrator-sqlite.conf.json
2019-05-03 08:39:31 DEBUG Parsed topology credentials from /var/lib/orchestrator/orchestrator-topology.cnf
2019-05-03 08:39:31 DEBUG Connected to orchestrator backend: sqlite on /var/lib/orchestrator/data/orchestrator.sqlite3
2019-05-03 08:39:31 DEBUG Initializing orchestrator
2019-05-03 08:39:31 INFO Connecting to backend :3306: maxConnections: 128, maxIdleConns: 32
2019-05-03 08:39:31 INFO Starting Discovery
2019-05-03 08:39:31 INFO Registering endpoints
2019-05-03 08:39:31 INFO continuous discovery: setting up
2019-05-03 08:39:31 DEBUG Setting up raft
2019-05-03 08:39:31 DEBUG Queue.startMonitoring(DEFAULT)
2019-05-03 08:39:31 DEBUG raft: advertise=165.22.75.137:10008
2019-05-03 08:39:31 DEBUG raft: transport=&{connPool:map[] connPoolLock:{state:0 sema:0} consumeCh:0xc42008a8a0 heartbeatFn:<nil> heartbeatFnLock:{state:0 sema:0} logger:0xc4200a7c20 maxPool:3 shutdown:false shutdownCh:0xc42008a900 shutdownLock:{state:0 sema:0} stream:0xc42016dc00 timeout:10000000000 TimeoutScale:262144}
2019-05-03 08:39:31 DEBUG raft: peers=[127.0.1.1:10008 178.128.203.16:10008 142.93.110.159:10008]
2019-05-03 08:39:31 DEBUG raft: logStore=&{dataDir:/var/lib/orchestrator backend:<nil>}
2019-05-03 08:39:31 INFO raft: store initialized at /var/lib/orchestrator/raft_store.db
2019-05-03 08:39:31 INFO Starting HTTPS listener
2019-05-03 08:39:31 INFO Read in CA file: /var/lib/orchestrator/pki/ca_cert.pem
2019-05-03 08:39:31 INFO new raft created
2019-05-03 08:39:31 INFO continuous discovery: starting
2019/05/03 08:39:31 [INFO] raft: Node at 165.22.75.137:10008 [Follower] entering Follower state (Leader: "")
2019-05-03 08:39:32 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:33 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/05/03 08:39:33 [INFO] raft: Node at 165.22.75.137:10008 [Candidate] entering Candidate state
2019/05/03 08:39:33 [ERR] raft: Failed to make RequestVote RPC to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:33 [ERR] raft: Failed to make RequestVote RPC to 142.93.110.159:10008: dial tcp 142.93.110.159:10008: connect: connection refused
2019/05/03 08:39:33 [DEBUG] raft: Votes needed: 3
2019/05/03 08:39:33 [DEBUG] raft: Newer term discovered, fallback to follower
2019/05/03 08:39:33 [INFO] raft: Node at 165.22.75.137:10008 [Follower] entering Follower state (Leader: "")
2019-05-03 08:39:33 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:34 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/05/03 08:39:34 [INFO] raft: Node at 165.22.75.137:10008 [Candidate] entering Candidate state
2019/05/03 08:39:34 [ERR] raft: Failed to make RequestVote RPC to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [DEBUG] raft: Votes needed: 3
2019/05/03 08:39:34 [DEBUG] raft: Vote granted from 165.22.75.137:10008. Tally: 1
2019/05/03 08:39:34 [DEBUG] raft: Vote granted from 178.128.203.16:10008. Tally: 2
2019/05/03 08:39:34 [DEBUG] raft: Vote granted from 142.93.110.159:10008. Tally: 3
2019/05/03 08:39:34 [INFO] raft: Election won. Tally: 3
2019/05/03 08:39:34 [INFO] raft: Node at 165.22.75.137:10008 [Leader] entering Leader state
2019/05/03 08:39:34 [INFO] raft: pipelining replication to peer 178.128.203.16:10008
2019/05/03 08:39:34 [INFO] raft: pipelining replication to peer 142.93.110.159:10008
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [DEBUG] raft: Node 165.22.75.137:10008 updated peer set (2): [165.22.75.137:10008 127.0.1.1:10008 178.128.203.16:10008 142.93.110.159:10008]
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 2: leader-uri
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 3: request-health-report
2019/05/03 08:39:34 [DEBUG] raft: Node 165.22.75.137:10008 updated peer set (2): [178.128.203.16:10008 165.22.75.137:10008 127.0.1.1:10008 142.93.110.159:10008]
2019/05/03 08:39:34 [DEBUG] raft: Node 165.22.75.137:10008 updated peer set (2): [178.128.203.16:10008 165.22.75.137:10008 127.0.1.1:10008 142.93.110.159:10008]
2019/05/03 08:39:34 [DEBUG] raft: Node 165.22.75.137:10008 updated peer set (2): [142.93.110.159:10008 178.128.203.16:10008 165.22.75.137:10008 127.0.1.1:10008]
2019/05/03 08:39:34 [DEBUG] raft: Node 165.22.75.137:10008 updated peer set (2): [165.22.75.137:10008 142.93.110.159:10008 178.128.203.16:10008 127.0.1.1:10008]
2019/05/03 08:39:34 [DEBUG] raft: Node 165.22.75.137:10008 updated peer set (2): [165.22.75.137:10008 127.0.1.1:10008 178.128.203.16:10008 142.93.110.159:10008]
2019/05/03 08:39:34 http: TLS handshake error from 165.22.75.137:47266: remote error: tls: bad certificate
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 4: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 5: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 6: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 7: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 8: heartbeat
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 9: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 10: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 12: leader-uri
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 13: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 14: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 16: leader-uri
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 17: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 18: heartbeat
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 19: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 20: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 21: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 22: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 23: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 24: heartbeat
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 25: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 26: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 27: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 28: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 29: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 30: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 31: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 32: heartbeat
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 33: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 34: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 35: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 37: leader-uri
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 38: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 39: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 40: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 42: leader-uri
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 43: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 44: heartbeat
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 45: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 46: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 47: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 48: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 49: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 50: heartbeat
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 51: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 52: request-health-report
2019-05-03 08:39:34 DEBUG orchestrator/raft: applying command 54: leader-uri
2019/05/03 08:39:34 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 http: TLS handshake error from 178.128.203.16:50112: remote error: tls: bad certificate
2019/05/03 08:39:34 http: TLS handshake error from 142.93.110.159:60710: remote error: tls: bad certificate
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019-05-03 08:39:34 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [WARN] raft: Failed to contact 127.0.1.1:10008 in 509.238189ms
2019/05/03 08:39:34 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:34 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:35 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:35 [WARN] raft: Failed to contact 127.0.1.1:10008 in 956.936642ms
2019/05/03 08:39:35 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:35 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019-05-03 08:39:35 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:35 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:35 [WARN] raft: Failed to contact 127.0.1.1:10008 in 1.414558423s
2019/05/03 08:39:36 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:36 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:36 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 1.874982292s
2019/05/03 08:39:36 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 2.307844141s
2019-05-03 08:39:36 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019-05-03 08:39:36 DEBUG raft leader is 165.22.75.137:10008 (this host); state: Leader
2019/05/03 08:39:36 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:37 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 2.776174241s
2019/05/03 08:39:37 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:37 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 3.229550204s
2019-05-03 08:39:37 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:37 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 3.677283535s
2019/05/03 08:39:38 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 4.100539914s
2019/05/03 08:39:38 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019-05-03 08:39:38 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:38 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 4.582547477s
2019/05/03 08:39:39 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 5.065952894s
2019-05-03 08:39:39 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:39 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 5.539546906s
2019/05/03 08:39:40 [ERR] raft: Failed to AppendEntries to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:40 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 6.011239143s
2019-05-03 08:39:40 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:40 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 6.486905862s
2019/05/03 08:39:41 [ERR] raft: Failed to heartbeat to 127.0.1.1:10008: dial tcp 127.0.1.1:10008: connect: connection refused
2019/05/03 08:39:41 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 6.929725794s
2019-05-03 08:39:41 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
2019/05/03 08:39:41 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 7.359897156s
2019-05-03 08:39:41 DEBUG raft leader is 165.22.75.137:10008 (this host); state: Leader
2019-05-03 08:39:41 DEBUG orchestrator/raft: applying command 55: request-health-report
2019/05/03 08:39:41 http: TLS handshake error from 165.22.75.137:47312: remote error: tls: bad certificate
2019/05/03 08:39:41 http: TLS handshake error from 178.128.203.16:50114: remote error: tls: bad certificate
2019/05/03 08:39:41 http: TLS handshake error from 142.93.110.159:60712: remote error: tls: bad certificate
2019/05/03 08:39:42 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 7.838900395s
2019/05/03 08:39:42 [DEBUG] raft: Failed to contact 127.0.1.1:10008 in 8.264737721s
2019-05-03 08:39:42 DEBUG Waiting for 15 seconds to pass before running failure detection/recovery
liuqian1990 commented 5 years ago

@git001 As a test, run ping mysql-001.livesystem.at on mysql-001.livesystem.at itself.

shlomi-noach commented 5 years ago

Perhaps an entry in /etc/resolv.conf?

try running

$ host mysql-001.livesystem.at 
$ host mysql-002.livesystem.at 
$ host mysql-003.livesystem.at 

from the same boxes where orchestrator is running.
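The three lookups above can also be wrapped in a small loop. This is a sketch (hostnames taken from this thread; getent assumed available, and resolve_first is a helper name I made up) that flags the Debian/Ubuntu-style 127.0.1.1 loopback alias:

```shell
# Sketch: check what each orchestrator node name resolves to locally.
# An answer of 127.0.1.1 points at the Debian/Ubuntu /etc/hosts alias
# rather than a routable address.
resolve_first() {
    # First address the system resolver returns for a name, or empty.
    getent hosts "$1" | awk '{print $1; exit}'
}

for fqdn in mysql-001.livesystem.at mysql-002.livesystem.at mysql-003.livesystem.at; do
    addr=$(resolve_first "$fqdn")
    if [ "$addr" = "127.0.1.1" ]; then
        echo "$fqdn -> $addr (loopback alias)"
    else
        echo "$fqdn -> ${addr:-unresolved}"
    fi
done
```

getent consults the order in /etc/nsswitch.conf (typically files before DNS), so it reproduces what orchestrator itself would see when resolving RaftNodes.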

git001 commented 5 years ago

Thanks all.
The links below helped me find the cause and the solution.

reason

https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution

solution

https://www.digitalocean.com/community/tutorials/how-to-use-cloud-config-for-your-initial-server-setup

Ansible snippet:

  - name: DO | Create droplets
    digital_ocean:
      state: present
      command: droplet
      name: "{{ item }}.{{ Domain }}"
      api_token: "{{ DOToken }}"
      ssh_key_ids: SSH_IDS
      size_id: s-4vcpu-8gb
      region_id: fra1
      image_id: ubuntu-18-04-x64
      unique_name: yes
      wait_timeout: 500
      private_networking: yes
      user_data: |
        #cloud-config
        resolv_conf:
          nameservers:
            - 'IP.of.your.preferred.dnsserver'
        manage_etc_hosts: False
    loop:
      - mysql-001
      - mysql-002
      - mysql-003
    register: my_droplet
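On an already-provisioned box, the equivalent manual fix (my reading of the Debian reference linked above; the address is the one from the logs in this thread) is to make /etc/hosts map the machine's FQDN to its routable address instead of 127.0.1.1:

```
# /etc/hosts on mysql-001 (illustrative; address taken from the logs above)
127.0.0.1       localhost
165.22.75.137   mysql-001.livesystem.at mysql-001
```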

Output of the suggested diagnostic commands:

root@mysql-001:~# ping mysql-001.livesystem.at
PING mysql-001.livesystem.at (127.0.1.1) 56(84) bytes of data.
64 bytes from mysql-001.livesystem.at (127.0.1.1): icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from mysql-001.livesystem.at (127.0.1.1): icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from mysql-001.livesystem.at (127.0.1.1): icmp_seq=3 ttl=64 time=0.041 ms
^C
--- mysql-001.livesystem.at ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2044ms
rtt min/avg/max/mdev = 0.035/0.038/0.041/0.002 ms
root@mysql-001:~# systemd-resolve mysql-001.livesystem.at
mysql-001.livesystem.at: 127.0.1.1

-- Information acquired via protocol DNS in 1.1ms.
-- Data is authenticated: yes