pingcap / tidb-ansible


Error when adding a new TiDB node #491

Closed caisanpx closed 6 years ago

caisanpx commented 6 years ago

I have already added 192.168.11.167 to [tidb_servers] and [monitored_servers] in inventory.ini. Then I ran:

    tidb-ansible]$ ansible-playbook deploy.yml -l 192.168.11.167

The error message is as follows:

    fatal: [192.168.11.167]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml': line 13, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Preflight check - Get hostnames of all nodes in cluster\n ^ here\n"}

How can I solve this problem? Thanks!

LinuxGit commented 6 years ago

Could you provide your ansible version (ansible --version) and post the contents of your inventory.ini?

caisanpx commented 6 years ago

    [tidb@192.168.11.171 tidb-ansible]$ ansible --version
    ansible 2.6.2
      config file = /home/tidb/tidb-ansible/ansible.cfg
      configured module search path = [u'/home/tidb/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/lib/python2.7/site-packages/ansible
      executable location = /bin/ansible
      python version = 2.7.5 (default, Nov 6 2016, 00:28:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]

caisanpx commented 6 years ago

inventory.ini

caisanpx commented 6 years ago

    # TiDB Cluster Part
    [tidb_servers]
    192.168.11.172
    192.168.11.173
    192.168.11.167#newly added TiDB

    [tikv_servers]
    TiKV1-1 ansible_host=192.168.11.174 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv1"
    TiKV2-2 ansible_host=192.168.11.175 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv2"
    TiKV3-3 ansible_host=192.168.11.176 deploy_dir=/data1/deploy tikv_port=20171 labels="host=tikv3"

    [pd_servers]
    192.168.11.171
    192.168.11.172
    192.168.11.173

    [spark_master]
    192.168.11.172

    [spark_slaves]
    192.168.11.173

    # Monitoring Part
    # prometheus and pushgateway servers
    [monitoring_servers]
    192.168.11.171

    [grafana_servers]
    192.168.11.171

    # node_exporter and blackbox_exporter servers
    [monitored_servers]
    192.168.11.171
    192.168.11.172
    192.168.11.173
    192.168.11.174
    192.168.11.175
    192.168.11.176
    192.168.11.167#newly added TiDB

    [alertmanager_servers]
    192.168.11.171

    [kafka_exporter_servers]

    # Binlog Part
    [pump_servers:children]
    tidb_servers

    [drainer_servers]

    # Group variables
    [pd_servers:vars]
    location_labels = ["zone","rack","host"]

    # Global variables
    [all:vars]
    deploy_dir = /data1/deploy

    # Connection
    # ssh via normal user
    ansible_user = tidb

    cluster_name = tidb-cluster

    tidb_version = v2.0.6

    # process supervision, [systemd, supervise]
    process_supervision = systemd

    # timezone of deployment region
    timezone = Asia/Shanghai
    set_timezone = True

    enable_firewalld = False

    # check NTP service
    enable_ntpd = False
    set_hostname = False

    # CPU, memory and disk performance will not be checked when dev_mode = True
    dev_mode = True

    # binlog trigger
    enable_binlog = False

    # zookeeper address of kafka cluster for binlog, example:
    # zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
    zookeeper_addrs = ""

    # kafka cluster address for monitoring, example:
    # kafka_addrs = "192.168.0.11:9092,192.168.0.12:9092,192.168.0.13:9092"
    kafka_addrs = ""

    # store slow query log into seperate file
    enable_slow_query_log = False

    # enable TLS authentication in the TiDB cluster
    enable_tls = False

    # KV mode
    deploy_without_tidb = False

    # Optional: Set if you already have a alertmanager server.
    # Format: alertmanager_host:alertmanager_port
    alertmanager_target = ""

    grafana_admin_user = "admin"
    grafana_admin_password = "admin"

    # Collect diagnosis
    collect_log_recent_hours = 2

    enable_bandwidth_limit = False

    # default: 10Mb/s, unit: Kbit/s
    collect_bandwidth_limit = 10000
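
The report says the new host was added to both [tidb_servers] and [monitored_servers]. As a quick sanity check, that could be confirmed with a small script like the sketch below. The helper name hosts_in_group is hypothetical (it is not part of tidb-ansible), the sample text is a shortened stand-in for the real inventory.ini, and stripping a trailing "#..." from a host token is an assumption about how such lines are meant to be read.

```shell
# Hypothetical helper (not part of tidb-ansible): print the host names listed
# under one group of an INI-style inventory read from stdin.
hosts_in_group() {
    awk -v group="[$1]" '
        /^[ \t]*\[/    { in_group = ($1 == group); next }  # group header line
        /^[ \t]*$/     { next }                            # blank line
        /^[ \t]*[#;]/  { next }                            # full-line comment
        in_group       { sub(/#.*/, "", $1); print $1 }    # first token, minus any trailing #comment
    '
}

# Shortened stand-in for inventory.ini (assumption: one host per line).
sample_inventory='[tidb_servers]
192.168.11.172
192.168.11.173
192.168.11.167#new node

[monitored_servers]
192.168.11.171
192.168.11.167#new node

[pump_servers:children]
tidb_servers'

# Report whether the new host is listed in each group that should contain it.
for group in tidb_servers monitored_servers; do
    if printf '%s\n' "$sample_inventory" | hosts_in_group "$group" | grep -qxF '192.168.11.167'; then
        echo "$group: 192.168.11.167 present"
    else
        echo "$group: 192.168.11.167 missing"
    fi
done
```

On the control machine the real file would be read instead, for example: hosts_in_group tidb_servers < inventory.ini. This only checks group membership; it does not explain the undefined ansible_hostname error.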

LinuxGit commented 6 years ago
  1. Make sure you install ansible via pip, see https://github.com/pingcap/docs-cn/blob/master/op-guide/ansible-deployment.md#在中控机器上安装-ansible-及其依赖.

    $ cd /home/tidb/tidb-ansible
    $ sudo pip install -r ./requirements.txt
  2. Run the following command:

    $ cd /home/tidb/tidb-ansible
    $ ansible 172.16.10.72 -m setup -a 'gather_subset=hardware' | grep hostname
        "ansible_hostname": "ip-172-16-10-72",

    Replace 172.16.10.72 with 192.168.11.167.

Please post the output in a comment. Thanks.
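
The check in step 2 amounts to filtering the setup module's output for the ansible_hostname fact. A minimal sketch follows; extract_hostname_fact is a hypothetical helper name, and the sample text only imitates the shape of real output so that the snippet runs without a cluster.

```shell
# Hypothetical helper: print only the ansible_hostname entry found in
# `ansible <host> -m setup` output read from stdin.
extract_hostname_fact() {
    grep -o '"ansible_hostname": *"[^"]*"'
}

# Stand-in for real output (assumption: shape only; values are illustrative).
sample_output='192.168.11.167 | SUCCESS => {
    "ansible_facts": {
        "ansible_hostname": "ip-192-168-11-167",
        "ansible_memtotal_mb": 7821
    }
}'

printf '%s\n' "$sample_output" | extract_hostname_fact
```

Against a real node, the sample would be replaced by the command itself: ansible 192.168.11.167 -m setup -a 'gather_subset=hardware' | extract_hostname_fact. Nothing should follow the grep pattern on the command line, because grep treats any extra arguments as file names.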

caisanpx commented 6 years ago

Thanks for your reply.

1) pip install output:

    [tidb@192.168.11.171 tidb-ansible]$ sudo pip install -r ./requirements.txt
    Requirement already satisfied (use --upgrade to upgrade): ansible>=2.4.2 in /usr/lib/python2.7/site-packages (from -r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.9.6 in /usr/lib64/python2.7/site-packages (from -r ./requirements.txt (line 2))
    Requirement already satisfied (use --upgrade to upgrade): jmespath>=0.9.0 in /usr/lib/python2.7/site-packages (from -r ./requirements.txt (line 3))
    Requirement already satisfied (use --upgrade to upgrade): PyYAML in /usr/lib64/python2.7/site-packages (from ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): paramiko in /usr/lib/python2.7/site-packages (from ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): cryptography in /usr/lib64/python2.7/site-packages (from ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/site-packages (from ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): MarkupSafe>=0.23 in /usr/lib/python2.7/site-packages (from jinja2>=2.9.6->-r ./requirements.txt (line 2))
    Requirement already satisfied (use --upgrade to upgrade): pyasn1>=0.1.7 in /usr/lib/python2.7/site-packages (from paramiko->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): bcrypt>=3.1.3 in /usr/lib64/python2.7/site-packages (from paramiko->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): pynacl>=1.0.1 in /usr/lib64/python2.7/site-packages (from paramiko->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): idna>=2.1 in /usr/lib/python2.7/site-packages (from cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): enum34; python_version < "3" in /usr/lib/python2.7/site-packages (from cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): cffi!=1.11.3,>=1.7 in /usr/lib64/python2.7/site-packages (from cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): six>=1.4.1 in /usr/lib/python2.7/site-packages (from cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): ipaddress; python_version < "3" in /usr/lib/python2.7/site-packages (from cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): asn1crypto>=0.21.0 in /usr/lib/python2.7/site-packages (from cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    Requirement already satisfied (use --upgrade to upgrade): pycparser in /usr/lib/python2.7/site-packages (from cffi!=1.11.3,>=1.7->cryptography->ansible>=2.4.2->-r ./requirements.txt (line 1))
    You are using pip version 8.1.2, however version 18.0 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.

My pip version is 8.1.2. Do I have to upgrade to 18.0? The initial cluster installation succeeded with 8.1.2, and now I want to test adding one new TiDB node (192.168.11.167). Following the steps in the documentation, I ran:

    ansible-playbook bootstrap.yml -l 192.168.11.167

which printed:

    PLAY RECAP
    192.168.11.167 : ok=27 changed=4 unreachable=0 failed=0
    Congrats! All goes well. :-)

Then I ran the deployment:

    ansible-playbook deploy.yml -l 192.168.11.167

    TASK [check_system_dynamic : Preflight check - Get hostnames of all nodes in cluster] ***
    fatal: [192.168.11.167]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml': line 13, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Preflight check - Get hostnames of all nodes in cluster\n ^ here\n"}

2) Following your step 2:

    [tidb@192.168.11.171 tidb-ansible]$ pwd
    /home/tidb/tidb-ansible
    [tidb@192.168.11.171 tidb-ansible]$ ansible 192.168.11.167 -m setup -a 'gather_subset=hardware' | grep hostname "ansible_hostname": "ip-192-168-11-167"
    grep: ansible_hostname:: No such file or directory
    grep: ip-192-168-11-167: No such file or directory
    [WARNING]: Failure using method (v2_runner_on_ok) in callback plugin (<ansible.plugins.callback.minimal.CallbackModule object at 0x300c050>): [Errno 32] Broken pipe

Thanks for your reply.

LinuxGit commented 6 years ago
caisanpx commented 6 years ago

    [tidb@XXX011171 tidb-ansible]$ ansible 192.168.11.167 -m setup -a 'gather_subset=hardware' | grep hostname
        "ansible_hostname": "XXX011171",

LinuxGit commented 6 years ago

That output looks fine. Please confirm that the "get facts" task is present at https://github.com/pingcap/tidb-ansible/blob/master/roles/check_system_dynamic/tasks/main.yml#L9, and that you haven't modified any playbook or ansible.cfg.

Then run ansible-playbook deploy.yml -l 192.168.11.167 -vv and post the output of the tasks named "get facts" and "Preflight check - Get hostnames of all nodes in cluster".

caisanpx commented 6 years ago

    ansible-playbook deploy.yml -l 192.168.11.167 -vv

    PLAYBOOK: deploy.yml ****
    16 plays in deploy.yml

    PLAY [check config locally] *****
    skipping: no hosts matched

    PLAY [check system environment] *****
    META: ran handlers

    TASK [check_system_dynamic : Disk space check - Fail task when disk is full] ****
    task path: /home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml:3
    ok: [192.168.11.167] => {"changed": false, "failed_when_result": false, "rc": 0, "stderr": "Shared connection to 192.168.11.167 closed.\r\n", "stderr_lines": ["Shared connection to 192.168.11.167 closed."], "stdout": "/dev/mapper/cl-root 26G 2.5G 24G 10% /\r\n", "stdout_lines": ["/dev/mapper/cl-root 26G 2.5G 24G 10% /"]}

    TASK [check_system_dynamic : get facts] *****
    task path: /home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml:9
    ok: [192.168.11.167]

    TASK [check_system_dynamic : Preflight check - Get hostnames of all nodes in cluster] ***
    task path: /home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml:13
    fatal: [192.168.11.167]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml': line 13, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Preflight check - Get hostnames of all nodes in cluster\n ^ here\n"}

caisanpx commented 6 years ago

This is a pre-deployment check, right? Is there a way to skip this error? Thanks!

LinuxGit commented 6 years ago

You could comment out this task: https://github.com/pingcap/tidb-ansible/blob/master/deploy.yml#L25. Could you also send an email to support@pingcap.com? We could use TeamViewer to help you resolve the issue.

caisanpx commented 6 years ago

I commented out everything related to ansible_hostname in /home/tidb/tidb-ansible/roles/check_system_dynamic/tasks/main.yml to skip the check. The new TiDB node has now been added. Thanks for your reply!

LinuxGit commented 6 years ago

@caisanpx Thanks for your feedback. The bug has been fixed: https://github.com/pingcap/tidb-ansible/pull/502.