Closed: chenjacken closed this issue 1 year ago
@chenjacken I fixed this issue yesterday. Could you pull the latest ocboot release/3.10 code and run it again?
@chenjacken Could you paste the contents of ./config-k8s-ha.yml?
Contents of config-k8s-ha.yml, written following https://www.cloudpods.org/zh/docs/setup/ha-ce/
primary_master_node:
  hostname: 172.16.1.8
  use_local: false
  user: root
  onecloud_version: "v3.10.6"
  db_host: 172.16.1.99
  db_user: "root"
  db_password: "hwyDB_@2024"
  db_port: "3306"
  skip_docker_config: true
  image_repository: registry.cn-guangzhou.aliyuncs.com/createview
  ha_using_local_registry: false
  node_ip: "172.16.1.8"
  ip_autodetection_method: "can-reach=172.16.1.8"
  controlplane_host: 172.16.1.100
  controlplane_port: "6443"
  as_host: true
  high_availability: true
  use_ee: false
  enable_minio: true
  registry_mirrors:
  - https://lje6zxpk.mirror.aliyuncs.com
  insecure_registries:
  - 172.16.1.8:5000
  host_networks: "eno1/br0/172.16.1.8"
master_nodes:
  controlplane_host: 172.16.1.100
  controlplane_port: "6443"
  as_controller: true
  as_host: true
  ntpd_server: "172.16.1.8"
  registry_mirrors:
  - https://lje6zxpk.mirror.aliyuncs.com
  high_availability: true
  hosts:
  - user: root
    hostname: "172.16.1.9"
    host_networks: "eno1/br0/172.16.1.9"
  - user: root
    hostname: "172.16.1.10"
    host_networks: "eno1/br0/172.16.1.10"
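Since a YAML configuration problem was suspected here, a small pre-flight check of the node entries may help before running the installer. The following is a minimal sketch, not part of ocboot: the function names are hypothetical, the reading of host_networks as "interface/bridge/ip" is inferred from the values in the config above, and the rule that the IP should equal hostname is an assumption based on every node in this config following that pattern.

```python
import ipaddress


def parse_host_network(value):
    """Split an 'interface/bridge/ip' string into its three parts.

    The three-part layout is an assumption inferred from values such as
    'eno1/br0/172.16.1.8' in the config posted in this thread.
    """
    parts = value.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'interface/bridge/ip', got {value!r}")
    interface, bridge, ip = parts
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return interface, bridge, ip


def check_node(hostname, host_networks):
    """Return a list of problems found for one node entry (empty if none)."""
    try:
        _, _, ip = parse_host_network(host_networks)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    # Assumption: hostname is given as an IP and should match host_networks.
    if ip != hostname:
        problems.append(f"host_networks IP {ip} differs from hostname {hostname}")
    return problems
```

For example, check_node("172.16.1.9", "eno1/br0/172.16.1.9") returns an empty list, while a mismatched or truncated value returns a one-line description of the problem.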
That is the version I am running; I pulled the latest code just now:
# Download the ocboot tool locally
$ git clone -b release/3.10 https://github.com/yunionio/ocboot && cd ./ocboot
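@chenjacken Could you check your ansible version?
@chenjacken This looks like a problem in the YAML configuration. Please paste the contents of ./config-k8s-ha.yml (with IPs, passwords, and similar details removed).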
I installed it following the official documentation:
# Install ansible and git locally
$ yum install -y epel-release git python3-pip
$ python3 -m pip install --upgrade pip setuptools wheel
$ python3 -m pip install --upgrade ansible
The ansible version:
[root@master1 ~]# ansible --version
[DEPRECATION WARNING]: Ansible will require Python 3.8 or newer on the controller starting with Ansible 2.12.
Current version: 3.6.8 (default, Jun 20 2023, 11:53:23) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]. This feature
will be removed from ansible-core in version 2.12. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
/usr/local/lib/python3.6/site-packages/ansible/parsing/vault/__init__.py:44: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography will remove support for Python 3.6.
from cryptography.exceptions import InvalidSignature
ansible [core 2.11.12]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.6.8 (default, Jun 20 2023, 11:53:23) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
jinja version = 3.0.3
libyaml = True
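The output above warns that Ansible 2.12 will require Python 3.8 or newer on the controller, while this controller runs Python 3.6.8 with ansible-core 2.11.12. A minimal sketch for extracting both versions from `ansible --version` output and comparing them against that threshold could look like this; the function names are hypothetical, and the regular expressions assume the output layout shown above.

```python
import re


def parse_ansible_version(output):
    """Extract (core_version, python_version_tuple) from `ansible --version` text.

    Assumes the layout shown in this thread: a line like
    'ansible [core 2.11.12]' and a line like 'python version = 3.6.8 (...)'.
    """
    core = re.search(r"ansible \[core ([\d.]+)\]", output)
    py = re.search(r"python version = (\d+)\.(\d+)\.(\d+)", output)
    if core is None or py is None:
        raise ValueError("unrecognized `ansible --version` output")
    return core.group(1), tuple(int(part) for part in py.groups())


def controller_python_ok(python_version, minimum=(3, 8)):
    """True if the controller Python meets the minimum named in the warning."""
    return tuple(python_version) >= tuple(minimum)
```

With the output pasted above, parse_ansible_version yields "2.11.12" and (3, 6, 8), and controller_python_ok reports False for that interpreter.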
It is already posted above: https://github.com/yunionio/cloudpods/issues/18357#issuecomment-1769804759
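@chenjacken I just pushed a fix. Please test again with the latest code on the ocboot release/3.10 branch.
Thanks!
I'll test again.
@chenjacken https://github.com/yunionio/ocboot/pull/990/files also fixed a syntax problem just now. If you hit an error, update the code again.
I no longer run into that problem.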
RUNNING HANDLER [utils/config-network-manager : Reload NetworkManager] *********
changed: [172.16.1.10]
changed: [172.16.1.9]
changed: [172.16.1.8]
RUNNING HANDLER [utils/config-network-manager : Remove immutable flag on /etc/resolv.conf] ***
changed: [172.16.1.10]
changed: [172.16.1.9]
changed: [172.16.1.8]
[WARNING]: Could not match supplied host pattern, ignoring: mariadb_node
PLAY [mariadb_node] ************************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: mariadb_ha_nodes
PLAY [mariadb_ha_nodes] ********************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: clickhouse_node
PLAY [clickhouse_node] *********************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: registry_node
PLAY [registry_node] ***********************************************************
skipping: no hosts matched
PLAY [primary_master_node] *****************************************************
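Can the database be deployed directly by the script? The document at https://www.cloudpods.org/zh/docs/setup/ha-ce/ requires deploying the highly available database manually, while https://github.com/yunionio/ocboot/blob/release/3.10/README.md says that, once configured, the script can install a highly available database directly.
Does the install script skip installing keepalived and nc? After the script finished, the VIP did not work. I installed them manually with yum install -y keepalived nc and rebooted, and then the VIP was reachable.
@chenjacken Yes, the script can deploy a highly available mariadb, but it currently deploys a dual-master setup, which is no longer the approach recommended by MariaDB upstream. We plan to switch to a 3-node cluster mode later, which is why this is not described in the cloudpods.org documentation. We still recommend that users manage and maintain the highly available database themselves.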
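@chenjacken keepalived runs inside a container, so it does not need to be installed separately. On each node, you can find the container with docker ps -a | grep keepalived and check its logs.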
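The check suggested here can be scripted so that it is easy to repeat on every node. This is a minimal sketch under stated assumptions: the function names are hypothetical, the container names are not known from this thread (the code only looks for "keepalived" in the name), and it relies on the standard `docker ps -a --format` and `docker logs --tail` options.

```python
import subprocess

# Tab-separated ID, name, and status for every container, running or not.
PS_FORMAT = "{{.ID}}\t{{.Names}}\t{{.Status}}"


def keepalived_containers(ps_output):
    """Pick (id, name, status) tuples whose name contains 'keepalived'."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and "keepalived" in fields[1]:
            found.append(tuple(fields))
    return found


def show_keepalived_logs(tail=50):
    """Print the last `tail` log lines of each keepalived container on this node."""
    ps = subprocess.run(
        ["docker", "ps", "-a", "--format", PS_FORMAT],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    for container_id, name, status in keepalived_containers(ps.stdout):
        print(f"== {name} ({status}) ==")
        logs = subprocess.run(
            ["docker", "logs", "--tail", str(tail), container_id],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # docker logs may write to stderr too
            universal_newlines=True,
        )
        print(logs.stdout)
```

Calling show_keepalived_logs() on a node prints nothing if no keepalived container exists there, which by itself would be a useful signal when the VIP is not responding.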
Understood, thanks!
After the high-availability deployment finished, adding a compute node failed.
Command: ./ocboot.py add-node 172.16.1.8 172.16.1.5
Error output:
TASK [utils/kernel-check : Is Cloud kernel running] ****************************
ok: [172.16.1.5]
TASK [utils/kernel-check : Is cloud kernel installed] **************************
ok: [172.16.1.5]
TASK [utils/kernel-check : install customized kernel] **************************
included: /opt/hwcloud-ocboot/onecloud/roles/utils/kernel-check/tasks/centos-x86_64.yml for 172.16.1.5
TASK [utils/kernel-check : version test] ***************************************
fatal: [172.16.1.5]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible_python_interpreter' is undefined\n\nThe error appears to be in '/opt/hwcloud-ocboot/onecloud/roles/utils/kernel-check/tasks/centos-x86_64.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# This role contains common plays that will run on all nodes\n- name: version test\n ^ here\n"}
RUNNING HANDLER [utils/config-network-manager : Reload NetworkManager] *********
PLAY RECAP *********************************************************************
172.16.1.5 : ok=99 changed=45 unreachable=0 failed=1 skipped=27 rescued=0 ignored=0
Thanks! @zexi @zhasm
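When several nodes are added and the log is long, the PLAY RECAP line is the quickest place to see which hosts failed. A minimal sketch for pulling that information out of saved Ansible output, with hypothetical function names and assuming the recap layout shown above:

```python
import re

# Matches lines such as:
# 172.16.1.5 : ok=99 changed=45 unreachable=0 failed=1 skipped=27 rescued=0 ignored=0
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")


def parse_recap(text):
    """Return {host: {counter: value}} for every recap line found in text."""
    results = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (item.split("=") for item in match.group("stats").split())
            results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


def failed_hosts(recap):
    """List hosts with at least one failed task or marked unreachable."""
    return [
        host
        for host, stats in recap.items()
        if stats.get("failed", 0) or stats.get("unreachable", 0)
    ]
```

Applied to the output above, failed_hosts(parse_recap(log_text)) would report 172.16.1.5, the node on which the "version test" task failed.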
@chenjacken Thanks, I'll take a look.
@chenjacken This is fixed. Please pull the latest code.
@chenjacken https://github.com/yunionio/ocboot/pull/999 is an attempt to fix this problem. Please pull the code and try again.
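I only saw this now. The failure last night happened when I opened several SSH windows and ran add-node in all of them at the same time: one node was added successfully, and the other SSH windows reported the error above. My workaround last night was to stop adding compute nodes concurrently and add them one at a time, starting the next only after the previous one had finished.
@chenjacken The command accepts multiple IPs, so you can add nodes in a batch: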
usage: ocboot.py add-node [-h] [--user SSH_USER] [--key-file SSH_PRIVATE_FILE]
[--port SSH_PORT] [--node-port SSH_NODE_PORT]
[--enable-host-on-vm]
FIRST_MASTER_HOST TARGET_NODE_HOSTS
[TARGET_NODE_HOSTS ...]
For example:
python3 ocboot.py add-node <ip1> <ip2> <ip3>
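Instead of launching add-node from several SSH windows at once, the batch form can be assembled from a list of targets and run as a single invocation. This is a minimal sketch: the helper name is hypothetical, the argument order follows the usage text above (FIRST_MASTER_HOST first, then one or more TARGET_NODE_HOSTS), and 172.16.1.6 in the usage note is an illustrative address that does not appear in this thread.

```python
def build_add_node_command(first_master, targets, script="ocboot.py"):
    """Build one `ocboot.py add-node` command line for all target nodes.

    Follows the usage shown in this thread:
    add-node FIRST_MASTER_HOST TARGET_NODE_HOSTS [TARGET_NODE_HOSTS ...]
    """
    # Drop repeated targets while keeping the order they were given in.
    unique_targets = list(dict.fromkeys(targets))
    if not unique_targets:
        raise ValueError("at least one target node is required")
    return ["python3", script, "add-node", first_master, *unique_targets]
```

For example, build_add_node_command("172.16.1.8", ["172.16.1.5", "172.16.1.6"]) produces the argument list for a single run, which can then be passed to subprocess.run(cmd, check=True) from the ocboot directory.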
Got it, thanks!
1. Versions: OS version:
Cloudpods version: v3.10.6
2. Run the high-availability deployment:
./ocboot.py install ./config-k8s-ha.yml
which fails with: