Closed vikaschoudhary16 closed 4 years ago
Looks like there may be a system container bug with the pull CLI image flow.
Can you turn on -vv and run through that step again?
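For example, re-running the failing step with extra verbosity would look something like this (a sketch, assuming the standard deploy_cluster.yml entry point and whatever inventory path you already use):

ansible-playbook -vv -i /path/to/inventory playbooks/deploy_cluster.yml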
On Apr 5, 2018, at 2:23 AM, Vikas Choudhary (vikasc) <notifications@github.com> wrote:
Description
Trying to install a single-node 3.10 cluster with pre-release version v3.10.0-0.14.0. Eventually openshift-ansible fails with an error related to node label parsing.
Invalid value: "node-role.kubernetes.io~1master"
It fails because the oc tool that openshift-ansible produces is version 3.6. Why is it not 3.10?
containerized=true
ansible_ssh_user=root
openshift_deployment_type=openshift-enterprise
openshift_image_tag=v3.10.0-0.14.0
openshift_release='3.10'
openshift_playbook_rpm_repos=[{'id': 'aos-playbook-rpm', 'name': 'aos-playbook-rpm', 'baseurl': 'https://mirror.openshift.com/enterprise/all/3.10/v3.10.0-0.14.0_2018-03-27.1/x86_64/os/', 'enabled': 1, 'gpgcheck': 0}]
INSTALLER STATUS
Initialization : Complete (0:00:08)
Health Check : Complete (0:00:01)
etcd Install : Complete (0:01:53)
Master Install : Complete (0:03:28)
Master Additional Install : Complete (0:00:30)
Node Install : In Progress (0:02:10)
        This phase can be restarted by running: playbooks/openshift-node/config.yml
Failure summary:
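Once the underlying problem is fixed, that phase can be retried on its own, roughly like this (a sketch, assuming the same inventory file used for the full install):

ansible-playbook -i /path/to/inventory playbooks/openshift-node/config.yml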
Also, when I tried to manually label the node, it showed the same error:
[root@dell-r620-01 openshift-ansible]# oc label node mynode node-role.kubernetes.io/master=true region=infra zone=default --overwrite -n default
The Node "mynode" is invalid: metadata.labels: Invalid value: "node-role.kubernetes.io~1master": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')
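One way to confirm which oc binary is actually being used, and its version (a sketch; the output depends on what the install laid down on the host):

which oc
oc version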
Version
[root@dell-r620-01 openshift-ansible]# ansible --version
ansible 2.4.3.0
  config file = /home/vikas/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]
[root@dell-r620-01 openshift-ansible]# git describe
openshift-ansible-3.10.0-0.15.0
If I try to label the node with a 3.10 version of the oc tool (built manually), it works.
/cc @smarterclayton @sdodson @sjenning
@smarterclayton After using the following in the ini file, there was no issue related to image tags:
oreg_url='registry.reg-aws.openshift.com:443/openshift3/ose-${component}:v3.10.0-0.14.0'
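For reference, put together with the image variables already shown above, the relevant part of the inventory ends up looking roughly like this (a sketch; host sections and other required variables omitted):

[OSEv3:vars]
containerized=true
openshift_deployment_type=openshift-enterprise
openshift_release='3.10'
openshift_image_tag=v3.10.0-0.14.0
oreg_url='registry.reg-aws.openshift.com:443/openshift3/ose-${component}:v3.10.0-0.14.0'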
The next failure that I encountered was with the node service:
● atomic-openshift-node.service
Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2018-04-09 02:20:36 EDT; 813ms ago
Process: 8433 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
Process: 8431 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
Process: 8419 ExecStop=/usr/bin/docker stop atomic-openshift-node (code=exited, status=1/FAILURE)
Process: 8405 ExecStartPost=/usr/bin/sleep 10 (code=exited, status=0/SUCCESS)
Process: 8404 ExecStart=/usr/bin/docker run --name atomic-openshift-node --rm --privileged --net=host --pid=host --env-file=/etc/sysconfig/atomic-openshift-node --entrypoint /usr/local/bin/openshift-node -v /:/rootfs:ro,rslave -e CONFIG_FILE=${CONFIG_FILE} -e OPTIONS=${OPTIONS} -e DEBUG_LOGLEVEL=${DEBUG_LOGLEVEL} -e HOST=/rootfs -e HOST_ETC=/host-etc -v /var/lib/origin:/var/lib/origin:rslave -v /etc/origin/node:/etc/origin/node -v /etc/localtime:/etc/localtime:ro -v /etc/machine-id:/etc/machine-id:ro -v /run:/run -v /sys:/sys:rw -v /sys/fs/cgroup:/sys/fs/cgroup:rw -v /usr/bin/docker:/usr/bin/docker:ro -v /var/lib/docker:/var/lib/docker -v /lib/modules:/lib/modules -v /etc/cni:/etc/cni:ro -v /opt/cni:/opt/cni:ro -v /etc/systemd/system:/host-etc/systemd/system -v /var/log:/var/log $NUAGE_ADDTL_BIND_MOUNTS -v /dev:/dev $DOCKER_ADDTL_BIND_MOUNTS -v /etc/pki:/etc/pki:ro -v /var/lib/origin/.docker:/root/.docker:ro registry.reg-aws.openshift.com:443/openshift3/node:v3.10.0-0.14.0:${IMAGE_VERSION} (code=exited, status=125)
Process: 8401 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
Process: 8398 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
Process: 8384 ExecStartPre=/usr/bin/docker rm -f atomic-openshift-node (code=exited, status=1/FAILURE)
Main PID: 8404 (code=exited, status=125)
Apr 09 02:20:36 mynode systemd[1]: Failed to start atomic-openshift-node.service.
Apr 09 02:20:36 mynode systemd[1]: Unit atomic-openshift-node.service entered failed state.
Apr 09 02:20:36 mynode systemd[1]: atomic-openshift-node.service failed.
The problematic part is this (the image reference ends up with the version tag specified twice):
-v /var/lib/origin/.docker:/root/.docker:ro registry.reg-aws.openshift.com:443/openshift3/node:v3.10.0-0.14.0:${IMAGE_VERSION}
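A docker image reference can carry only one tag, so the doubled version makes the reference invalid and the docker run in ExecStart fails (status=125 above). After templating, the reference presumably needs to end with a single tag, roughly like this (a sketch of the expected form, not the exact output of the fix linked below):

-v /var/lib/origin/.docker:/root/.docker:ro registry.reg-aws.openshift.com:443/openshift3/node:${IMAGE_VERSION}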
After making this local change in openshift-ansible-3.10.0-0.14.0, I was able to get the containerized cluster deployed: https://github.com/vikaschoudhary16/openshift-ansible/commit/c0825678c32e2294b3b383ec3793cb71a172e353#diff-330fdfd960ab6907fae5069bde42fcc3
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Description
Trying to install a single-node 3.10 cluster with pre-release version v3.10.0-0.14.0. Eventually openshift-ansible fails with an error related to node label parsing.
It fails because the oc tool that openshift-ansible produces is version 3.6. Why is it not 3.10?
Another confusing thing that I am not able to understand is the node image that openshift-ansible pulls to the host. The docker command shows the image as "latest", but when I use "docker inspect", the version it shows is v3.6.173.0.96, the same as the oc tool version.
The same is the case with the cli image.
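One way to see what is really behind the latest tag locally is to inspect the pulled image directly, roughly like this (a sketch; <image-id> stands for whatever ID docker images reports on the host):

docker images | grep openshift3
docker inspect <image-id> | grep -i version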
Also, when I tried to manually label the node, it showed the same error:
Version
UPDATE: Able to get past this error by adding the following in the inventory file:
openshift-ansible is not using openshift_image_tag in the docker pull for the openshift3/ose image. Two questions:
1. Should openshift_image_tag be used in the docker pull?
2. If latest is used instead, why does latest point to 3.6 and not to 3.10?
/cc @smarterclayton @sdodson @sjenning
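A quick sanity check for the second question is to pull the tagged image by hand and compare it with latest (a sketch using the registry and tag mentioned above; the exact registry in use may differ):

docker pull registry.reg-aws.openshift.com:443/openshift3/ose:v3.10.0-0.14.0
docker pull registry.reg-aws.openshift.com:443/openshift3/ose:latest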