Hey @raffaelespazzoli, is that the full error message? It looks like it has been truncated, possibly by ansible itself.
Here is another snapshot of the log, this time run with -vv, but there isn't much additional detail:
PLAY [Create persistent volumes] ***********************************************
TASK [setup] *******************************************************************
ok: [master1.c.openshift-enablement-exam.internal]
TASK [openshift_facts : Detecting Operating System] ****************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_facts/tasks/main.yml:2
fatal: [master1.c.openshift-enablement-exam.internal]: FAILED! => {"failed": true, "msg": "The conditional check 'persistent_volumes | length > 0 or persistent_volume_claims | length > 0' failed. The error was: '{{ hostvars[groups.oo_first_master.0] | oo_persistent_volumes(groups) }}: create_pv'"}
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry
PLAY RECAP *********************************************************************
infranode1.c.openshift-enablement-exam.internal : ok=134 changed=3 unreachable=0 failed=0
infranode2.c.openshift-enablement-exam.internal : ok=134 changed=3 unreachable=0 failed=0
localhost : ok=15 changed=9 unreachable=0 failed=0
master1.c.openshift-enablement-exam.internal : ok=388 changed=23 unreachable=0 failed=1
master2.c.openshift-enablement-exam.internal : ok=292 changed=14 unreachable=0 failed=0
master3.c.openshift-enablement-exam.internal : ok=292 changed=14 unreachable=0 failed=0
node1.c.openshift-enablement-exam.internal : ok=134 changed=3 unreachable=0 failed=0
node2.c.openshift-enablement-exam.internal : ok=134 changed=3 unreachable=0 failed=0
node3.c.openshift-enablement-exam.internal : ok=134 changed=3 unreachable=0 failed=0
ose-bastion.c.openshift-enablement-exam.internal : ok=69 changed=1 unreachable=0 failed=0
To add more information: I've provisioned the environment on Google Cloud Platform following the reference architecture described here. I'm adhering to the availability zone scheme. I haven't provisioned the external load balancer for the masters yet because I need the final certificates.
I'd expect a stack trace based on the failure. We pass hostvars into these filters to generate the list of volumes and claims to create, and the failure is occurring in oo_persistent_volumes. I wonder if we'd get the stack trace with more verbosity.
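For context, the failing check corresponds to a pattern roughly like this in the playbook (a paraphrase pieced together from the error message, not the exact upstream task):
vars:
  persistent_volumes: "{{ hostvars[groups.oo_first_master.0] | oo_persistent_volumes(groups) }}"
  # persistent_volume_claims is built by a sibling filter in the same way
when: persistent_volumes | length > 0 or persistent_volume_claims | length > 0
Because the variable is evaluated lazily, an exception raised inside oo_persistent_volumes only surfaces when the when: condition forces evaluation, which is why it is reported as a failed conditional check rather than as a stack trace.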
I reran the installer with -vvvv; below is what I get:
PLAY [Create persistent volumes] ***********************************************
TASK [setup] *******************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
<master1.c.openshift-enablement-exam.internal> ESTABLISH SSH CONNECTION FOR USER: rspazzol
<master1.c.openshift-enablement-exam.internal> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=rspazzol -o ConnectTimeout=10 -o ControlPath=/home/rspazzol/.ansible/cp/ansible-ssh-%h-%p-%r master1.c.openshift-enablement-exam.internal '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066 `" && echo ansible-tmp-1475713625.09-251520290060066="` echo $HOME/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066 `" ) && sleep 0'"'"''
<master1.c.openshift-enablement-exam.internal> PUT /tmp/tmp04o5OC TO /home/rspazzol/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066/setup.py
<master1.c.openshift-enablement-exam.internal> SSH: EXEC sftp -b - -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=rspazzol -o ConnectTimeout=10 -o ControlPath=/home/rspazzol/.ansible/cp/ansible-ssh-%h-%p-%r '[master1.c.openshift-enablement-exam.internal]'
<master1.c.openshift-enablement-exam.internal> ESTABLISH SSH CONNECTION FOR USER: rspazzol
<master1.c.openshift-enablement-exam.internal> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=rspazzol -o ConnectTimeout=10 -o ControlPath=/home/rspazzol/.ansible/cp/ansible-ssh-%h-%p-%r master1.c.openshift-enablement-exam.internal '/bin/sh -c '"'"'chmod u+x /home/rspazzol/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066/ /home/rspazzol/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066/setup.py && sleep 0'"'"''
<master1.c.openshift-enablement-exam.internal> ESTABLISH SSH CONNECTION FOR USER: rspazzol
<master1.c.openshift-enablement-exam.internal> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=rspazzol -o ConnectTimeout=10 -o ControlPath=/home/rspazzol/.ansible/cp/ansible-ssh-%h-%p-%r -tt master1.c.openshift-enablement-exam.internal '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-ofmngrycrffxsqaxuendlbvoimsgwuqj; /usr/bin/python /home/rspazzol/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066/setup.py; rm -rf "/home/rspazzol/.ansible/tmp/ansible-tmp-1475713625.09-251520290060066/" > /dev/null 2>&1'"'"'"'"'"'"'"'"' && sleep 0'"'"''
ok: [master1.c.openshift-enablement-exam.internal]
TASK [openshift_facts : Detecting Operating System] ****************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_facts/tasks/main.yml:2
fatal: [master1.c.openshift-enablement-exam.internal]: FAILED! => {
"failed": true,
"msg": "The conditional check 'persistent_volumes | length > 0 or persistent_volume_claims | length > 0' failed. The error was: '{{ hostvars[groups.oo_first_master.0] | oo_persistent_volumes(groups) }}: create_pv'"
}
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry
PLAY RECAP *********************************************************************
infranode1.c.openshift-enablement-exam.internal : ok=134 changed=2 unreachable=0 failed=0
infranode2.c.openshift-enablement-exam.internal : ok=134 changed=2 unreachable=0 failed=0
localhost : ok=15 changed=9 unreachable=0 failed=0
master1.c.openshift-enablement-exam.internal : ok=388 changed=22 unreachable=0 failed=1
master2.c.openshift-enablement-exam.internal : ok=292 changed=13 unreachable=0 failed=0
master3.c.openshift-enablement-exam.internal : ok=292 changed=14 unreachable=0 failed=0
node1.c.openshift-enablement-exam.internal : ok=134 changed=2 unreachable=0 failed=0
node2.c.openshift-enablement-exam.internal : ok=134 changed=2 unreachable=0 failed=0
node3.c.openshift-enablement-exam.internal : ok=134 changed=2 unreachable=0 failed=0
Instructions on how to reproduce the issue can be found here: https://github.com/raffaelespazzoli/openshift-enablement-exam
One update: the installation was successful with the latest playbooks from https://github.com/openshift/openshift-ansible HEAD. I'd say this issue is worth investigating.
An update: I'm now hitting this issue at a customer. We are installing OpenShift on VMware machines, so the issue is not Google Cloud related. Also, my workaround of using the upstream ansible installer no longer works, because the logging and metrics deployers have now been bumped to 3.3.1 and those images don't exist in registry.access.redhat.com yet.
I urgently need a workaround for this.
Attached is the customer's hosts file: hosts.txt
If you add 'nfs' to the '[OSEv3:children]' section and an '[nfs]' group containing the first master, does the error go away? We had the same problem, and after doing this at least the registry problem was gone.
BTW: next time, could you please remove the commented-out lines? It makes the hosts file much more readable.
@cw-aleks,
I'm not sure I understand what you mean. Can you explain this better? "If you add 'nfs' to the '[OSEv3:children]' section and an '[nfs]' group containing the first master, does the error go away?"
nfs with the first master? What does that mean? For us the first master is not an NFS server... please give an example of what you mean.
I mean:
[OSEv3:children]
masters
nodes
etcd
lb
nfs
... other data.
[nfs]
master1
@raffaelespazzoli You can set openshift_hosted_logging_deployer_version and openshift_hosted_metrics_deployer_version to '3.3.0' as a workaround.
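In the inventory that would look something like this (a minimal sketch; placing the variables in [OSEv3:vars] is the usual convention):
[OSEv3:vars]
# pin the deployer images to a tag that exists in registry.access.redhat.com
openshift_hosted_logging_deployer_version=3.3.0
openshift_hosted_metrics_deployer_version=3.3.0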
@sdodson thanks for the suggestion, we tried that yesterday: it worked for metrics but not for logging. Oddly, the ansible installer continues if metrics are not successfully deployed but fails if logging is not successfully deployed (not sure why we have this difference).
@sdodson actually we tried these: openshift_hosted_logging_image_version and openshift_hosted_metrics_image_version. I'll give it a try with the other two attributes that you mention.
@raffaelespazzoli I am referring to the pv/pvc error, not to the image version error.
I had the same error with ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml.
After I added the first master as an NFS server, the setup finished.
I then changed the PV for the registry to point to the remote NFS server, something like this: https://docs.openshift.com/container-platform/3.3/install_config/registry/deploy_registry_existing_clusters.html#storage-for-the-registry (see the sketch below).
I'm not sure if this is the best solution, but we had to finish the setup, so it was a working solution.
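A rough idea of such a PV definition (a minimal sketch; nfs.example.com, /exports/registry, and the 10Gi size are placeholders, not the actual values used):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-volume
spec:
  capacity:
    storage: 10Gi            # placeholder size; match the registry's needs
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.com  # placeholder: the remote NFS server
    path: /exports/registry  # placeholder: the exported directory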
An update: @sdodson's workaround of setting openshift_hosted_logging_deployer_version and openshift_hosted_metrics_deployer_version worked. Thanks.
Still, it is not good that, when we are at a customer, we cannot use the stock ansible installer from the RPMs.
@raffaelespazzoli Please ensure that if you're using a github checkout you remove all openshift-ansible RPMs (yum remove -y openshift-ansible\*), and vice versa: if you're using the rpm versions, make sure you're not running ansible from within a github checkout directory.
I had a similar issue with origin v1.5.0; switching to v3.6.1 solved it, but I'm unsure what the issue was.
The installer fails with the output shown above.
Version
atomic-openshift-utils-3.3.28-1.git.0.762256b.el7.noarch
openshift-ansible-3.3.28-1.git.0.762256b.el7.noarch
ansible 2.2.0
All the nodes and the ansible host run the following: Linux ose-bastion 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
I'm installing on Google Cloud Platform
Steps To Reproduce
ansible-playbook -v -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
The hosts file is provided below.
Current Result
The above error.
Expected Result
A complete installation.
Additional Information