You have numerous unreachable hosts:
ip-192-199-0-114.us-west-1.compute.internal : ok=2 changed=2 unreachable=1 failed=0
ip-192-199-0-115.us-west-1.compute.internal : ok=14 changed=7 unreachable=0 failed=0
ip-192-199-0-116.us-west-1.compute.internal : ok=14 changed=7 unreachable=0 failed=0
ip-192-199-0-117.us-west-1.compute.internal : ok=14 changed=7 unreachable=0 failed=0
ip-192-199-0-118.us-west-1.compute.internal : ok=14 changed=7 unreachable=0 failed=0
ip-192-199-0-119.us-west-1.compute.internal : ok=2 changed=2 unreachable=1 failed=0
ip-192-199-0-157.us-west-1.compute.internal : ok=2 changed=2 unreachable=1 failed=0
ip-192-199-0-203.us-west-1.compute.internal : ok=2 changed=2 unreachable=1 failed=0
ip-192-199-0-31.us-west-1.compute.internal : ok=2 changed=2 unreachable=1 failed=0
ip-192-199-0-69.us-west-1.compute.internal : ok=2 changed=2 unreachable=1 failed=0
I would assume it is related.
I would expect that when a task fails on a node without ignore_errors: true, the entire playbook would fail. In this case it seems to fail that node and then simply stop calling the failed nodes on subsequent tasks, which is very odd behavior.
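For reference, that is Ansible's default behavior: a failed or unreachable host is dropped from the rest of the play while the remaining hosts keep going. If the goal is to abort the whole playbook as soon as any host fails, any_errors_fatal on the play is the knob for that. A minimal sketch, with made-up play and task names, not the repo's playbook:

```yaml
# Minimal sketch illustrating ignore_errors vs. any_errors_fatal.
- name: Example play
  hosts: all
  any_errors_fatal: true          # one failed host aborts the play for every host
  tasks:
    - name: Task that may fail but should not stop this host
      command: /bin/false
      ignore_errors: true         # failure is reported, host keeps running later tasks

    - name: Task whose failure should stop the whole playbook
      command: /bin/false
```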
I seem to get further, with fewer errors, when I comment out a few entries in the included ansible.cfg. The forks=15 setting seems to cause node connectivity issues for me.
I'm not sure, but the settings below seem to relate to execution no longer happening on the failed nodes. It's likely fact_caching, but I'm not sure:
gathering = smart
fact_caching = jsonfile
fact_caching_connection = .ansible_facts
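Concretely, my local ansible.cfg now looks roughly like this. This is a sketch of my own edits rather than the file shipped in the repo, and forks = 5 is just the value I happened to settle on:

```ini
[defaults]
# forks = 15
# Dropping forks back down helped node connectivity for me:
forks = 5

# Fact caching commented out while testing whether it relates to the skipped nodes:
# gathering = smart
# fact_caching = jsonfile
# fact_caching_connection = .ansible_facts
```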
Lastly, I've experienced this before with subscription-manager-style calls: running them in parallel causes weird behavior (e.g. failed subscription-manager commands, timed-out tasks). So I added serial: 3 to https://github.com/sborenst/ansible_aws_deployer/blob/master/ansible/bu-workshop.yml#L65-L73. The reference architecture and demo_ansible provision the nodes serially, one by one, and I do the same for my own deploys.
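The change itself is tiny; the sketch below shows the idea (the hosts pattern and role name are placeholders, not the repo's actual ones):

```yaml
# Sketch of adding serial to the play that runs subscription-manager.
- name: Subscribe and update nodes
  hosts: nodes
  serial: 3            # at most 3 hosts at a time, avoids parallel subscription-manager races
  roles:
    - subscribe-host   # placeholder role name
```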
As I've gotten further, lucky me because OCP 3.3.1.3 is out =), so now I need to edit the vars file. I'll have to try another run another day.
"Detected OpenShift version 3.3.1.3 does not match requested openshift_release 3.3.0.35"
https://github.com/sborenst/ansible_aws_deployer/issues/6
Known issue. I haven't had time to dig into it. It has to do with something for OpenShift being installed prior to running the installer, but I'm not exactly sure what.
I'm going to close this one as I think it was really a different issue in disguise.
PLAY [Cache Java dependencies] *****
to retry, use: --limit @ansible/bu-workshop.retry
http://pastebin.com/qgQZaRJ3