sborenst/ansible_aws_deployer


Installation just dies at Cache Java dependencies #12

Closed: kenthua closed 8 years ago

kenthua commented 8 years ago

PLAY [Cache Java dependencies] *****
to retry, use: --limit @ansible/bu-workshop.retry

http://pastebin.com/qgQZaRJ3

thoraxe commented 8 years ago

You have numerous unreachable hosts:

ip-192-199-0-114.us-west-1.compute.internal : ok=2    changed=2    unreachable=1    failed=0
ip-192-199-0-115.us-west-1.compute.internal : ok=14   changed=7    unreachable=0    failed=0
ip-192-199-0-116.us-west-1.compute.internal : ok=14   changed=7    unreachable=0    failed=0
ip-192-199-0-117.us-west-1.compute.internal : ok=14   changed=7    unreachable=0    failed=0
ip-192-199-0-118.us-west-1.compute.internal : ok=14   changed=7    unreachable=0    failed=0
ip-192-199-0-119.us-west-1.compute.internal : ok=2    changed=2    unreachable=1    failed=0
ip-192-199-0-157.us-west-1.compute.internal : ok=2    changed=2    unreachable=1    failed=0
ip-192-199-0-203.us-west-1.compute.internal : ok=2    changed=2    unreachable=1    failed=0
ip-192-199-0-31.us-west-1.compute.internal : ok=2    changed=2    unreachable=1    failed=0
ip-192-199-0-69.us-west-1.compute.internal : ok=2    changed=2    unreachable=1    failed=0

I would assume it is related.

kenthua commented 8 years ago

I would expect that when a node fails a task that does not set ignore_errors: true, the entire playbook would fail. In this case it seems to fail the node and then, on subsequent tasks, simply stop calling the failed nodes, which is very odd behavior.
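For illustration, a minimal sketch of the task-level and play-level error handling I mean (hypothetical play and task, not from this repo):

- hosts: all
  any_errors_fatal: true          # play-level: abort for all hosts if any single host fails
  tasks:
    - name: Task that may fail on some hosts (illustrative)
      command: /bin/false
      ignore_errors: true         # task-level: report the failure but keep the host in the play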

I seem to get further, with fewer errors, after commenting out a few entries in the included ansible.cfg.

forks = 15 seems to cause node connectivity issues for me. I'm not sure, but the settings below seem to be related to execution stopping on the failed nodes; likely the fact caching, though I'm not certain:

gathering = smart
fact_caching = jsonfile
fact_caching_connection = .ansible_facts
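For context, this is roughly what my edited copy of the included ansible.cfg ends up looking like (these keys live under the [defaults] section; commenting them out just falls back to Ansible's defaults: 5 forks, implicit fact gathering, in-memory fact cache):

[defaults]
# forks = 15                        # the default (5) got me further
# gathering = smart
# fact_caching = jsonfile
# fact_caching_connection = .ansible_facts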

Lastly, I've run into this before with subscription-manager type calls: doing them in parallel causes weird behavior (e.g. failed subscription-manager commands, timed-out tasks). So I added serial: 3 to https://github.com/sborenst/ansible_aws_deployer/blob/master/ansible/bu-workshop.yml#L65-L73 The reference architecture and demo_ansible provision the nodes serially, one by one, and I do the same for my own deploys.
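Roughly what that change looks like, assuming a play header along these lines (the play name, host group, and task are illustrative placeholders, not copied from bu-workshop.yml):

- name: Provision workshop nodes (illustrative)
  hosts: nodes                    # hypothetical group name
  serial: 3                       # run at most 3 hosts at a time so subscription-manager calls don't pile up
  tasks:
    - name: Placeholder for the node provisioning tasks (illustrative)
      ping: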

kenthua commented 8 years ago

As I've gotten further, lucky me, OCP 3.3.1.3 is out =), so now I need to edit the vars file. I'll have to try another run another day.

"Detected OpenShift version 3.3.1.3 does not match requested openshift_release 3.3.0.35"

thoraxe commented 8 years ago

https://github.com/sborenst/ansible_aws_deployer/issues/6

Known issue. Haven't had time to dig into it. It has to do with something OpenShift-related getting installed before the installer runs, but I'm not exactly sure what.

thoraxe commented 8 years ago

I'm going to close this one as I think it was really a different issue in disguise.