openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

Ansible deploy fails when specifying vSphere parameters in the inventory file #6289

Closed menardorama closed 6 years ago

menardorama commented 6 years ago

Description

I am trying to deploy OpenShift on an ESXi cluster. I have provisioned CentOS 7 VMs with all the prerequisites. I use the playbook found in playbooks/byo/config.yml and have tried different configurations.

Using Container Native Storage works fine; I can even deploy the registry, logging and metrics on a dedicated NFS share. The install completes successfully and OpenShift works as expected.

Now, as we are hosting the OpenShift platform, I am trying to use the vSphere storage provider. Configuring it on an existing cluster works as expected.

The remaining issue appears when I define this storage provider in the inventory so that it is configured at installation time.

The masters are deployed correctly but the nodes fail to start; there remains an issue with the SDN controller.

The deployment of course fails, and I can see this output in journald on the nodes:

sdn_controller.go:38] Could not find an allocated subnet for node: os-node1.mydomain, Waiting.
Version
ansible 2.3.2.0
  config file = /home/tmenard/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

openshift-ansible-3.6.173.0.60-1
Steps To Reproduce
  1. Create an inventory file:
    [OSEv3:children]
    masters
    nodes
    etcd
    lb
    [OSEv3:vars]
    openshift_enable_unsupported_configurations=True
    openshift_disable_check=docker_storage
    ansible_ssh_user=tmenard
    ansible_become=yes
    debug_level=2
    openshift_deployment_type=origin
    openshift_release=v3.6
    openshift_install_examples=true
    openshift_docker_additional_registries=registry-gitlab.mycompany.fr
    openshift_docker_disable_push_dockerhub=False
    openshift_docker_selinux_enabled=False
    openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
    openshift_master_htpasswd_users={'admin': '$apr1$cN8IDFm5$HV7URAAJIQHWyuGnhHpYv0'}
    openshift_master_identity_providers=[{'name': 'MyCompany', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['uid']}, 'bindDN': '', 'bindPassword': '', 'insecure': 'false', 'url': 'ldaps://ldap.mycompany.fr:636/ou=utilisateurs,dc=mycompany,dc=fr?uid'}]
    openshift_cfme_install_app=True
    osm_use_cockpit=true
    osm_cockpit_plugins=['cockpit-kubernetes']
    openshift_master_cluster_method=native
    openshift_master_cluster_hostname=smartcloud.it.mycompany.fr
    openshift_master_cluster_public_hostname=openshift.it.mycompany.fr
    osm_controller_args={'cloud-provider': ['vsphere'], 'cloud-config': ['/etc/vsphere/vsphere.conf'] }
    osm_api_server_args={'cloud-provider': ['vsphere'], 'cloud-config': ['/etc/vsphere/vsphere.conf'] }
    openshift_master_default_subdomain=container.it.mycompany.fr
    openshift_hosted_router_selector='region=infra'
    openshift_hosted_registry_selector='region=infra'
    openshift_hosted_registry_storage_kind=nfs
    openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
    openshift_hosted_registry_storage_host=atlas.mycompany.fr
    openshift_hosted_registry_storage_nfs_directory=/ifs/mycompany/services/policy01/cloud/
    openshift_hosted_registry_storage_volume_name=registry
    openshift_hosted_registry_storage_volume_size=100Gi
    openshift_hosted_metrics_deploy=true
    openshift_hosted_metrics_storage_kind=nfs
    openshift_hosted_metrics_storage_access_modes=['ReadWriteOnce']
    openshift_hosted_metrics_storage_host=atlas.mycompany.fr
    openshift_hosted_metrics_storage_nfs_directory=/ifs/mycompany/services/policy01/cloud/
    openshift_hosted_metrics_storage_volume_name=metrics
    openshift_hosted_metrics_storage_volume_size=100Gi
    Openshift_hosted_metrics_storage_labels={'storage': 'metrics'}
    openshift_hosted_metrics_public_url=https://hawkular-metrics.it.mycompany.fr/hawkular/metrics
    openshift_hosted_logging_storage_kind=nfs
    openshift_hosted_logging_storage_access_modes=['ReadWriteOnce']
    openshift_hosted_logging_storage_host=atlas.mycompany.fr
    openshift_hosted_logging_storage_nfs_directory=/ifs/mycompany/services/policy01/cloud/
    openshift_hosted_logging_storage_volume_name=logging
    openshift_hosted_logging_storage_volume_size=100Gi
    openshift_hosted_logging_storage_labels={'storage': 'logging'}
    os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
    openshift_master_overwrite_named_certificates=true
    openshift_master_named_certificates=[{"certfile": "/home/tmenard/certs/openshift_it_mycompany_fr.crt", "keyfile": "/home/tmenard/certs/openshift_it_mycompany_fr.key"}]
    openshift_node_kubelet_args={'cloud-provider': ['vsphere'], 'cloud-config': ['/etc/vsphere/vsphere.conf'], 'image-gc-high-threshold': ['85'], 'image-gc-low-threshold': ['80'], 'max-pods': ['250'], 'pods-per-core': ['10']}
    openshift_template_service_broker_namespaces=['openshift']
    openshift_clock_enabled=true
    [masters]
    os-master[1:3].it.mycompany.fr
    [etcd]
    os-master[1:3].it.mycompany.fr
    [lb]
    os-lb-front.it.mycompany.fr containerized=false
    [nodes]
    os-master[1:3].it.mycompany.fr openshift_schedulable=False
    os-infra[1:3].it.mycompany.fr openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
    os-node[1:3].it.mycompany.fr openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
    
  2. Install the vSphere prerequisites as described at https://docs.openshift.org/latest/install_config/configuring_vsphere.html
Expected Results

The installation process should complete correctly

Example command and output or error messages
Observed Results

The restart node task is failing

RUNNING HANDLER [openshift_node : restart node] *************************************************************************************************************************************************************************************************************************************************
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [os-infra3.it.mycompany.fr]: FAILED! => {
    "attempts": 3,
    "changed": false,
    "failed": true
}

MSG:

Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

fatal: [os-infra2.it.mycompany.fr]: FAILED! => {
    "attempts": 3,
    "changed": false,
    "failed": true
}

MSG:

Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

journald log : https://gist.github.com/menardorama/202d78f0a03a61e882e5fa079294724a

Additional Information
menardorama commented 6 years ago

Additional information

[root@os-master1 tmenard]#  oc get clusternetwork default -o yaml
apiVersion: v1
hostsubnetlength: 9
kind: ClusterNetwork
metadata:
  creationTimestamp: 2017-11-28T09:42:38Z
  name: default
  resourceVersion: "310"
  selfLink: /oapi/v1/clusternetworks/default
  uid: 787683b7-d420-11e7-aa55-0050568a2c6e
network: 10.128.0.0/14
pluginName: redhat/openshift-ovs-multitenant
serviceNetwork: 172.30.0.0/16
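
For context on the "Could not find an allocated subnet" message, here is a minimal Python sketch of the arithmetic behind the ClusterNetwork values above. It assumes that hostsubnetlength is the number of host bits in each node's subnet (so 9 gives a /23 per node); it only illustrates the numbers and does not query the cluster.

```python
import ipaddress

# Values copied from the `oc get clusternetwork default -o yaml` output above.
cluster_network = ipaddress.ip_network("10.128.0.0/14")
host_subnet_length = 9

# Assumption: hostsubnetlength counts the host bits of each per-node subnet.
node_prefix = 32 - host_subnet_length
node_subnets = list(cluster_network.subnets(new_prefix=node_prefix))

print("per-node prefix length:", node_prefix)
print("number of node subnets:", len(node_subnets))
print("first node subnet:", node_subnets[0])
print("addresses per node subnet:", node_subnets[0].num_addresses)
```

Under that assumption the cluster network has room for 512 node subnets, so the message is unlikely to come from address exhaustion on a cluster of this size.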
menardorama commented 6 years ago

Does anyone know whether setting up vSphere directly at install time is supported at all?

menardorama commented 6 years ago

OK, I'll answer my own question.

So the issue is related to the to_padded_yaml filter function, which is in charge of converting the variables to YAML syntax.

Defining osm_controller_args (or osm_api_server_args or openshift_node_kubelet_args) in the inventory hosts file doesn't work when using double quotes.

Even escaping them does not work.

The workaround is to define those variables in group vars using YAML syntax (instead of INI syntax).

So it looks like this:

osm_controller_args:
      cloud-provider:
        - \"vsphere\"
      cloud-config:
        - \"/etc/vsphere/vsphere.conf\"

osm_api_server_args:
      cloud-provider:
        - \"vsphere\"
      cloud-config:
        - \"/etc/vsphere/vsphere.conf\"

openshift_node_kubelet_args:
      cloud-provider:
        - \"vsphere\"
      cloud-config:
        - \"/etc/vsphere/vsphere.conf\"
      image-gc-high-threshold:
        - '85'
      image-gc-low-threshold:
        - '80'
      max-pods:
        - '250'
      pods-per-core:
        - '10'
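
For anyone moving several such variables out of the INI inventory, here is a rough Python sketch that turns an INI-style `name={...}` line into the group vars layout above. The helper name `ini_value_to_yaml` is hypothetical, it handles only flat dicts of lists, and it emits plain single-quoted strings rather than the escaped double quotes shown above. Whether that quoting is sufficient for the installer has not been verified here.

```python
import ast


def ini_value_to_yaml(name, literal):
    """Render an INI-style dict-of-lists literal as a YAML mapping of lists.

    Hypothetical helper: it only handles the flat {'key': ['value', ...]}
    shape used by the *_args variables in this issue.
    """
    data = ast.literal_eval(literal)
    lines = [f"{name}:"]
    for key, values in data.items():
        lines.append(f"  {key}:")
        for value in values:
            # Single-quote every scalar so values such as '85' stay strings.
            escaped = str(value).replace("'", "''")
            lines.append(f"    - '{escaped}'")
    return "\n".join(lines)


# Line taken from the inventory in the issue description.
ini_line = (
    "osm_controller_args={'cloud-provider': ['vsphere'], "
    "'cloud-config': ['/etc/vsphere/vsphere.conf'] }"
)
name, _, literal = ini_line.partition("=")
yaml_text = ini_value_to_yaml(name, literal)
print(yaml_text)
```

The output can be pasted into a group vars file (for example one named after the OSEv3 group) and the corresponding line removed from the INI inventory.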

But unfortunately, due to the poor vSphere implementation in OpenShift 3.6, I am hitting a bug where the NodeIP conflicts with the cluster network: https://bugzilla.redhat.com/show_bug.cgi?id=1433236
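
To see whether a given address would collide in this way, a small Python check against the networks reported by `oc get clusternetwork` earlier in this thread can help. This is only a sketch: the two node IPs are made-up examples, and it says nothing about how the vSphere cloud provider actually picks the NodeIP.

```python
import ipaddress

# Networks copied from the ClusterNetwork output earlier in this issue.
cluster_network = ipaddress.ip_network("10.128.0.0/14")
service_network = ipaddress.ip_network("172.30.0.0/16")


def conflicts(node_ip):
    """Return the cluster-internal networks that contain node_ip."""
    addr = ipaddress.ip_address(node_ip)
    return [str(net) for net in (cluster_network, service_network) if addr in net]


# Hypothetical node addresses, for illustration only.
print(conflicts("10.129.4.20"))    # falls inside the pod network
print(conflicts("192.168.10.20"))  # outside both networks
```

An empty list means the address does not overlap with either the pod network or the service network.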

So my conclusion is that you can't deploy OpenShift 3.6 with a vSphere configuration in a single run; it has to be done in two phases.

This brings me to my second question: why is there a reference architecture for VMware that does not work? https://github.com/openshift/openshift-ansible-contrib/tree/master/reference-architecture/vmware-ansible

Maybe I'll have more luck with 3.7.