nuagenetworks / nuage-metroae

Nuage Networks Metro Automation Engine
http://devops.nuagenetworks.net
Apache License 2.0

VSD Deployment Fails -- Static Network Configuration is missing. 5.3.3 release #985

Closed apooniajjn closed 5 years ago

apooniajjn commented 5 years ago

It looks like VSD deployment is failing on the Nuage 5.3.3 release. Our Nuage VSD qcow2 has a static hardware address present in the /etc/sysconfig/network-scripts/ifcfg-eth0 file, so when VSD predeploy runs, the static configuration does not get applied to the VSD qcow2. As a result, as soon as the VSD deploy tasks run, they fail to connect to the VSD host.

Can you suggest a workaround or a fix?

Release: 5.3.3
Hypervisor: KVM
Playbook: VSD Deploy
Task: Wait for SSH to be ready

Reason: Since the static config is missing from the eth0 definition on the VSD, the SSH connection fails.

ghost commented 5 years ago

Are you saying that you already have the static network config in the qcow2 before predeploy runs? And your theory is that predeploy is failing to write a new ifcfg-eth0 because of it?

ghost commented 5 years ago

I'm trying to fully understand your use case. When vsd_predeploy runs, we use guestfish copy-in to replace the ifcfg-eth0 file. The static ip config should be replaced with the ip address, netmask, and gateway that are specified in the user data. Are you skipping predeploy? It would be good to see copies of your user input (build_vars.yml for MetroAE 2 or your common.yml and vsds.yml for MetroAE 3) and the ansible.log file.
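One way to confirm whether predeploy actually replaced the file is to read ifcfg-eth0 straight out of the image and check it for the static settings. The following is only a minimal sketch, not part of MetroAE: the function name `ifcfg_is_static` is hypothetical, and it assumes the template writes the usual `IPADDR`, `NETMASK`, and `GATEWAY` keys.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetroAE): reads ifcfg-style text on stdin
# and succeeds only if IPADDR, NETMASK, and GATEWAY each appear with
# something after the "=" sign.
ifcfg_is_static() {
    content=$(cat)
    for key in IPADDR NETMASK GATEWAY; do
        # Matches lines such as IPADDR=10.0.0.5 or IPADDR="10.0.0.5"
        if ! printf '%s\n' "$content" | grep -Eq "^[[:space:]]*${key}=.+"; then
            echo "missing ${key}" >&2
            return 1
        fi
    done
    return 0
}

# Example use against an image whose VM is shut off (path is a placeholder):
#   guestfish --ro -a /path/to/vsd.qcow2 -i \
#       cat /etc/sysconfig/network-scripts/ifcfg-eth0 | ifcfg_is_static
```

Running the check once right after predeploy and again after the VM has booted and been shut down would show whether the file was never written or was rewritten at boot.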

apooniajjn commented 5 years ago

@bacastelli Sure, please find detailed information below.

Use Case:

Installing Nuage VSD on KVM or ESXi with the help of Metro.

Release Used

Explanation

I looked at the vsd-predeploy role, and it does write the static IP configuration to the qcow2 used for the target server using guestfish. But that does not appear to happen for the Nuage Networks 5.3.3 release. As soon as the predeploy playbook completes and the deploy playbook starts, it looks for an SSH connection on the assigned static IP. When I connect to the VSD VM using virsh console or the vCenter console, I don't see the static config applied in the /etc/sysconfig/network-scripts/ifcfg-eth0 file.

On the 5.3.3 release, the VSD qcow2 has an unexpected static hardware address in /etc/sysconfig/network-scripts/ifcfg-eth0.

I didn't see this behavior on any earlier release that I tested.

Note: I am able to install VSC, ElasticSearch VMs and Utility VM perfectly for Nuage VSP 5.3.3 release using metro.

Let me know if my issue statement is not clear.

ghost commented 5 years ago

First of all, MetroAE 2.3.1 is very old. At a minimum, you should upgrade to v2.4.6. Better yet, migrate to the new deployment based MetroAE, v3.1.0.

Note that to get v2.4.6 you will need to do a git pull followed by a git checkout metroae2.

Changing to MetroAE v3 will require you to change from build_vars.yml to the new deployment model. In the src directory we provide a script that will convert your existing build_vars.yml into a deployment to make the transition easier.

Bottom line: we have not tested installing 5.3.3 with MetroAE v2. I will try to help you as much as I can, but I must ask that you upgrade to one of the newer versions.

apooniajjn commented 5 years ago

@bacastelli Sure. I forgot to mention that I had already looked at the vsd-predeploy role in 3.1.0 and in the earlier version. I compared the network configuration tasks for the vsd-predeploy role in the kvm.yml playbook, and there is not much difference.

I am sure you are already aware of it, but a short snippet is below:

Version 3.1.0

Network configuration tasks on vsd-predeploy role:

  - name: Create a temporary copy of the network script for eth0
    template: src=ifcfg-eth0.j2 backup=no dest={{ images_path }}/{{ vm_name }}/ifcfg-eth0

  - name: Copy eth0 network script file to the VSD image
    command: guestfish --rw -a {{ guestfish_dest }} -m {{ guestfish_mount }} copy-in {{ images_path }}/{{ vm_name }}/ifcfg-eth0 /etc/sysconfig/network-scripts/

  - name: Remove temporary copy of eth0 network script
    file: path={{ images_path }}/{{ vm_name }}/ifcfg-eth0 state=absent

  - name: Set the owner and group on the eth0 network script file in the VSD image
    command: guestfish --rw -a {{ guestfish_dest }} -m {{ guestfish_mount }} chown 0 0 /etc/sysconfig/network-scripts/ifcfg-eth0

  - name: Create a temporary copy of the syscfg network file
    template: src=network.j2 backup=no dest={{ images_path }}/{{ vm_name }}/network

  - name: Copy network file to the VSD image
    command: guestfish --rw -a {{ guestfish_dest }} -m {{ guestfish_mount }} copy-in {{ images_path }}/{{ vm_name }}/network /etc/sysconfig/

  - name: Remove temporary copy of network file
    file: path={{ images_path }}/{{ vm_name }}/network state=absent

  - name: Set the owner and group for the network hostname file on the VSD image
    command: guestfish --rw -a {{ guestfish_dest }} -m {{ guestfish_mount }} chown 0 0 /etc/sysconfig/network

Version 2.3.1

Network configuration tasks on vsd-predeploy role:

- name: Create a temporary copy of the network script for eth0
  template: src=ifcfg-eth0.j2 backup=no dest={{ images_path }}/{{ vm_name }}/ifcfg-eth0
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Copy eth0 network script file to the VSD image
  command: guestfish --rw -a {{ images_path }}/{{ vm_name }}/{{ vsd_qcow2_file_name }} -m {{ guestfish_mount }} copy-in {{ images_path }}/{{ vm_name }}/ifcfg-eth0 /etc/sysconfig/network-scripts/
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Remove temporary copy of eth0 network script
  file: path={{ images_path }}/{{ vm_name }}/ifcfg-eth0 state=absent
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Set the owner and group on the eth0 network script file in the VSD image
  command: guestfish --rw -a {{ images_path }}/{{ vm_name }}/{{ vsd_qcow2_file_name }} -m {{ guestfish_mount }} chown 0 0 /etc/sysconfig/network-scripts/ifcfg-eth0
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Create a temporary copy of the syscfg network file
  template: src=network.j2 backup=no dest={{ images_path }}/{{ vm_name }}/network
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Copy network file to the VSD image
  command: guestfish --rw -a {{ images_path }}/{{ vm_name }}/{{ vsd_qcow2_file_name }} -m {{ guestfish_mount }} copy-in {{ images_path }}/{{ vm_name }}/network /etc/sysconfig/
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Remove temporary copy of network file
  file: path={{ images_path }}/{{ vm_name }}/network state=absent
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

- name: Set the owner and group for the network hostname file on the VSD image
  command: guestfish --rw -a {{ images_path }}/{{ vm_name }}/{{ vsd_qcow2_file_name }} -m {{ guestfish_mount }} chown 0 0 /etc/sysconfig/network
  delegate_to: "{{ target_server }}"
  remote_user: "{{ target_server_username }}"

As you can see, there is not much difference, so ideally 2.3.1 should also install the VSD and apply the proper static IP configuration from the ifcfg-eth0 template.

But as you mentioned, 2.3.1 is not tested with the 5.3.3 release. So I will probably update Metro locally, try to install the 5.3.3 VSD, and see if that fixes the static IP configuration issue on the VSD VM.

I will update the issue accordingly. Thanks for your help.

ghost commented 5 years ago

I just remembered that cloud-init was changed in a recent Nuage release. I'm certain that MetroAE 2.3.1 does not have the cloud-init disable code needed for the cloud-init that ships in Nuage 5.3.3. What is happening is that guestfish is faithfully copying the network file, but cloud-init is clobbering that file when the VSD boots. A newer version of MetroAE will solve the problem.

ghost commented 5 years ago

Found the details: Starting in VSD 5.3.2, cloud-init started overwriting ifcfg-eth0. Therefore, in MetroÆ 2.4.3 we introduced code to disable cloud-init. You can either jump to v2.4.6 using the metroae2 branch (git checkout metroae2; git pull) or you can make the switch to MetroÆ v3.1.0 and the new deployment style. Your choice.
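To illustrate the kind of fix described here, the following is a minimal sketch of telling cloud-init to leave network configuration alone in an image before first boot. It is not the code MetroÆ uses. The helper name `stage_cloud_init_network_disable` and the drop-in file name are assumptions, and whether the cloud-init build shipped in a given VSD image honors this setting would need to be checked.

```shell
#!/bin/sh
# Hypothetical helper (not MetroAE code): writes a cloud-init drop-in file
# into a staging directory. The drop-in turns off cloud-init's own network
# configuration so it should not rewrite ifcfg-eth0 at boot.
stage_cloud_init_network_disable() {
    stage_dir=$1
    mkdir -p "$stage_dir" || return 1
    # "network: {config: disabled}" is the documented cloud-init setting
    # for disabling its network configuration handling.
    printf 'network: {config: disabled}\n' \
        > "$stage_dir/99-disable-network-config.cfg"
}

# Example use (paths are placeholders; run while the VM is shut off):
#   stage_cloud_init_network_disable /tmp/stage
#   guestfish --rw -a /path/to/vsd.qcow2 -i \
#       copy-in /tmp/stage/99-disable-network-config.cfg /etc/cloud/cloud.cfg.d/
```

Upgrading MetroÆ as suggested remains the supported route. A manual change like this is mainly useful for confirming that cloud-init is what overwrites the file.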

Closing this issue as resolved.