michaelrigart / ansible-role-interfaces

An ansible role for configuring different network interfaces
GNU General Public License v3.0

Configuring an IP over IB (`ipoib`) interface fails with "Interface ib0 is not active" #124

Open Aethylred opened 2 years ago

Aethylred commented 2 years ago

I'm trying to set up an InfiniBand interface on a Mellanox ConnectX-6 with OFED driver version 5.5- on Rocky 8.5

Drivers are installed and interfaces can be brought up manually.

I'm calling the role like this because the role has already been called earlier to set up the real Ethernet interfaces:

- name: Configure Infiniband interfaces
  hosts: infiniband
  tasks:
    - name: Configure Infiniband interfaces
      include_role:
        name: michaelrigart.interfaces
      vars:
        interfaces_pause_time: 120
        interfaces_ether_interfaces:
          - device: "{{ infiniband_interface }}"
            bootproto: static
            address: "{{ ib_ip }}"
            netmask: "{{ infiniband_netmask }}"
            type: ipoib
      become: true

I've added interfaces_pause_time: 120 because I assumed that the interfaces were just taking time to become active after being bounced.

However, when I execute the playbook, the run ends with:

RUNNING HANDLER [michaelrigart.interfaces : Check active Ethernet interface state] *********************************************
failed: [ib-host11] (item={'device': 'ib0', 'bootproto': 'static', 'address': '', 'netmask': '', 'type': 'ipoib'}) => {"ansible_loop_var": "item", "changed": false, "item": {"address": "", "bootproto": "static", "device": "ib0", "netmask": "", "type": "ipoib"}, "msg": "Interface ib0 is not active"}

I've checked for other ipoib issues. #76 and #58 look like they've been resolved, and they don't seem to help with this one.

markgoddard commented 2 years ago

Hi @Aethylred. You can see where that error is generated here. It means that the Ansible fact for the interface has marked it as not active.

You could check the actual interface status, to see if it is up. You could also check the generated ifcfg file, to see if it is as you would expect.
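A minimal sketch of the first check, not taken from the role. It assumes the `infiniband_interface` variable from the playbook above and an interface name such as `ib0` that Ansible uses unchanged as a fact key; the task names are made up for illustration. It re-gathers only the network facts and prints what Ansible currently records for the interface, including its `active` flag:

```yaml
# Hypothetical debugging tasks, not part of michaelrigart.interfaces.
- name: Refresh only the network facts
  ansible.builtin.setup:
    gather_subset:
      - network

- name: Show the facts Ansible holds for the InfiniBand interface
  ansible.builtin.debug:
    # Assumes the interface name is a valid fact key as-is (e.g. ib0).
    var: ansible_facts[infiniband_interface]
```

Comparing the `active` value printed here with the output of `ip link show` on the host should show whether the fact and the real link state disagree.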

Aethylred commented 2 years ago

After the playbook fails, I can log into the host: the ifcfg-ib0 file looks good and ifup ib0 works.

Aethylred commented 2 years ago

If I extend the interface pause to interfaces_pause_time: 300 then it succeeds.

I think there may be a delay while the interface and our subnet manager sort themselves out.

markgoddard commented 2 years ago

Interesting. Is there anything we need to change here?

Aethylred commented 2 years ago

Not sure. I think it would be better if the role could poll for the interface being 'ready' or 'active' rather than refreshing the facts to get the interface state.

Ideally with a retry limit and a timeout.
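Something along these lines might work as a rough sketch. It is not how the role currently behaves, and the task name, register name, retry count, and delay are all illustrative assumptions. It polls the kernel's `operstate` file for the interface with Ansible's `until`/`retries`/`delay` keywords, and assumes that `up` is the state worth waiting for:

```yaml
# Hypothetical task, not part of michaelrigart.interfaces.
- name: Wait for the InfiniBand interface to come up
  ansible.builtin.command: cat /sys/class/net/{{ infiniband_interface }}/operstate
  register: ib_operstate
  changed_when: false                       # read-only check, never reports a change
  until: ib_operstate.stdout == "up"        # assumed target state
  retries: 30                               # retry limit
  delay: 10                                 # seconds between attempts
```

With these values the task gives up after about five minutes, which is in line with the 300-second pause that worked, but it returns as soon as the link is up instead of always waiting the full time.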