osism / issues

This repository is used for bug reports that are cross-project or not bound to a specific repository (or to an unknown repository).
https://www.osism.tech

kolla-ansible expects octavia hm interface to be present on control nodes #508

Open Nils98Ar opened 1 year ago

Nils98Ar commented 1 year ago

For some reason kolla-ansible thinks the health-manager interface should be present on the control nodes, although they should not be members of the octavia-health-manager group.

TASK [octavia : Copying over octavia.conf] ****************************************************************************************************************************************************************************************************************************************************************************************************************************
fatal: [control1]: FAILED! => {"msg": "An unhandled exception occurred while templating '{{ 'octavia_network' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'o-hm0' not present on host 'control1'"}
fatal: [control2]: FAILED! => {"msg": "An unhandled exception occurred while templating '{{ 'octavia_network' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'o-hm0' not present on host 'control2'"}
fatal: [control3]: FAILED! => {"msg": "An unhandled exception occurred while templating '{{ 'octavia_network' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'o-hm0' not present on host 'control3'"}

According to this, only the network nodes are members of the octavia-health-manager group. I think this line matches the control nodes, which it shouldn't.

I've already tried osism reconciler sync and osism apply facts. Do you have any idea how to fix it?
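
A quick ad-hoc check (a sketch, assuming direct Ansible access from the manager node with the usual inventory; the inventory path is a placeholder) confirms the interface is absent on the control nodes:

# Hypothetical sanity check: "ip link show o-hm0" should fail on all
# control nodes, because the interface was never created there.
$ ansible control -i <inventory> -m command -a "ip link show o-hm0"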

berendt commented 1 year ago

We solved this by placing the Octavia API service on the network nodes as well. It works pretty well for us in production. Is this a possible workaround for your environment?

Add the following entry to the 99-overwrite file in the inventory (replace netX with your network node names), then run osism reconciler sync afterwards.

[octavia]
net001
net002
net003
Nils98Ar commented 1 year ago

@berendt Thank you! I will check this tomorrow. It seems functional for now despite the error.

Nils98Ar commented 1 year ago

@berendt Works! Where does the issue come from, upstream or OSISM? Is there already an open issue somewhere?

And another question: are you using Octavia with a tenant management network in production?

mohaa7 commented 1 year ago

I also get the same error:

fatal: [control01]: FAILED! => {"msg": "An unhandled exception occurred while templating '{{ 'octavia_network' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'o-hm0' not present on host 'control01'"}
fatal: [control02]: FAILED! => {"msg": "An unhandled exception occurred while templating '{{ 'octavia_network' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'o-hm0' not present on host 'control02'"}
fatal: [control03]: FAILED! => {"msg": "An unhandled exception occurred while templating '{{ 'octavia_network' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'o-hm0' not present on host 'control03'"}

How can I solve it?

Here are the relevant sections of the inventory:

[control]
control[01:03] ansible_user=deploy ansible_become=true

[network:children]
control

[octavia:children]
control

# Octavia
[octavia-api:children]
octavia

[octavia-driver-agent:children]
octavia

[octavia-health-manager:children]
octavia

[octavia-housekeeping:children]
octavia

[octavia-worker:children]
octavia

Note: Network nodes are the control nodes.
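
For what it's worth, the group resolution can be confirmed with ansible-inventory: with the sections above, octavia-health-manager resolves to the control nodes through the octavia and control groups, which is why the interface check lands there. A sketch (inventory path elided):

$ ansible-inventory -i <inventory> --graph octavia-health-manager
@octavia-health-manager:
  |--@octavia:
  |  |--@control:
  |  |  |--control01
  |  |  |--control02
  |  |  |--control03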

And the relevant settings from globals.yml (via grep -vE "^#|^$" /etc/kolla/globals.yml | grep -E "ovn|octavia"):

# OCTAVIA
enable_octavia: "yes"
octavia_network_type: "tenant"
octavia_provider_drivers: "ovn:OVN provider"
octavia_provider_agents: "ovn"

#OVN
neutron_plugin_agent: "ovn"
enable_ovn: "{{ enable_neutron | bool and neutron_plugin_agent == 'ovn' }}"
neutron_ovn_distributed_fip: "yes"
neutron_ovn_dhcp_agent: "yes"

Version: Zed

Nils98Ar commented 1 year ago

For us it worked after moving the octavia-api services to the network nodes on Zed.

I have no experience with controller and network on the same nodes, or with OVN environments.

mohaa7 commented 1 year ago

(For the OVN-enabled environment, it's recommended to keep both network and controller stuff on the same hosts)

I think the issue is not related to where the network nodes are located. I could deploy Octavia with Open vSwitch! Probably there is something missing related to OVN in the configuration above!

berendt commented 1 year ago

Looks like this is a race condition.

group_vars/all.yml:octavia_auto_configure: "{{ 'amphora' in octavia_provider_drivers }}"

--> octavia_auto_configure = False

group_vars/all.yml:octavia_network_interface: "{{ 'o-hm0' if octavia_network_type == 'tenant' else api_interface }}"

--> octavia_network_interface = o-hm0

- include_tasks: hm-interface.yml
  when:
    - octavia_auto_configure | bool
    - octavia_network_type == "tenant"
    - inventory_hostname in groups[octavia_services['octavia-health-manager']['group']]

--> Skipped because octavia_auto_configure = False --> o-hm0 network interface missing

Can you please try to set octavia_auto_configure: "yes" in environments/kolla/configuration.yml and re-deploy the Octavia service?
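
That is, a minimal sketch of the override (file path as named above):

# environments/kolla/configuration.yml
octavia_auto_configure: "yes"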

mohaa7 commented 1 year ago

Can you please try to set octavia_auto_configure: "yes" in environments/kolla/configuration.yml and re-deploy the Octavia service?

I set it and the deployment succeeded. Thanks. The Kolla documentation notes that octavia_network_type: "tenant" is a simple way to set up Octavia networking for development or testing and may not be reliable enough for production. So if I remove it from /etc/kolla/globals.yml, the setting you mentioned, octavia_auto_configure: "yes", won't be necessary, right?
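
That reading seems consistent with the group_vars line quoted above: assuming kolla-ansible's default octavia_network_type is "provider" (an assumption, not verified here), the interface falls back to api_interface and the o-hm0 check no longer applies:

# From kolla-ansible group_vars/all.yml, as quoted earlier in this thread:
octavia_network_interface: "{{ 'o-hm0' if octavia_network_type == 'tenant' else api_interface }}"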

Nils98Ar commented 1 year ago

We solved this by placing the Octavia API service on the network nodes as well. It works pretty well for us in production. Is this a possible workaround for your environment?

Add the following entry to the 99-overwrite file in the inventory (replace netX with your network node names), then run osism reconciler sync afterwards.

[octavia]
net001
net002
net003

@berendt Is this workaround still needed?

berendt commented 1 year ago

Yes, at the moment we assume that all Octavia services are running on the network nodes (if there are any).
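
For illustration, combining the [octavia] override from the workaround above with the child groups quoted earlier, every Octavia service group then resolves to the network nodes (a sketch; net00X are placeholder names):

# 99-overwrite: put the network nodes into the octavia group ...
[octavia]
net001
net002
net003

# ... and the existing child-group mappings do the rest, e.g.:
[octavia-api:children]
octavia

[octavia-health-manager:children]
octavia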

artificial-intelligence commented 3 months ago

(For the OVN-enabled environment, it's recommended to keep both network and controller stuff on the same hosts)

I think the issue is not related to where the network nodes are located. I could deploy Octavia with Open vSwitch! Probably there is something missing related to OVN in the configuration above!

Notice that the docs you are quoting refer to devstack and are in general developer docs, meaning their intended audience is OpenStack developers.

I would not consult these docs, or would at least treat them with considerable caution, when building or consulting on production cloud environments.