oomichi / try-kubernetes


Failed to create a new network on Neutron #77

Closed oomichi closed 5 years ago

oomichi commented 5 years ago

Summary

Problem: the tenant network lb-mgmt-net cannot be created

$ openstack network create lb-mgmt-net
Error while executing command: HttpException: Unknown error, {"NeutronError": {"message": "Unable to create the network. No tenant network is available for allocation.", "type": "NoNetworkAvailable", "detail": ""}}

Resolved by configuring VXLAN as the tenant network type, as follows:

$ diff -u ml2_conf.ini.orig ml2_conf.ini
--- ml2_conf.ini.orig   2019-03-06 10:54:51.062392771 -0800
+++ ml2_conf.ini        2019-03-06 12:08:06.510577001 -0800
@@ -1,9 +1,11 @@
 [ml2]
-type_drivers = flat,vlan
-tenant_network_types =
+type_drivers = flat,vxlan
+tenant_network_types = vxlan
 mechanism_drivers = linuxbridge
 extension_drivers = port_security

 [ml2_type_flat]
-flat_networks = provider,company
+flat_networks = provider

+[ml2_type_vxlan]
+vni_ranges = 1:1000
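The constraints the patched ml2_conf.ini has to satisfy can be checked mechanically. The sketch below is an assumption-laden simplification (the config text is inlined rather than read from /etc/neutron/plugins/ml2/ml2_conf.ini, and `check_ml2` is a hypothetical helper, not a neutron API), but it captures the two conditions that matter here: tenant_network_types must be non-empty and each tenant type must have a loaded type driver with its section configured.

```python
import configparser

# Contents matching the patched ml2_conf.ini above (normally this would be
# read from /etc/neutron/plugins/ml2/ml2_conf.ini on the controller).
ML2_CONF = """
[ml2]
type_drivers = flat,vxlan
tenant_network_types = vxlan
mechanism_drivers = linuxbridge
extension_drivers = port_security

[ml2_type_flat]
flat_networks = provider

[ml2_type_vxlan]
vni_ranges = 1:1000
"""

def check_ml2(conf_text):
    """Return a list of problems that would lead to NoNetworkAvailable."""
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    problems = []
    drivers = {d.strip() for d in cfg.get("ml2", "type_drivers").split(",") if d.strip()}
    tenant = [t.strip() for t in cfg.get("ml2", "tenant_network_types").split(",") if t.strip()]
    if not tenant:
        problems.append("tenant_network_types is empty: no tenant network can be allocated")
    for t in tenant:
        if t not in drivers:
            problems.append("tenant type %r is not listed in type_drivers" % t)
    if "vxlan" in tenant and not cfg.has_option("ml2_type_vxlan", "vni_ranges"):
        problems.append("[ml2_type_vxlan] vni_ranges is missing")
    return problems

print(check_ml2(ML2_CONF))  # -> []
```

Running the same check against the original config (empty tenant_network_types) reports exactly the condition that produced the error above.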

Change the linuxbridge configuration on all nodes, including the controller node. 192.168.1.59 designates the interface that performs VXLAN encapsulation.

$ diff -u /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig /etc/neutron/plugins/ml2/linuxbridge_agent.ini
--- /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig 2019-03-06 12:29:49.326323707 -0800
+++ /etc/neutron/plugins/ml2/linuxbridge_agent.ini      2019-03-06 12:31:43.249367381 -0800
@@ -2,7 +2,8 @@
 physical_interface_mappings = provider:eno1

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.59

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver

Problem: a floating IP cannot be attached

$ openstack floating ip set --port ac400c96-c53e-4ef2-ba3b-c5ba1381c34e 192.168.1.110
NotFoundException: Unknown erro

This was because the tenant network had no route for sending traffic from the inside out. Resolved by attaching it to the provider network via a router:

$ openstack router create lb-mgmt-router
$ openstack router add subnet lb-mgmt-router lb-mgmt-subnet
$ openstack router set lb-mgmt-router --external-gateway provider

Problem: instances on the VXLAN tenant network cannot obtain a DHCP lease

This was a VXLAN configuration problem. The following changes fixed it:

--- /etc/neutron/plugins/ml2/ml2_conf.ini.orig  2019-03-06 10:54:51.062392771 -0800
+++ /etc/neutron/plugins/ml2/ml2_conf.ini       2019-03-08 10:31:28.388334795 -0800
@@ -1,9 +1,11 @@
 [ml2]
-type_drivers = flat,vlan
-tenant_network_types =
-mechanism_drivers = linuxbridge
+type_drivers = flat,vxlan
+tenant_network_types = vxlan
+mechanism_drivers = linuxbridge,l2population
 extension_drivers = port_security

 [ml2_type_flat]
-flat_networks = provider,company
+flat_networks = provider

+[ml2_type_vxlan]
+vni_ranges = 1:1000
--- /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig 2019-03-06 16:54:07.934162103 -0800
+++ /etc/neutron/plugins/ml2/linuxbridge_agent.ini      2019-03-08 10:35:50.800145841 -0800
@@ -2,7 +2,13 @@
 physical_interface_mappings = provider:enp2s0,company:enp0s31f6

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.1
+l2_population = true
+vxlan_group =
+
+[agent]
+prevent_arp_spoofing = true

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
oomichi commented 5 years ago

/etc/neutron/plugins/ml2/ml2_conf.ini

[ml2]
type_drivers = flat,vlan
tenant_network_types =
mechanism_drivers = linuxbridge
extension_drivers = port_security

[ml2_type_flat]
flat_networks = provider,company

This seems to be because only provider and company were allowed as flat_networks. The default value is *, so change it to that and see whether it works. Also, vlan has no per-type settings and is unused, so try removing it.
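The flat_networks matching can be sketched as follows. This is a simplification of what neutron's flat type driver does, not its actual code: the special value * (the default) allows any physical network name, while an explicit list restricts which names may be used.

```python
def flat_network_allowed(flat_networks, physical_network):
    """Simplified flat_networks check: '*' permits any physical network."""
    if "*" in flat_networks:
        return True
    return physical_network in flat_networks

# The original config only permits the two named physical networks.
print(flat_network_allowed(["provider", "company"], "provider"))   # True
print(flat_network_allowed(["provider", "company"], "something"))  # False
# The default '*' permits everything.
print(flat_network_allowed(["*"], "something"))                    # True
```

Note, though, that this only governs *flat* networks with an explicit physical_network; as the driver code quoted later shows, it cannot help with tenant network allocation at all.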

oomichi commented 5 years ago
$ diff -u ml2_conf.ini.orig  ml2_conf.ini
--- ml2_conf.ini.orig   2019-03-06 10:54:51.062392771 -0800
+++ ml2_conf.ini        2019-03-06 10:55:06.574561000 -0800
@@ -1,9 +1,9 @@
 [ml2]
-type_drivers = flat,vlan
+type_drivers = flat
 tenant_network_types =
 mechanism_drivers = linuxbridge
 extension_drivers = port_security

 [ml2_type_flat]
-flat_networks = provider,company
+flat_networks = *

No luck; the problem persists.

$ openstack network create lb-mgmt-net
Error while executing command: HttpException: Unknown error, {"NeutronError": {"message": "Unable to create the network. No tenant network is available for allocation.", "type": "NoNetworkAvailable", "detail": ""}}
oomichi commented 5 years ago

Neutron error log:

2019-03-06 11:04:20.890 2478 INFO neutron.wsgi [-] 127.0.0.1 "GET / HTTP/1.1" status: 200  len: 251 time: 0.0014389
2019-03-06 11:04:21.553 2478 INFO neutron.quota [req-74f39fe3-291f-4365-97ca-1f7310636376 e5e99065fd524f328c2f81e28a6fbc42 682e74f275fe427abd9eb6759f3b68c5 - default default] Loaded quota_driver: <neutron.db.quota.driver.DbQuotaDriver object at 0x7f91a0a78e50>.
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation [req-74f39fe3-291f-4365-97ca-1f7310636376 e5e99065fd524f328c2f81e28a6fbc42 682e74f275fe427abd9eb6759f3b68c5 - default default] POST failed.: NoNetworkAvailable: Unable to create the network. No tenant network is available for allocation.
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation Traceback (most recent call last):
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/pecan/core.py", line 683, in __call__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.invoke_controller(controller, args, kwargs, state)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/pecan/core.py", line 574, in invoke_controller
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     result = controller(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 91, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     setattr(e, '_RETRY_EXCEEDED', True)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 87, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 147, in wrapper
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     ectxt.value = e.inner_exc
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 135, in wrapper
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 126, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     LOG.debug("Retry wrapper got retriable exception: %s", e)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 122, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*dup_args, **dup_kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/utils.py", line 76, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/resource.py", line 159, in post
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return self.create(resources)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/resource.py", line 177, in create
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return {key: creator(*creator_args, **creator_kwargs)}
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 627, in inner
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(self, context, *args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 161, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return method(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 91, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     setattr(e, '_RETRY_EXCEEDED', True)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 87, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 147, in wrapper
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     ectxt.value = e.inner_exc
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 135, in wrapper
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 126, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     LOG.debug("Retry wrapper got retriable exception: %s", e)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 122, in wrapped
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     return f(*dup_args, **dup_kwargs)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 837, in create_network
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     result, mech_context = self._create_network_db(context, network)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 796, in _create_network_db
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     tenant_id)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 209, in create_network_segments
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     segment = self._allocate_tenant_net_segment(context)
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 272, in _allocate_tenant_net_segment
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation     raise exc.NoNetworkAvailable()
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation NoNetworkAvailable: Unable to create the network. No tenant network is available for allocation.
2019-03-06 11:04:21.832 2478 ERROR neutron.pecan_wsgi.hooks.translation
2019-03-06 11:04:22.132 2478 INFO neutron.wsgi [req-74f39fe3-291f-4365-97ca-1f7310636376 e5e99065fd524f328c2f81e28a6fbc42 682e74f275fe427abd9eb6759f3b68c5 - default default] 127.0.0.1 "POST /v2.0/networks HTTP/1.1" status: 503  len: 369 time: 1.2399452
oomichi commented 5 years ago

According to https://docs.openstack.org/newton/install-guide-ubuntu/launch-instance-networks-provider.html, is this because no Ethernet interface corresponding to lb-mgmt-net is specified in /etc/neutron/plugins/ml2/linuxbridge_agent.ini? Each network needs a corresponding Ethernet interface, but a flat network carries inter-node traffic without VLAN-style separation, so presumably only one network can be created per interface; otherwise DHCP traffic could not be isolated. Choosing VLAN for an IaaS may have been the right call. First, identify the cause of the error above.

Error location: /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py

    def _allocate_tenant_net_segment(self, context):
        for network_type in self.tenant_network_types:
            segment = self._allocate_segment(context, network_type)
            if segment:
                return segment
        raise exc.NoNetworkAvailable()
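This loop explains the failure directly: with `tenant_network_types =` (empty, as in the original ml2_conf.ini) there is nothing to iterate over, so execution falls straight through to the raise. A minimal standalone sketch of the same control flow (NoNetworkAvailable is stubbed here; the real class lives in neutron's exception module):

```python
class NoNetworkAvailable(Exception):
    """Stand-in for neutron's NoNetworkAvailable exception."""

def allocate_tenant_net_segment(tenant_network_types, allocate_segment):
    # Mirrors the loop in neutron/plugins/ml2/managers.py: try each
    # configured tenant network type until one yields a segment.
    for network_type in tenant_network_types:
        segment = allocate_segment(network_type)
        if segment:
            return segment
    raise NoNetworkAvailable()

# Empty tenant_network_types: the loop body never runs and the exception
# is raised immediately -- exactly the traceback above.
try:
    allocate_tenant_net_segment([], lambda t: None)
except NoNetworkAvailable:
    print("NoNetworkAvailable")

# With vxlan configured, a segment is returned instead.
print(allocate_tenant_net_segment(["vxlan"], lambda t: {"network_type": t}))
```

It also shows why adding flat to tenant_network_types would not have helped: the flat driver's allocate_tenant_segment (below) always returns None, so the loop would still exhaust and raise.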

The comment in the flat network type driver explicitly states that tenant networks are not supported (neutron/plugins/ml2/drivers/type_flat.py#n103):

    def allocate_tenant_segment(self, context):
        # Tenant flat networks are not supported.
        return

In contrast, the VLAN type driver does implement tenant network allocation: http://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/drivers/type_vlan.py#n203

    def allocate_tenant_segment(self, context):
        for physnet in self.network_vlan_ranges:
            alloc = self.allocate_partially_specified_segment(
                context, physical_network=physnet)
            if alloc:
                break
        else:
            return
        return {api.NETWORK_TYPE: p_const.TYPE_VLAN,
                api.PHYSICAL_NETWORK: alloc.physical_network,
                api.SEGMENTATION_ID: alloc.vlan_id,
                api.MTU: self.get_mtu(alloc.physical_network)}

So a flat network is intended only for external (Internet) connectivity?

oomichi commented 5 years ago

Devstack configuration used in the gate:

[securitygroup]
firewall_driver = openvswitch

[ml2]
tenant_network_types = vxlan
extension_drivers = port_security,qos
mechanism_drivers = openvswitch,linuxbridge

[ml2_type_gre]
tunnel_id_ranges = 1:1000

[ml2_type_vxlan]
vni_ranges = 1:1000

[ml2_type_flat]
flat_networks = public,

[ml2_type_vlan]
network_vlan_ranges = public

[ml2_type_geneve]
vni_ranges = 1:1000

[agent]
extensions = qos
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf

[ovs]
datapath_type = system
bridge_mappings = public:br-ex
tunnel_bridge = br-tun
local_ip = 198.72.124.7
oomichi commented 5 years ago

The root cause was that tenant_network_types had not been set at all. Conversely, is it a problem that type_drivers is not specified? → No: the default is type_drivers = local,flat,vlan,gre,vxlan,geneve, i.e. everything is enabled, so it is fine. The gate uses VXLAN for tenant networks, so match that.

Try the following change:

$ diff -u ml2_conf.ini.orig ml2_conf.ini
--- ml2_conf.ini.orig   2019-03-06 10:54:51.062392771 -0800
+++ ml2_conf.ini        2019-03-06 12:08:06.510577001 -0800
@@ -1,9 +1,11 @@
 [ml2]
-type_drivers = flat,vlan
-tenant_network_types =
+type_drivers = flat,vxlan
+tenant_network_types = vxlan
 mechanism_drivers = linuxbridge
 extension_drivers = port_security

 [ml2_type_flat]
-flat_networks = provider,company
+flat_networks = provider

+[ml2_type_vxlan]
+vni_ranges = 1:1000
oomichi commented 5 years ago

It worked:

$ openstack network create lb-mgmt-net
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2019-03-06T20:27:31Z                 |
| description               |                                      |
| dns_domain                | None                                 |
| id                        | e2971ef3-e5ac-4642-b8a0-9c9007069716 |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | False                                |
| is_vlan_transparent       | None                                 |
| mtu                       | 1450                                 |
| name                      | lb-mgmt-net                          |
| port_security_enabled     | True                                 |
| project_id                | 682e74f275fe427abd9eb6759f3b68c5     |
| provider:network_type     | vxlan                                |
| provider:physical_network | None                                 |
| provider:segmentation_id  | 83                                   |
| qos_policy_id             | None                                 |
| revision_number           | 2                                    |
| router:external           | Internal                             |
| segments                  | None                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   |                                      |
| tags                      |                                      |
| updated_at                | 2019-03-06T20:27:31Z                 |
+---------------------------+--------------------------------------+
oomichi commented 5 years ago

The compute node configuration also needs to change. 192.168.1.59 below is the compute node's IP address; the overlay network is built on top of it.

$ diff -u /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig /etc/neutron/plugins/ml2/linuxbridge_agent.ini
--- /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig 2019-03-06 12:29:49.326323707 -0800
+++ /etc/neutron/plugins/ml2/linuxbridge_agent.ini      2019-03-06 12:31:43.249367381 -0800
@@ -2,7 +2,8 @@
 physical_interface_mappings = provider:eno1

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.59

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
oomichi commented 5 years ago
  1. Only the provider network exists: Pass
  2. nova boot -> expect success with the provider network: Pass
  3. Create a network and subnet with VXLAN
    Pass
    $ openstack network create lb-mgmt-net
    +---------------------------+--------------------------------------+
    | Field                     | Value                                |
    +---------------------------+--------------------------------------+
    ...
    | provider:network_type     | vxlan                                |
    +---------------------------+--------------------------------------+
    $ openstack subnet create --network lb-mgmt-net --allocation-pool start=192.168.10.100,end=192.168.10.200 --dns-nameserver 8.8.4.4 --gateway 192.168.10.1 --subnet-range 192.168.10.0/24 lb-mgmt-subnet
  4. nova boot without specifying a network -> expect failure
    Pass
    $ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 test
    ERROR (Conflict): Multiple possible networks found, use a Network ID to be more specific. (HTTP 409) (Request-ID: req-2401b84f-3e12-417c-8acb-9d4e8c1861e1)
  5. nova boot with a network specified -> expect success
    $ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=lb-mgmt-net test
    $ nova list
    +--------------------------------------+------+--------+------------+-------------+----------------------------+
    | ID                                   | Name | Status | Task State | Power State | Networks                   |
    +--------------------------------------+------+--------+------------+-------------+----------------------------+
    | 9103f6e7-73d2-4e95-a89f-342d57fc1307 | test | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.106 |
    +--------------------------------------+------+--------+------------+-------------+----------------------------+

    The instance booted, but the NIC does not appear to have come up.

    
    $ nova console-log test
    ...
    [[0;32m  OK  [0m] Reached target Network (Pre).
         Starting Raise network interfaces...
    [[0m[0;31m*     [0m] A start job is running for Raise network interfaces (9s / 5min 3s)[K[[0;1;31m*[0m[0;31m*    [0m] A start job is running for Raise network interfaces (9s / 5min 3s)[K[[0;31m*[0;1;31m*[0m[0;31m*   [0m] A start job is running for Raise network interfaces (10s / 5min 3s)[K[ [0;31m*[0;1;31m*[0m[0;31m*  [0m] A start job is running for Raise network interfaces (10s / 5min 3s)[
    ...
    3s / 5min 3s)[K[[0;1;31mFAILED[0m] Failed to start Raise network interfaces.
    See 'systemctl status networking.service' for details.
         Starting Initial cloud-init job (metadata service crawler)...
    [[0;32m  OK  [0m] Reached target Network.
    [  309.176412] cloud-init[840]: Cloud-init v. 18.2 running 'init' at Wed, 06 Mar 2019 21:00:01 +0000. Up 309.02 seconds.
    [  309.178438] cloud-init[840]: ci-info: ++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
    [  309.180268] cloud-init[840]: ci-info: +--------+------+------------------------------+-----------+-------+-------------------+
    [  309.182063] cloud-init[840]: ci-info: | Device |  Up  |           Address            |    Mask   | Scope |     Hw-Address    |
    [  309.183878] cloud-init[840]: ci-info: +--------+------+------------------------------+-----------+-------+-------------------+
    [  309.185706] cloud-init[840]: ci-info: |  ens3  | True |              .               |     .     |   .   | fa:16:3e:60:96:30 |
    [  309.187473] cloud-init[840]: ci-info: |  ens3  | True | fe80::f816:3eff:fe60:9630/64 |     .     |  link | fa:16:3e:60:96:30 |
    [  309.189276] cloud-init[840]: ci-info: |   lo   | True |          127.0.0.1           | 255.0.0.0 |   .   |         .         |
    [  309.191042] cloud-init[840]: ci-info: |   lo   | True |           ::1/128            |     .     |  host |         .         |
    [  309.192862] cloud-init[840]: ci-info: +--------+------+------------------------------+-----------+-------+-------------------+
    [  309.194626] cloud-init[840]: 2019-03-06 21:00:01,823 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fd6df673470>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
    ...
    Ubuntu 16.04.4 LTS ubuntu ttyS0

ubuntu login:


cloud-init's outbound traffic is also failing.
oomichi commented 5 years ago

Boot a VM with two NICs and check whether communication over lb-mgmt-net works.

$ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=lb-mgmt-net --nic net-name=provider test2
$ nova list
+--------------------------------------+-------+--------+------------+-------------+----------------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                           |
+--------------------------------------+-------+--------+------------+-------------+----------------------------------------------------+
| 9103f6e7-73d2-4e95-a89f-342d57fc1307 | test  | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.106                         |
| f967637f-bbd5-45a1-95eb-fd0b9b9d1e1c | test2 | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.108; provider=192.168.1.107 |
+--------------------------------------+-------+--------+------------+-------------+----------------------------------------------------+

The console log shows outbound traffic failing. The default gateway configured on lb-mgmt-net may be the problem; remove it and retry:

$ openstack subnet delete 1b88db03-e254-4415-ac53-0a548a1f16f0
$ openstack subnet create --network lb-mgmt-net --allocation-pool start=192.168.10.100,end=192.168.10.200 --subnet-range 192.168.10.0/24 lb-mgmt-subnet
$ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=lb-mgmt-net test1
$ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=lb-mgmt-net --nic net-name=provider test2
$ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=provider test3
$ nova list
+--------------------------------------+-------+--------+------------+-------------+----------------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                           |
+--------------------------------------+-------+--------+------------+-------------+----------------------------------------------------+
| 84ed1896-dfc7-4134-95eb-a3effeeeb22e | test1 | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.105                         |
| b4f9d49e-908f-4b68-a5da-eb2403e77f29 | test2 | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.100; provider=192.168.1.110 |
| 81e649c1-caa6-487a-a1b1-a1e6b65073cd | test3 | ACTIVE | -          | Running     | provider=192.168.1.106                             |
+--------------------------------------+-------+--------+------------+-------------+----------------------------------------------------+

Ping reaches test3 above, which has only the provider network, but does not reach test2. Its console log shows the NICs failing to come up:

[[0;32m  OK  [0m] Reached target Network (Pre).
         Starting Raise network interfaces...
[[0m[0;31m*     [0m] A start job is running for Raise network interfaces (8s / 5min 3s)[K[[0;1;31m*[0m[0;31m*    [0m] A start job is running for Raise network interfaces (9s / 5min 3s)[K[[0;31m*[0;1;31m*[0m[0;31m*   [0m] A start job is running for Raise network interfaces (10s / 5min 3s)[K[ [0;31m*[0;1;31m*[0m[0;31m*  [0m] A start job is running for Raise network interfaces (10s / 5min 3s)[K[  [0;31m*[0;1;31m*[0m[0;31m* [0m] A start job is running for Raise network interfaces (11s / 5min 3s)[K[   [0;31m*[0;1;31m*[0m[0;31m*[0m] A start job is running for Raise network interfaces (11s / 5min 3s)[K[    [0;31m*[0;1;31m*[0m] A start job is running for Raise network interfaces (12s / 5min 3s)[K[     [0;31m*[0m] A start job is running for Raise network interfaces (13s / 5min 3s)[K[    [0;31m*[0;1;31m*[0m] A start job is running for Raise network interfaces (13s / 5min 3s)[K[   [0;31m*[0;1;31m*[0m[0;31m*[0m] A start job is running for Raise network interfaces (14s / 5min 3s)[K[  [0;31m*[0;1;31m*[0m[0;31m* [0m] A start job is running for Raise network interfaces (14s / 5min 3s)[K[ [0;31m*[0;1;31m*[0m[0;31m*  [0m] A start job is running for Raise network interfaces (15s / 5min 3s)[K[[0;31m*[0;1;31m*[0m[0;31m*   [0m] A start job is running for Ra

The log from the successful case (test3):

[[0;32m  OK  [0m] Reached target Network (Pre).
         Starting Raise network interfaces...
[[0;32m  OK  [0m] Started Raise network interfaces.
         Starting Initial cloud-init job (metadata service crawler)...
[[0;32m  OK  [0m] Reached target Network.
oomichi commented 5 years ago

I had forgotten to configure VXLAN in the controller's linuxbridge_agent.ini:

$ diff -u linuxbridge_agent.ini.orig linuxbridge_agent.ini
--- linuxbridge_agent.ini.orig  2019-03-06 16:54:07.934162103 -0800
+++ linuxbridge_agent.ini       2019-03-06 16:54:55.670651200 -0800
@@ -2,7 +2,8 @@
 physical_interface_mappings = provider:enp2s0,company:enp0s31f6

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.1

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
oomichi commented 5 years ago

Delete the VM, subnet, and network, then retry:

$ openstack network create lb-mgmt-net
$ openstack subnet create --network lb-mgmt-net --allocation-pool start=192.168.10.100,end=192.168.10.200 --subnet-range 192.168.10.0/24 lb-mgmt-subnet
$ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=lb-mgmt-net test1

The symptom is unchanged.

oomichi commented 5 years ago

Troubleshooting:

$ openstack network agent list
+--------------------------------------+----------------------+------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type           | Host       | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+----------------------+------------+-------------------+-------+-------+---------------------------+
| 1a7faecf-bd6b-44a7-b456-9d56506dcbf8 | Metadata agent       | iaas-ctrl  | None              | :-)   | UP    | neutron-metadata-agent    |
| 2cb40e67-c41c-4172-b742-699dc85451fb | Linux bridge agent   | iaas-cpu02 | None              | XXX   | UP    | neutron-linuxbridge-agent |
| 2ff0a087-636f-413d-9394-d015a5a4f032 | Linux bridge agent   | iaas-cpu03 | None              | XXX   | UP    | neutron-linuxbridge-agent |
| 3c658599-86f3-4fc1-bc2e-0f06cc14d29e | DHCP agent           | iaas-ctrl  | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 3c66d18c-5670-42ab-9fa7-4c4582469b0b | Linux bridge agent   | iaas-ctrl  | None              | :-)   | UP    | neutron-linuxbridge-agent |
| 4c3e58ff-5d9a-4a63-bf1d-30694882b11c | Loadbalancerv2 agent | iaas-ctrl  | None              | :-)   | UP    | neutron-lbaasv2-agent     |
| 73af79f5-9358-4564-9d48-a54e790c83dc | Linux bridge agent   | iaas-cpu01 | None              | :-)   | UP    | neutron-linuxbridge-agent |
+--------------------------------------+----------------------+------------+-------------------+-------+-------+---------------------------+

The linuxbridge-agent on cpu02 and cpu03 is dead... Logs from cpu02:

2019-03-06 12:36:21.457 1002 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 12.0.2
2019-03-06 12:36:21.457 1002 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'provider': 'eno1'}
2019-03-06 12:36:21.458 1002 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
2019-03-06 12:36:21.507 1002 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please configure local_ip 192.168.1.61 on the host interface to be used for tunneling and restart the agent.

Restarted the agent on cpu02 and cpu03:

# systemctl restart neutron-linuxbridge-agent.service

That fixed it.

$ openstack network agent list
+--------------------------------------+----------------------+------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type           | Host       | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+----------------------+------------+-------------------+-------+-------+---------------------------+
| 1a7faecf-bd6b-44a7-b456-9d56506dcbf8 | Metadata agent       | iaas-ctrl  | None              | :-)   | UP    | neutron-metadata-agent    |
| 2cb40e67-c41c-4172-b742-699dc85451fb | Linux bridge agent   | iaas-cpu02 | None              | :-)   | UP    | neutron-linuxbridge-agent |
| 2ff0a087-636f-413d-9394-d015a5a4f032 | Linux bridge agent   | iaas-cpu03 | None              | :-)   | UP    | neutron-linuxbridge-agent |
| 3c658599-86f3-4fc1-bc2e-0f06cc14d29e | DHCP agent           | iaas-ctrl  | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 3c66d18c-5670-42ab-9fa7-4c4582469b0b | Linux bridge agent   | iaas-ctrl  | None              | :-)   | UP    | neutron-linuxbridge-agent |
| 4c3e58ff-5d9a-4a63-bf1d-30694882b11c | Loadbalancerv2 agent | iaas-ctrl  | None              | :-)   | UP    | neutron-lbaasv2-agent     |
| 73af79f5-9358-4564-9d48-a54e790c83dc | Linux bridge agent   | iaas-cpu01 | None              | :-)   | UP    | neutron-linuxbridge-agent |
+--------------------------------------+----------------------+------------+-------------------+-------+-------+---------------------------+

Result: according to the console log, the NIC now comes up properly. → It is still unclear why this symptom appears just because neutron-linuxbridge-agent is down on some of the nodes. However, VMs with 2 NICs still fail to bring their NICs up.

oomichi commented 5 years ago

Plan: assign a floating IP to one of the two VMs on the lb-mgmt-net network, SSH in, and confirm that ping works across lb-mgmt-net. The NIC-not-coming-up problem also hit the second VM. Both VMs are attached only to the lb-mgmt-net network, so why does the first VM's NIC come up while the later one's fails?

The same pattern occurred on the second attempt:
VM 1: test1 booted on iaas-cpu02 → success
VM 2: test2 booted on iaas-cpu03 → failure
VM 3: test3 booted on iaas-cpu01 → failure
VM 4: test4 booted on iaas-cpu02 → success
VM 5: test5 booted on iaas-cpu03 → failure
VM 6: test6 booted on iaas-cpu01 → failure
VM 7: test7 booted on iaas-cpu02 → success

In other words, only VMs launched on iaas-cpu02 get a working NIC. For now, confirm that test1 and test4 can talk to each other.

$ openstack floating ip create provider
$ openstack floating ip list
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
| f2bf1831-efa8-4b26-a14b-59457b8ff180 | 192.168.1.110       | None             | None | bfd9fd43-c9b4-43ad-bb67-930c674f2605 | 682e74f275fe427abd9eb6759f3b68c5 |
+--------------------------------------+---------------------+------------------+------+--------------------------------------+----------------------------------+
$ openstack floating --debug ip set --port ac400c96-c53e-4ef2-ba3b-c5ba1381c34e 192.168.1.110
...
REQ: curl -g -i -X PUT http://iaas-ctrl:9696/v2.0/floatingips/f2bf1831-efa8-4b26-a14b-59457b8ff180 -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.18.4 CPython/2.7.12" -H "Content-Type: application/json" -H "X-Auth-Token: {SHA1}9f05f838fbe06d8a0b150aa231b8c8eaa4d289a1" -d '{"floatingip": {"port_id": "ac400c96-c53e-4ef2-ba3b-c5ba1381c34e"}}'
http://iaas-ctrl:9696 "PUT /v2.0/floatingips/f2bf1831-efa8-4b26-a14b-59457b8ff180 HTTP/1.1" 404 306
RESP: [404] Content-Type: application/json Content-Length: 306 X-Openstack-Request-Id: req-37bb0232-ff1c-4180-b7d6-92c522936cc1 Date: Thu, 07 Mar 2019 01:46:03 GMT Connection: keep-alive
RESP BODY: {"NeutronError": {"message": "External network bfd9fd43-c9b4-43ad-bb67-930c674f2605 is not reachable from subnet 8e6e3ead-b4b6-44d5-bd72-31662fb16183.  Therefore, cannot associate Port ac400c96-c53e-4ef2-ba3b-c5ba1381c34e with a Floating IP.", "type": "ExternalGatewayForFloatingIPNotFound", "detail": ""}}

Failed, because the external network provider is not reachable from the subnet lb-mgmt-subnet. But isn't a floating IP exactly the mechanism for exposing a VM on a local network to the outside? → Presumably outbound traffic from inside has to work first, so the networks likely need to be connected by a router.

oomichi commented 5 years ago

Filed the unhelpful error message as https://storyboard.openstack.org/#!/story/2005163.

oomichi commented 5 years ago

Connected the lb-mgmt-net network to the external network provider with a router:

$ openstack router create lb-mgmt-router
$ openstack router add subnet lb-mgmt-router lb-mgmt-subnet
$ openstack router set lb-mgmt-router --external-gateway provider

Tried attaching the floating IP again, and this time it succeeded:

$ nova list
+--------------------------------------+-------+--------+------------+-------------+----------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                   |
+--------------------------------------+-------+--------+------------+-------------+----------------------------+
| 95f05cd5-2c55-4957-bc9e-18f63074c0f7 | test1 | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.105 |
+--------------------------------------+-------+--------+------------+-------------+----------------------------+
$ openstack floating ip set --port a8357da0-7f64-4c25-ae52-124a2dbbfb03 192.168.1.110
$ nova list
+--------------------------------------+-------+--------+------------+-------------+-------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                  |
+--------------------------------------+-------+--------+------------+-------------+-------------------------------------------+
| 95f05cd5-2c55-4957-bc9e-18f63074c0f7 | test1 | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.105, 192.168.1.110 |
+--------------------------------------+-------+--------+------------+-------------+-------------------------------------------+

However, the NIC still fails to come up, and the floating IP does not answer ping either. It fails no matter which node the VM is launched on, including cpu02.

oomichi commented 5 years ago

Need to dig into the NIC bring-up problem methodically.

$ nova list
+--------------------------------------+-------+--------+------------+-------------+-------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                  |
+--------------------------------------+-------+--------+------------+-------------+-------------------------------------------+
| 95f05cd5-2c55-4957-bc9e-18f63074c0f7 | test1 | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.105, 192.168.1.110 |
+--------------------------------------+-------+--------+------------+-------------+-------------------------------------------+
$ openstack port list | grep 192.168.10.105
| a8357da0-7f64-4c25-ae52-124a2dbbfb03 |      | fa:16:3e:f6:41:e2 | ip_address='192.168.10.105', subnet_id='8e6e3ead-b4b6-44d5-bd72-31662fb16183' | ACTIVE |
$ nova show test1
+--------------------------------------+------------------------------------------------------------+
| Property                             | Value                                                      |
+--------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                     |
| OS-EXT-AZ:availability_zone          | nova                                                       |
| OS-EXT-SRV-ATTR:host                 | iaas-cpu02                                                 |
...

Confirmed the Port ID is a8357da0-7f64-4c25-ae52-124a2dbbfb03. The problem VM lives on iaas-cpu02, so log in there and check the bridge state → tapa8357da0-7f is the VM's device ("tap" + the first part of the Port ID).

$ brctl show
bridge name     bridge id               STP enabled     interfaces
brqebe0b402-ae          8000.0601094dc904       no              tapa8357da0-7f
                                                        vxlan-13
virbr0          8000.52540006e678       yes             virbr0-nic
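The device names follow the linuxbridge agent's naming conventions, which can be sketched from the output observed in this thread (illustrative helper functions, not Neutron's actual code):

```python
# Sketch of the linuxbridge device-naming conventions, inferred from the
# "brctl show" output above (hypothetical helpers, not Neutron's own API).

def tap_device_name(port_id: str) -> str:
    # TAP devices are "tap" + the first 11 characters of the Neutron port UUID.
    return "tap" + port_id[:11]

def bridge_name(network_id: str) -> str:
    # Per-network bridges are "brq" + the first 11 characters of the network UUID.
    return "brq" + network_id[:11]

def vxlan_device_name(segmentation_id: int) -> str:
    # VXLAN devices are "vxlan-" + the VNI (provider:segmentation_id).
    return f"vxlan-{segmentation_id}"

print(tap_device_name("a8357da0-7f64-4c25-ae52-124a2dbbfb03"))  # tapa8357da0-7f
print(bridge_name("ebe0b402-aeaa-44dd-9eff-993a08b57bee"))      # brqebe0b402-ae
print(vxlan_device_name(13))                                    # vxlan-13
```

The example values are the port ID, network ID (segmentation_id=13), and device names that appear later in this thread, so the mapping can be checked against the real output.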
oomichi commented 5 years ago

At some point, the VM created on the provider network could no longer reach the outside either. For now, delete the problematic self-service network.

$ openstack network delete lb-mgmt-net
Failed to delete network with name or ID 'lb-mgmt-net': Unable to delete Network for openstack.network.v2.network.Network(provider:physical_network=None, ipv6_address_scope=None, revision_number=3, port_security_enabled=True, provider:network_type=vxlan, id=ebe0b402-aeaa-44dd-9eff-993a08b57bee, router:external=False, availability_zone_hints=[], availability_zones=[u'nova'], ipv4_address_scope=None, shared=False, project_id=682e74f275fe427abd9eb6759f3b68c5, status=ACTIVE, subnets=[u'8e6e3ead-b4b6-44d5-bd72-31662fb16183'], description=, tags=[], updated_at=2019-03-07T01:21:34Z, provider:segmentation_id=13, name=lb-mgmt-net, admin_state_up=True, created_at=2019-03-07T01:21:25Z, mtu=1450)
1 of 1 networks failed to delete.

Failed. Once again the error message above is useless, even though Neutron itself returns useful information:

RESP BODY: {"NeutronError": {"message": "Unable to complete operation on network ebe0b402-aeaa-44dd-9eff-993a08b57bee. There are one or more ports still in use on the network.", "type": "NetworkInUse", "detail": ""}}

The neutron CLI does show that message properly:

$ neutron net-delete lb-mgmt-net
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Unable to complete operation on network ebe0b402-aeaa-44dd-9eff-993a08b57bee. There are one or more ports still in use on the network.
Neutron server returns request_ids: ['req-12ca03a6-ba3b-400c-8bb4-ecfb14a570fe']

Resolved with:

$ openstack router remove port lb-mgmt-router ec10e55e-0229-43a0-8f14-c14f54dd5829
$ neutron router-delete lb-mgmt-router
$ openstack network delete lb-mgmt-net

However, traffic from inside the VM to the outside still fails:

$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms
oomichi commented 5 years ago

Routing info → 192.168.1.1 is set as the default gateway → and ping to it succeeds:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 ens3
169.254.169.254 192.168.1.100   255.255.255.255 UGH   0      0        0 ens3
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 ens3
$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.505 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.962 ms

IP masquerade? I think I fixed this once before. Change the NIC named in /etc/ufw/before.rules to enp0s31f6, which carries the external IP address. → That was it; fixed. The bridge brq5bff0834-bd no longer exists at all. Do Neutron operations change it?

#-A POSTROUTING -s 192.168.1.0/24 -o brq5bff0834-bd -j MASQUERADE
-A POSTROUTING -s 192.168.1.0/24 -o enp0s31f6 -j MASQUERADE
oomichi commented 5 years ago

Retried → reproduced:

$ openstack network create lb-mgmt-net
$ openstack subnet create --network lb-mgmt-net --allocation-pool start=192.168.10.100,end=192.168.10.200 --subnet-range 192.168.10.0/24 lb-mgmt-subnet
$ nova boot --key-name mykey --flavor m1.medium --image 73f70800-1d0c-4569-a3c5-29c70775c334 --nic net-name=lb-mgmt-net test
$ nova list
+--------------------------------------+------+--------+------------+-------------+----------------------------+
| ID                                   | Name | Status | Task State | Power State | Networks                   |
+--------------------------------------+------+--------+------------+-------------+----------------------------+
| 2e193bb9-2ca0-4ae7-9e05-1400fa007558 | test | ACTIVE | -          | Running     | lb-mgmt-net=192.168.10.109 |
+--------------------------------------+------+--------+------------+-------------+----------------------------+
$ openstack port list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                            | Status |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| 1428aa20-fde9-4e31-9fa5-b16313c74e92 |      | fa:16:3e:b8:91:03 | ip_address='192.168.1.109', subnet_id='43ed897b-3c10-4d5c-8f6d-263edcd817c7'  | ACTIVE |
| 93fdfb70-63d2-4804-a28e-9f0f70890b8c |      | fa:16:3e:52:57:f1 | ip_address='192.168.10.109', subnet_id='dcb0ca7e-edea-4ba1-bb07-e50e51fde57e' | ACTIVE |
...

Logged in to iaas-cpu03:

$ brctl show
bridge name     bridge id               STP enabled     interfaces
brq6a303139-3b          8000.e6f569ccdcb3       no              tap93fdfb70-63
                                                        vxlan-10
brqbfd9fd43-c9          8000.f44d306e9cc0       no              eno1
virbr0          8000.525400253304       yes             virbr0-nic

tap93fdfb70-63 is the VM's NIC device, and the bridge brq6a303139-3b connects tap93fdfb70-63 to vxlan-10. Checking vxlan-10 further shows it sits on top of the physical NIC eno1:

$ ip -d link show vxlan-10
18: vxlan-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq6a303139-3b state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether e6:f5:69:cc:dc:b3 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 10 group 224.0.0.1 dev eno1 srcport 0 0 dstport 8472 ageing 300
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode eui64
oomichi commented 5 years ago

Did the DHCP information fail to arrive because of the controller side? Check the network state on the controller. The same VXLAN interface exists there, on the physical NIC enp2s0:

$ ip -d link show vxlan-10
8: vxlan-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq6a303139-3b state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 26:4c:a1:c1:a9:75 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 10 group 224.0.0.1 dev enp2s0 srcport 0 0 dstport 8472 ageing 300
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode eui64
$ brctl show
bridge name     bridge id               STP enabled     interfaces
brq6a303139-3b          8000.264ca1c1a975       no              tapd7459afe-90
                                                        vxlan-10
brqbfd9fd43-c9          8000.001b2139e5fa       no              enp2s0
                                                        tapf233ccef-a5
docker0         8000.024210971ae8       no

tapd7459afe-90 and vxlan-10 are connected by the bridge brq6a303139-3b. tapd7459afe-90 is the port for 192.168.10.100, the first address of the allocation pool given when the subnet was created:

$ openstack port list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                            | Status |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| d7459afe-9004-478f-884b-8e0a8e9991bc |      | fa:16:3e:4b:f2:e5 | ip_address='192.168.10.100', subnet_id='dcb0ca7e-edea-4ba1-bb07-e50e51fde57e' | ACTIVE |

From here it seems to lead into the dnsmasq network namespace. Find the dnsmasq process serving 192.168.10.0/24:

$ sudo ps -ef | grep dnsmasq
 nobody    4304     1  0 17:38 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo --pid-file=/var/lib/neutron/dhcp/6a303139-3bc7-4621-a27b-415f409cb743/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/6a303139-3bc7-4621-a27b-415f409cb743/host --addn-hosts=/var/lib/neutron/dhcp/6a303139-3bc7-4621-a27b-415f409cb743/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/6a303139-3bc7-4621-a27b-415f409cb743/opts --dhcp-leasefile=/var/lib/neutron/dhcp/6a303139-3bc7-4621-a27b-415f409cb743/leases --dhcp-match=set:ipxe,175 --bind-interfaces
 --interface=ns-d7459afe-90
 --dhcp-range=set:tag0,192.168.10.0,static,255.255.255.0,86400s
 --dhcp-option-force=option:mtu,1450 --dhcp-lease-max=256 --conf-file= --domain=openstacklocal

It is listening on ns-d7459afe-90.

oomichi commented 5 years ago

Find the ns-d7459afe-90 interface. tapd7459afe-90 should be the peer of ns-d7459afe-90, which lives in a netns. List the namespaces:

$ ip netns
qdhcp-6a303139-3bc7-4621-a27b-415f409cb743 (id: 1)
qdhcp-bfd9fd43-c9b4-43ad-bb67-930c674f2605 (id: 0)

Check the interfaces in id: 1 (the newer one):

$ sudo ip netns exec qdhcp-6a303139-3bc7-4621-a27b-415f409cb743 ifconfig
...
ns-d7459afe-90 Link encap:Ethernet  HWaddr fa:16:3e:4b:f2:e5
          inet addr:169.254.169.254  Bcast:169.254.255.255  Mask:255.255.0.0
          inet6 addr: fe80::f816:3eff:fe4b:f2e5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:468 (468.0 B)  TX bytes:438 (438.0 B)
$
$ sudo ip netns exec qdhcp-6a303139-3bc7-4621-a27b-415f409cb743 ip link show
...
2: ns-d7459afe-90@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:4b:f2:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
$
$ sudo ip netns exec qdhcp-6a303139-3bc7-4621-a27b-415f409cb743 ethtool -S ns-d7459afe-90
NIC statistics:
     peer_ifindex: 7

So ns-d7459afe-90 has index 2 and its peer has index 7. Check index 7 outside the netns → it is paired with tapd7459afe-90:

$ ip link show
...
7: tapd7459afe-90@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq6a303139-3b state UP mode DEFAULT group default qlen 1000
    link/ether 4a:90:fb:ec:74:08 brd ff:ff:ff:ff:ff:ff link-netnsid 1
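The `@ifN` suffix in iproute2 output encodes the veth peer's ifindex, so the pairing above can also be recovered by parsing the two listings. A small stdlib-only sketch (hypothetical helper, using the exact lines from this thread):

```python
import re

# Parse "ip link show"-style lines such as
#   "7: tapd7459afe-90@if2: <...>"
# into (ifindex, name, peer_ifindex). Lines without an @if suffix yield None.
LINK_RE = re.compile(r"^(\d+): ([^@:]+)@if(\d+):")

def parse_link(line: str):
    m = LINK_RE.match(line)
    if not m:
        return None
    idx, name, peer = m.groups()
    return int(idx), name, int(peer)

inside = parse_link("2: ns-d7459afe-90@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450")
outside = parse_link("7: tapd7459afe-90@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450")

# Two interfaces form a veth pair when each one's peer index is the
# other's own ifindex.
assert inside[0] == outside[2] and outside[0] == inside[2]
print(inside[1], "<->", outside[1])  # ns-d7459afe-90 <-> tapd7459afe-90
```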
oomichi commented 5 years ago

With the overall network picture now clear, check each NIC with tcpdump.

dnsmasq process
-> ns-d7459afe-90 (in the netns)
-> tapd7459afe-90 (confirmed: DHCP packets do NOT arrive)
-> brq6a303139-3b
-> vxlan-10 (confirmed: DHCP packets do NOT arrive)
-> enp2s0
--- above this line: controller node / below: CPU node ---
-> eno1
-> vxlan-10 (confirmed: DHCP packets DO arrive)
-> brq6a303139-3b
-> tap93fdfb70-63
-> VM

In other words, DHCP packets are not reaching the vxlan-10 interface on the controller node.

oomichi commented 5 years ago

Review the controller node's VXLAN settings. Perhaps the switch does not forward multicast? The linuxbridge default group appears to be 224.0.0.1: http://git.openstack.org/cgit/openstack/neutron/tree/neutron/conf/plugins/ml2/drivers/linuxbridge.py#n38 Try disabling it in the linuxbridge settings:

# diff -u linuxbridge_agent.ini.orig  linuxbridge_agent.ini
--- linuxbridge_agent.ini.orig  2019-03-06 16:54:07.934162103 -0800
+++ linuxbridge_agent.ini       2019-03-07 19:47:47.658474992 -0800
@@ -2,7 +2,9 @@
 physical_interface_mappings = provider:enp2s0,company:enp0s31f6

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.1
+vxlan_group = none

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver

The literal value none does not work:

2019-03-07 19:52:06.879 893 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Invalid VXLAN Group: none, must be an address or network (in CIDR notation) in a multicast range of the same address family as local_ip: 192.168.1.1: AddrFormatError: invalid IPNetwork none
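As the error says, vxlan_group must be an address or CIDR in a multicast range of the same address family as local_ip. The shape of that check can be reproduced with the stdlib ipaddress module (an approximation for illustration; Neutron itself uses netaddr, and the function below is hypothetical):

```python
import ipaddress

def valid_vxlan_group(value: str, local_ip: str) -> bool:
    """Rough approximation of the agent's check: the group must parse as a
    network in a multicast range, with the same family as local_ip."""
    try:
        group = ipaddress.ip_network(value, strict=False)
    except ValueError:
        # The literal string "none" fails to parse, producing the error above.
        return False
    local = ipaddress.ip_address(local_ip)
    return group.is_multicast and group.version == local.version

print(valid_vxlan_group("224.0.0.1", "192.168.1.1"))      # True  (the default group)
print(valid_vxlan_group("none", "192.168.1.1"))           # False (what the log shows)
print(valid_vxlan_group("192.168.1.0/24", "192.168.1.1")) # False (not multicast)
```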
oomichi commented 5 years ago
# diff -u linuxbridge_agent.ini.orig  linuxbridge_agent.ini
--- linuxbridge_agent.ini.orig  2019-03-06 16:54:07.934162103 -0800
+++ linuxbridge_agent.ini       2019-03-07 19:47:47.658474992 -0800
@@ -2,7 +2,9 @@
 physical_interface_mappings = provider:enp2s0,company:enp0s31f6

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.1
+vxlan_group =

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver

With that change, the agent instead fails at line 735 below; apparently vxlan_ucast_supported() has to succeed.

 727     def check_vxlan_support(self):
 728         self.vxlan_mode = lconst.VXLAN_NONE
 729
 730         if self.vxlan_ucast_supported():
 731             self.vxlan_mode = lconst.VXLAN_UCAST
 732         elif self.vxlan_mcast_supported():
 733             self.vxlan_mode = lconst.VXLAN_MCAST
 734         else:
 735             raise exceptions.VxlanNetworkUnsupported()
 736         LOG.debug('Using %s VXLAN mode', self.vxlan_mode)

The check logic:

 677     def vxlan_ucast_supported(self):
 678         if not cfg.CONF.VXLAN.l2_population:
 679             return False
 680         if not ip_lib.iproute_arg_supported(
 681                 ['bridge', 'fdb'], 'append'):
 682             LOG.warning('Option "%(option)s" must be supported by command '
 683                         '"%(command)s" to enable %(mode)s mode',
 684                         {'option': 'append',
 685                          'command': 'bridge fdb',
 686                          'mode': 'VXLAN UCAST'})
 687             return False
 688
 689         test_iface = None
 690         for seg_id in moves.range(1, constants.MAX_VXLAN_VNI + 1):
 691             if (ip_lib.device_exists(self.get_vxlan_device_name(seg_id))
 692                     or ip_lib.vxlan_in_use(seg_id)):
 693                 continue
 694             test_iface = self.ensure_vxlan(seg_id)
 695             break
 696         else:
 697             LOG.error('No valid Segmentation ID to perform UCAST test.')
 698             return False
 699
 700         try:
 701             bridge_lib.FdbInterface.append(constants.FLOODING_ENTRY[0],
 702                                            test_iface, '1.1.1.1',
 703                                            log_fail_as_error=False)
 704             return True
 705         except RuntimeError:
 706             return False
 707         finally:
 708             self.delete_interface(test_iface)

So set cfg.CONF.VXLAN.l2_population to True:

# diff -u /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig /etc/neutron/plugins/ml2/linuxbridge_agent.ini
--- /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig 2019-03-06 16:54:07.934162103 -0800
+++ /etc/neutron/plugins/ml2/linuxbridge_agent.ini      2019-03-07 20:12:24.786563612 -0800
@@ -2,7 +2,10 @@
 physical_interface_mappings = provider:enp2s0,company:enp0s31f6

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.1
+l2_population = true
+vxlan_group =

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver

Confirmed that the multicast group is gone:

$ ip -d link show vxlan-36
9: vxlan-36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master brq337443f2-b7 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ba:bc:6e:0a:5b:09 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 36 dev enp2s0 srcport 0 0 dstport 8472 ageing 300
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode eui64
oomichi commented 5 years ago

TODO: roll the above settings out to all nodes.

oomichi commented 5 years ago

neutron-vxlan

oomichi commented 5 years ago

Neutron-vxlan.pptx

oomichi commented 5 years ago

Still not working. DHCP packets are not reaching the VXLAN interface on the controller side.

oomichi commented 5 years ago

According to https://docs.openstack.org/liberty/ja/install-guide-ubuntu/neutron-controller-install-option2.html, the mechanism drivers should be

mechanism_drivers = linuxbridge,l2population

i.e., l2population apparently has to be added as a mechanism driver. Re-applied that along with the other settings:

--- /etc/neutron/plugins/ml2/ml2_conf.ini.orig  2019-03-06 10:54:51.062392771 -0800
+++ /etc/neutron/plugins/ml2/ml2_conf.ini       2019-03-08 10:31:28.388334795 -0800
@@ -1,9 +1,11 @@
 [ml2]
-type_drivers = flat,vlan
-tenant_network_types =
-mechanism_drivers = linuxbridge
+type_drivers = flat,vxlan
+tenant_network_types = vxlan
+mechanism_drivers = linuxbridge,l2population
 extension_drivers = port_security

 [ml2_type_flat]
-flat_networks = provider,company
+flat_networks = provider

+[ml2_type_vxlan]
+vni_ranges = 1:1000
--- /etc/neutron/plugins/ml2/linuxbridge_agent.ini.orig 2019-03-06 16:54:07.934162103 -0800
+++ /etc/neutron/plugins/ml2/linuxbridge_agent.ini      2019-03-08 10:35:50.800145841 -0800
@@ -2,7 +2,13 @@
 physical_interface_mappings = provider:enp2s0,company:enp0s31f6

 [vxlan]
-enable_vxlan = false
+enable_vxlan = true
+local_ip = 192.168.1.1
+l2_population = true
+vxlan_group =
+
+[agent]
+prevent_arp_spoofing = true

 [securitygroup]
 firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver
oomichi commented 5 years ago

Now it works.