projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

openstack neutron-server (caracal release) using calico as core_plugin returns error when creating security group rules #9238

Open sp3c1k opened 2 days ago

sp3c1k commented 2 days ago

When using calico in neutron-server (caracal release) as a core plugin:

neutron.conf

[DEFAULT]
core_plugin = calico

there seems to be a problem when creating security group rules.

neutron-server.log

2024-09-17 10:58:05.713 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] security_groups_rule_updated: <neutron_lib.context.Context object at 0x7c8d10a5b430> ['3afec1e5-116e-4966-8271-02de2ceca667']
2024-09-17 10:58:05.713 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] Calico state already initialised for PID 1599056
2024-09-17 10:58:05.714 1599056 INFO networking_calico.plugins.ml2.drivers.calico.mech_calico [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] Updating security group IDs ['3afec1e5-116e-4966-8271-02de2ceca667']
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] create failed: No details.: RuntimeError: Method <function remove_reservation at 0x7c8d125953f0> cannot be called within a transaction.
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource Traceback (most recent call last):
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/resource.py", line 98, in resource
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     result = method(request=request, **args)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 440, in create
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return self._create(request, body, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 137, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     with excutils.save_and_reraise_exception():
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return f(*args, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 144, in wrapper
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     with excutils.save_and_reraise_exception() as ectxt:
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return f(*args, **kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     with excutils.save_and_reraise_exception():
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.force_reraise()
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise self.value
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 181, in wrapped
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return f(*dup_args, **dup_kwargs)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 567, in _create
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     return notify({self._resource: self._view(request.context,
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 507, in notify
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     quota.QUOTAS.commit_reservation(
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/quota/__init__.py", line 103, in commit_reservation
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     self.get_driver().commit_reservation(context, reservation_id)
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/db/quota/driver.py", line 271, in commit_reservation
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     quota_api.remove_reservation(context, reservation_id,
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/common/utils.py", line 724, in inner
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource     raise RuntimeError(_("Method %s cannot be called within a "
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource RuntimeError: Method <function remove_reservation at 0x7c8d125953f0> cannot be called within a transaction.
2024-09-17 10:58:05.786 1599056 ERROR neutron.api.v2.resource
2024-09-17 10:58:05.788 1599056 INFO neutron.wsgi [None req-b0e2eb50-8743-43b9-a3d9-1f92938c0d1c 7d78172f8f9246d580bdc89bef2fc60b e4bd7bfd012e4bf99da4b0413e5198fe - - 4d372d87fc404b40a63e2b4d175b76db 4d372d87fc404b40a63e2b4d175b76db] 10.230.8.7,127.0.0.1 "POST /v2.0/security-group-rules HTTP/1.1" status: 500  len: 344 time: 0.2229519
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 8fb5aa7a-a1e4-4f46-927e-8e5729437350 | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                  | None                 |
| b09b1bdc-ee3d-4086-8b70-8df02696d760 | None        | IPv6      | ::/0      |            | egress    | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
root@c1:/var/log/neutron# openstack security group rule create --ingress --remote-ip 0.0.0.0/0 --protocol tcp --dst-port 55 specik-test
Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
root@c1:/var/log/neutron# openstack security group rule list specik-test
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 8fb5aa7a-a1e4-4f46-927e-8e5729437350 | None        | IPv4      | 0.0.0.0/0 |            | egress    | None                  | None                 |
| a5afa681-72fa-4df3-a502-d719699d7a83 | tcp         | IPv4      | 0.0.0.0/0 | 55:55      | ingress   | None                  | None                 |
| b09b1bdc-ee3d-4086-8b70-8df02696d760 | None        | IPv6      | ::/0      |            | egress    | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+

Although the API returns HTTP 500, the rule is created.

Expected Behavior

The security group rule should be created without an error.

Current Behavior

When creating a security group rule, HTTP 500 is returned, but the rule is created anyway.

This behavior seems to be caused by several changes in oslo_db, neutron, and neutron_lib: how sessions are created, how the context carrying the session is propagated through the application, and the preparation for SQLAlchemy 2.0.

OpenStack DevStack on the Caracal release does not have this issue with native networking. We only encounter it when calico is used as the core plugin.

Possible Solution

The problem seems to be coming from: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L886-L896

conn_url = str(session.connection().engine.url).lower() opens a new connection on the session just to read the engine URL, which is then used in this part of the code: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L895-L909

However, that becomes a problem later in neutron, in the is_session_active method (which has moved from neutron to neutron_lib).

With the Yoga release, where we do not see this problem, the method exits via

    if session.autocommit:  # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active

meaning autocommit is True and is_active is False; at least that is what I see in the debugger.

With the Caracal release, the situation is different: is_session_active returns True from the following code

    if getattr(session, 'autocommit', None):
        # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active
    if not session.get_transaction():
        return False
    if not session.get_transaction()._connections:
        return False
    return True
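To illustrate the difference, here is a minimal sketch using stand-in classes instead of real SQLAlchemy objects. FakeSession and FakeTransaction are hypothetical and exist only for this example; is_session_active copies the Caracal checks quoted above. The sketch assumes that requesting a connection registers it on the current transaction, which is what the debugger observations suggest.

```python
class FakeTransaction:
    """Hypothetical stand-in for a SQLAlchemy session transaction."""

    def __init__(self):
        self._connections = {}


class FakeSession:
    """Hypothetical stand-in for a session without the legacy autocommit flag."""

    def __init__(self):
        self._txn = None

    def get_transaction(self):
        return self._txn

    def begin(self):
        # A transaction exists, but no connection has been checked out yet.
        self._txn = FakeTransaction()

    def connection(self):
        # Assumption: asking for a connection records it on the transaction.
        if self._txn is None:
            self.begin()
        conn = object()
        self._txn._connections[conn] = conn
        return conn


def is_session_active(session):
    """Same checks as the Caracal snippet quoted above."""
    if getattr(session, 'autocommit', None):
        # old behaviour, to be removed with sqlalchemy 2.0
        return session.is_active
    if not session.get_transaction():
        return False
    if not session.get_transaction()._connections:
        return False
    return True


session = FakeSession()
before = is_session_active(session)            # no transaction at all
session.begin()
after_begin = is_session_active(session)       # transaction, no connection
session.connection()
after_connection = is_session_active(session)  # transaction with a connection
print(before, after_begin, after_connection)
```

Under these assumptions, only the session.connection() call flips the result to True, which matches the point where calico asks for a connection just to read the URL.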

What helped me resolve this issue

I fixed this behavior with this in networking_calico/plugins/ml2/drivers/calico/mech_calico.py:_txn_from_context():

        if getattr(session, 'bind', None):
            conn_url = str(session.bind.url).lower()
        else:
            conn_url = str(session.connection().engine.url).lower()

The connection URL can be obtained from session.bind (as I observed in the debugger).

        # sqlalchemy/orm/session.py
        # :param bind: An optional :class:`_engine.Engine` or
        #    :class:`_engine.Connection` to
        #    which this ``Session`` should be bound. When specified, all SQL
        #    operations performed by this session will execute via this
        #    connectable.

That means that if the session already has an established bind/connection to the database, we can use that instead of creating a new connection just to get conn_url.

The fix keeps the old behavior as a fallback: if session.bind is None, it calls session.connection() as before to get the URL.
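The same logic, pulled out into a helper and exercised against stand-in objects, might look like the sketch below. get_conn_url and all Fake* classes are hypothetical names used only for illustration; they are not part of networking-calico or SQLAlchemy. The sketch assumes session.bind is an Engine-like object exposing .url; a bind that is a Connection may need different handling.

```python
class FakeURL:
    """Hypothetical stand-in for an engine URL object."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


class FakeEngine:
    def __init__(self, url):
        self.url = FakeURL(url)


class FakeConnection:
    def __init__(self, engine):
        self.engine = engine


class FakeSession:
    """Counts connection() calls so we can see which path was taken."""

    def __init__(self, engine, bound=True):
        self.bind = engine if bound else None
        self._engine = engine
        self.connection_calls = 0

    def connection(self):
        self.connection_calls += 1
        return FakeConnection(self._engine)


def get_conn_url(session):
    """Prefer session.bind; fall back to session.connection() as before."""
    bind = getattr(session, 'bind', None)
    if bind:
        return str(bind.url).lower()
    return str(session.connection().engine.url).lower()


engine = FakeEngine("MySQL+PyMySQL://neutron@db/neutron")

bound = FakeSession(engine, bound=True)
bound_url = get_conn_url(bound)      # no connection requested

unbound = FakeSession(engine, bound=False)
unbound_url = get_conn_url(unbound)  # falls back to connection()

print(bound_url, bound.connection_calls, unbound.connection_calls)
```

With a bound session the URL is read without requesting a connection, so nothing is added to the session's transaction; the unbound session still works through the old path.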

Steps to Reproduce (for bugs)

  1. Set up the OpenStack Caracal release
  2. Use calico as core_plugin for neutron-server
  3. Create a new security group
  4. Add a new security group rule to the created security group

Context

This issue was found after upgrading an OpenStack cluster from the Yoga release to Caracal.

The security group rule is created successfully, but the Neutron API returns HTTP 500. So far I have only encountered this issue when creating new security group rules, and always at the point where neutron tries to remove the quota reservation, as seen in the log at the top of this issue.
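For readers unfamiliar with the error in the traceback, the check that raises it can be pictured roughly as below. This is a simplified sketch, not neutron's actual code: guard_against_transaction, FakeContext, FakeSession and the boolean active flag are hypothetical, and the real guard in neutron/common/utils.py may check more than this.

```python
import functools


class FakeSession:
    """Hypothetical session whose 'active' flag stands in for is_session_active()."""

    def __init__(self, active):
        self.active = active


class FakeContext:
    def __init__(self, active):
        self.session = FakeSession(active)


def guard_against_transaction(f):
    """Refuse to run f while the context's session looks active."""

    @functools.wraps(f)
    def inner(context, *args, **kwargs):
        if context.session.active:
            raise RuntimeError(
                "Method %s cannot be called within a transaction." % f)
        return f(context, *args, **kwargs)

    return inner


@guard_against_transaction
def remove_reservation(context, reservation_id):
    # Placeholder body; the real function deletes the quota reservation.
    return "removed %s" % reservation_id


ok = remove_reservation(FakeContext(active=False), "res-1")

try:
    remove_reservation(FakeContext(active=True), "res-1")
    failed = False
except RuntimeError:
    failed = True

print(ok, failed)
```

If the session is reported as active only because a connection was requested earlier to read the URL, a guard of this kind would fire even though no transaction is logically in progress, which is consistent with the HTTP 500 appearing after the rule has already been stored.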

Your Environment

sp3c1k commented 1 day ago

I'm sorry for another lengthy issue; I've tried to gather as much information as I could. It seems there are quite a lot of changes in the new OpenStack releases regarding neutron, and some of them are currently not compatible with using calico as the neutron core_plugin.