Open sp3c1k opened 2 days ago
I'm sorry for another lengthy issue; I've tried to gather as much information as I could. There seem to be quite a lot of changes in the new OpenStack releases regarding neutron, and some of them are currently not compatible with using calico as the neutron core_plugin.
When using calico in neutron-server (caracal release) as a core plugin:
neutron.conf
there seems to be a problem when creating security group rules.
neutron-server.log
Although the API returns HTTP 500, the rule is created.
Expected Behavior
The security group rule should be created without an error.
Current Behavior
When creating a security group rule, an HTTP 500 is returned, but the rule is created anyway.
This behavior seems to be caused by multiple factors (changes in oslo_db, neutron, neutron_lib) regarding how sessions are created, how the context with the session is propagated throughout the application, and the preparation for SQLAlchemy 2.0.
OpenStack devstack on the Caracal release does not have this issue with native networking. We only encounter it when calico is used as a core plugin.
Possible Solution
The problem seems to be coming from: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L886-L896
conn_url = str(session.connection().engine.url).lower()
creates a new connection with the session just to get the engine url so it can be used in this part of the code: https://github.com/projectcalico/calico/blob/4ad72b7c787a714403febb2ad72c5947b94d3647/networking-calico/networking_calico/plugins/ml2/drivers/calico/mech_calico.py#L895-L909
However, that seems to be a problem later in neutron, where the is_session_active method moved from neutron to neutron_lib.
With the Yoga release, where we do not see this problem, it exits this method via the if session.autocommit: branch, meaning autocommit = True and is_active = False; at least that is what I am seeing in the debugger.
With the Caracal release the situation is different, and is_session_active returns True:
autocommit = False (seems to be related to changes in oslo_db)
is_active = True (it looks like it does not matter as long as autocommit is not True)
session.get_transaction() evaluates as true, because there are transactions inside the session
session.get_transaction()._connections also evaluates as true, as there are connections inside the transaction (from calico opening a connection as mentioned earlier; it probably did not matter up to this point, because it always exited the is_session_active check via the if session.autocommit: branch as written above).
What helped me to resolve this issue
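In outline, the approach amounts to preferring the session's existing bind and only opening a connection as a fallback. The following is a minimal sketch of that idea, not the actual patch: conn_url_from_session and the Fake* classes are hypothetical stand-ins so the example runs without a database, and the URL is a made-up example value.

```python
def conn_url_from_session(session):
    """Return the lower-cased engine URL for a session (hypothetical helper)."""
    bind = getattr(session, "bind", None)
    if bind is not None:
        # A bind may be an engine or a connection; fall back to the bind
        # itself when it has no .engine attribute.
        engine = getattr(bind, "engine", bind)
        return str(engine.url).lower()
    # Old behavior as a fallback: open a connection just to read the URL.
    return str(session.connection().engine.url).lower()


# Hypothetical stand-ins for the SQLAlchemy objects, for illustration only.
class FakeEngine:
    def __init__(self, url):
        self.url = url
        self.engine = self


class FakeConnection:
    def __init__(self, engine):
        self.engine = engine


class FakeSession:
    def __init__(self, engine, bound=True):
        self.bind = engine if bound else None
        self._engine = engine
        self.connection_calls = 0  # counts how often a connection was opened

    def connection(self):
        self.connection_calls += 1
        return FakeConnection(self._engine)


engine = FakeEngine("MySQL+PyMySQL://neutron@db/neutron")  # example value
bound = FakeSession(engine)
unbound = FakeSession(engine, bound=False)

url_bound = conn_url_from_session(bound)      # uses session.bind
url_unbound = conn_url_from_session(unbound)  # falls back to connection()
```

With a bound session the helper never calls connection(), so no extra connection ends up inside the session's transaction. The unbound session still works through the fallback path.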
I fixed this behavior with a change in networking_calico/plugins/ml2/drivers/calico/mech_calico.py:_txn_from_context().
The connection url can be obtained (as observed on my side in the debugger) from session.bind. That means if the session already has an established bind/connection to the database, we can use that instead of creating a new connection just to get the conn_url.
The fix also keeps a fallback to the old behavior: if session.bind is None, it falls back and creates a new session.connection() to get the url.
Steps to Reproduce (for bugs)
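Independently of a full OpenStack deployment, the session side effect described above can be observed with plain SQLAlchemy. This is only a standalone sketch of the mechanism, not the Neutron reproduction itself. It assumes SQLAlchemy 1.4 or newer, and the in-memory SQLite engine is a stand-in for the Neutron database.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

# In-memory SQLite stands in for the Neutron database (assumption).
engine = create_engine("sqlite://")

# Case 1: read the URL from the session's bind. Nothing is begun.
with Session(bind=engine) as session:
    url_from_bind = str(session.bind.url).lower()
    bind_started_txn = session.in_transaction()

# Case 2: read the URL through session.connection(). This begins a
# transaction and attaches a connection to it.
with Session(bind=engine) as session:
    url_from_conn = str(session.connection().engine.url).lower()
    conn_started_txn = session.in_transaction()

print("same url:", url_from_bind == url_from_conn)
print("transaction after session.bind:", bind_started_txn)
print("transaction after session.connection():", conn_started_txn)
```

Both paths yield the same URL, but only the session.connection() path leaves the session inside a transaction, which is the state the is_session_active check later picks up.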
Context
This issue was found after upgrading an OpenStack cluster from the Yoga release to Caracal.
The security group rule is created successfully, but the Neutron API returns HTTP 500. So far I have only encountered this issue when creating new security group rules, and always at the point where neutron tries to remove the quota reservation, as seen in the log at the top of this issue.
Your Environment