Hi @gujun4990. The server shouldn't send the same update twice. I don't know of a case where that may happen.
I found an issue, though, in the python-idl client code that might cause the client to request the same change again in a rare case, but it requires the client to re-connect twice between sending these two transactions. By re-connect here I mean a full re-connection to a new leader with a successful monitor request/reply sequence, i.e. we would need two full leadership transfers and two re-connections to these new leaders within a relatively short amount of time.
Do you see something like that in your setup when the issue is happening?
@igsilya, thanks for your reply. I have two questions:
We use ovsdbapp version 1.15.0, which contains the following code (ovsdbapp/backend/ovs_idl/transaction.py, do_commit):
```python
if status == txn.TRY_AGAIN:
    LOG.debug("OVSDB transaction returned TRY_AGAIN, retrying")
    # In the case that there is a reconnection after
    # Connection.run() calls self.idl.run() but before do_commit()
    # is called, commit_block() can loop w/o calling idl.run()
    # which does the reconnect logic. It will then always return
    # TRY_AGAIN until we time out and Connection.run() calls
    # idl.run() again. So, call idl.run() here just in case.
    self.api.idl.run()
    # In the event that there is an issue with the txn or the db
    # is down, don't spam new txns as fast as we can
    time.sleep(min(2 ** retries, self.time_remaining(), MAX_SLEEP))
    retries += 1
    continue
```
I reviewed ovsdbapp's patches and found a new patch that drops this code. So I wonder: does this code increase the chance of causing the problem?
Since OVS 2.17, the python-idl client re-connects using `monitor_cond_since`, which receives only the difference between the previous and the new data; a rough sketch of the message shape is below. Before 2.17, the content of the database would have been overwritten by the reply.

FWIW, I posted a patch for the issue I found that may or may not be related to your issue. In case you want to try it: https://patchwork.ozlabs.org/project/openvswitch/patch/20230909021913.929606-1-i.maximets@ovn.org/
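For reference, here is a rough, illustrative sketch of the `monitor_cond_since` exchange. The field layout follows the ovsdb-server protocol documentation; the database name, monitor id, and txn-id values below are made up, and the condition requests and updates are omitted:

```python
# JSON-RPC request the client sends on re-connect (illustrative values):
request = {
    "id": 1,
    "method": "monitor_cond_since",
    "params": [
        "OVN_Southbound",              # <db-name> (example)
        ["monid", "OVN_Southbound"],   # <json-value> identifying the monitor
        {},                            # <monitor-cond-requests> (omitted here)
        "a1b2c3d4-...-last-txn-id",    # <last-txn-id> the client already has
    ],
}

# The reply result is [<found>, <last-txn-id>, <table-updates2>]:
# if <found> is true, the updates contain only the changes since
# <last-txn-id>; if false, they contain the full database contents
# and the client replaces its cache instead of applying a diff.
reply_result = [True, "e5f6a7b8-...-new-txn-id", {}]
```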
Thanks a lot, I have given it a try. So far the issue has not reproduced again, but I'm not sure that confirms the fix, because the issue is occasional. We hope the community can merge the patch as soon as possible, so that we can keep watching to see whether the issue reproduces again. Thanks again.
@igsilya, I see that the patch you submitted has been merged. Thank you for your patch. I will close the issue.
Yeah, I applied the change yesterday. I'm still not sure that's the issue you're facing, because it requires a double re-connection to cause problems. Feel free to reopen if you reproduce it again.
Environment
ovsdb-server has 3 replicas forming a cluster.
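For context, a minimal sketch of how a client might connect to such a cluster with the OVS python-idl. This is not the agent's actual code: the remote addresses and schema path are hypothetical, and it assumes a python-ovs version that accepts a comma-separated list of remotes:

```python
import ovs.db.idl

# Hypothetical remotes for a 3-node ovsdb cluster.
remote = "tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642"

# Hypothetical schema location; register all tables for simplicity.
helper = ovs.db.idl.SchemaHelper("/usr/share/ovn/ovn-sb.ovsschema")
helper.register_all()

# leader_only=True makes the client talk only to the raft leader,
# so every leadership change forces a re-connection to the new leader.
idl = ovs.db.idl.Idl(remote, helper, leader_only=True)
```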
Description
We run the neutron-ovn-metadata-agent service, which uses the OVS python client to communicate with ovsdb-server; the client connects with `leader_only=True`. When the ovsdb-server leader changes, the client reconnects to the new leader. After reconnecting successfully, we found that the transact method the client sends to the server occasionally fails: the server returns `"where" clause test failed`. We analyzed this and found that the ovsdb-server and ovsdb client data are inconsistent.

Transact method's normal data:
Transact method's abnormal data:
Comparing the normal and abnormal transact data above, the abnormal data is missing `["neutron:ovn-metadata-sb-cfg","660331"]`. But ovsdb-server does have this data, so it returns `"where" clause test failed`. Could there be a situation where the ovsdb client receives the same update3 notification twice when the leader changes? Then, after the ovsdb client runs its diff function, it would delete the `["neutron:ovn-metadata-sb-cfg","660331"]` data from the client's memory.
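That hypothesis is consistent with how update3 diffs behave for map columns such as external_ids: applying the same diff twice toggles an entry away. A minimal sketch of the assumed semantics (simplified, not the actual ovs.db.idl code):

```python
def apply_map_diff(row_map, diff):
    """Apply an update3-style diff to a map column such as external_ids.

    Per the update3 diff semantics, a (key, value) pair in the diff that
    already exists in the row is removed; otherwise it is added/updated.
    """
    result = dict(row_map)
    for key, value in diff.items():
        if result.get(key) == value:
            del result[key]       # pair already present -> remove it
        else:
            result[key] = value   # otherwise add or update it
    return result

external_ids = {}
diff = {"neutron:ovn-metadata-sb-cfg": "660331"}

# First application of the notification: the entry appears.
external_ids = apply_map_diff(external_ids, diff)
print(external_ids)  # {'neutron:ovn-metadata-sb-cfg': '660331'}

# Receiving the very same update3 notification a second time removes
# the entry again, leaving the client cache inconsistent with the
# server -- matching the missing data described above.
external_ids = apply_map_diff(external_ids, diff)
print(external_ids)  # {}
```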