sonic-net / sonic-sairedis

SAI object interface to Redis database, as used in the SONiC project
Other
56 stars 263 forks source link

Check MDIO server thread joinable before join the thread #1342

Closed jiahua-wang closed 6 months ago

jiahua-wang commented 7 months ago

This PR is to address the issue syncd crash observed during sonic-mgmt reboot tests #17346

kcudnik commented 7 months ago

Is the root case already known? Was it double join?

jiahua-wang commented 7 months ago

Double join could be an explanation. More likely scenario is that m_taskThread is not joinable because after MdioIpcServer constructor startMdioThread() was not called or because startMdioThread() failed. @kenneth-arista can provide more test details.

kcudnik commented 7 months ago

Double join could be an explanation. More likely scenario is that m_taskThread is not joinable because after MdioIpcServer constructor startMdioThread() was not called or because startMdioThread() failed. @kenneth-arista can provide more test details.

Can you investigate which exactly scenario is that ?

kenneth-arista commented 7 months ago

Double join could be an explanation. More likely scenario is that m_taskThread is not joinable because after MdioIpcServer constructor startMdioThread() was not called or because startMdioThread() failed. @kenneth-arista can provide more test details.

Can you investigate which exactly scenario is that ?

Unfortunately, I don't have all the logs when the problem occurred before integrating Jiahua's patch. Prior to the patch, I would often seen several syncd core files after each full sonic-mgmt test runs. I suspect these crashes are triggered in tests involving reboot.

kenneth-arista commented 7 months ago

I recall that a syncd crash was observed once during sonic-mgmt override_config_table tests.

jiahua-wang commented 7 months ago

This patch will address both cases of double join and thread not starting properly.

rlhui commented 6 months ago

@jiahua-wang - please update the branch? thanks

xumia commented 6 months ago

/azp run

azure-pipelines[bot] commented 6 months ago
Azure Pipelines successfully started running 1 pipeline(s).
jiahua-wang commented 6 months ago

/azp run

azure-pipelines[bot] commented 6 months ago
Commenter does not have sufficient privileges for PR 1342 in repo sonic-net/sonic-sairedis
jiahua-wang commented 6 months ago

/AzurePipelines run Azure.sonic-sairedis

azure-pipelines[bot] commented 6 months ago
Commenter does not have sufficient privileges for PR 1342 in repo sonic-net/sonic-sairedis
judyjoseph commented 6 months ago

@kcudnik Can we merge this PR

judyjoseph commented 6 months ago

/AzurePipelines run Azure.sonic-sairedis

azure-pipelines[bot] commented 6 months ago
Azure Pipelines successfully started running 1 pipeline(s).
jiahua-wang commented 6 months ago

@kcudnik @rlhui @judyjoseph All checks were passed when I created the PR for the first time. After I merged other commits, 2 checks always fail at the place not related to my code change. Should I close this one and create another one with the same code change?

kcudnik commented 6 months ago

All passed do you want me to merge?

jiahua-wang commented 6 months ago

@kcudnik Please merge.