Open d-loret opened 1 year ago
I believe you are correct in identifying the potential issue, and I agree with your fix to remove the check in the if clause.
Out of curiosity, did you encounter this as a real issue in the wild, or did you just find it via code inspection?
I found it through code inspection.
But I later saw an issue in EELS where the code was stuck in JSD_EGD_FAULT_UNKNOWN. We had to take down the node in order to get out of the weird state. Even though I didn't look into it in detail, it seemed very much like this issue.
The EGD driver can end up locked in the JSD_EGD_STATE_MACHINE_STATE_FAULT state if the following sequence of events happens:

[...] JSD_EGD_STATE_MACHINE_STATE_FAULT.
[...] (state->pub.fault_code is set to JSD_EGD_FAULT_UNKNOWN).
[...] JSD_EGD_STATE_MACHINE_STATE_FAULT.
[...] state->pub.fault_code is still JSD_EGD_FAULT_UNKNOWN at this point, the code will never execute the timeout, since the if clause that acts on the timeout requires state->pub.fault_code to be different from JSD_EGD_FAULT_UNKNOWN.

I think the solution is to remove the check of state->pub.fault_code in the if clause. That check does not seem to be needed for the proper functioning of the code.

@alex-brinkman, can you corroborate the above reasoning or clarify why the check on state->pub.fault_code is done?
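
To make the reasoning above concrete, here is a minimal, self-contained sketch of the two guard conditions. It is not the actual JSD code: every type, field, function, and constant name below (`sketch_state_t`, `fault_enter_s`, `timeout_handled`, `SKETCH_FAULT_TIMEOUT_S`, and so on) is a hypothetical stand-in, and what the real timeout branch does is reduced to setting a flag. Only the shape of the condition follows the description in this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the actual JSD definitions. */
typedef enum {
  SKETCH_FAULT_OKAY = 0,
  SKETCH_FAULT_UNKNOWN /* plays the role of JSD_EGD_FAULT_UNKNOWN */
} sketch_fault_t;

typedef struct {
  sketch_fault_t fault_code;      /* plays the role of state->pub.fault_code */
  double         fault_enter_s;   /* assumed: time the fault state was entered */
  bool           timeout_handled; /* assumed: set once the timeout branch runs */
} sketch_state_t;

#define SKETCH_FAULT_TIMEOUT_S 1.0 /* assumed timeout value, in seconds */

/* Guard as described in the issue: the timeout branch also requires the fault
 * code to differ from UNKNOWN, so it never runs while the code stays UNKNOWN. */
void sketch_process_fault_guarded(sketch_state_t* s, double now_s) {
  assert(s != NULL);
  if (now_s - s->fault_enter_s > SKETCH_FAULT_TIMEOUT_S &&
      s->fault_code != SKETCH_FAULT_UNKNOWN) {
    s->timeout_handled = true;
  }
}

/* Proposed fix: drop the fault_code check so the timeout branch runs
 * regardless of whether the fault code was ever resolved. */
void sketch_process_fault_fixed(sketch_state_t* s, double now_s) {
  assert(s != NULL);
  if (now_s - s->fault_enter_s > SKETCH_FAULT_TIMEOUT_S) {
    s->timeout_handled = true;
  }
}
```

With `fault_code` left at `SKETCH_FAULT_UNKNOWN`, calling the guarded version past the timeout leaves `timeout_handled` false no matter how often it is called, while the fixed version sets it on the first call after the timeout elapses.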