nasa-jpl / jsd

Just SOEM Drivers

EGD driver can potentially be locked in JSD_EGD_STATE_MACHINE_STATE_FAULT #65

Open d-loret opened 1 year ago

d-loret commented 1 year ago

The EGD driver can end up locked in JSD_EGD_STATE_MACHINE_STATE_FAULT if the following sequence of events happens:

  1. A fault occurs in the EGD. State transitions to JSD_EGD_STATE_MACHINE_STATE_FAULT.
  2. The timeout to retrieve the error expires (i.e. state->pub.fault_code is set to JSD_EGD_FAULT_UNKNOWN).
  3. Driver transitions out of JSD_EGD_STATE_MACHINE_STATE_FAULT.
  4. Before a reset is issued, another fault occurs and cannot be retrieved either.
  5. Because state->pub.fault_code is still JSD_EGD_FAULT_UNKNOWN at this point, the timeout handling never runs: the if clause that acts on the timeout requires state->pub.fault_code to differ from JSD_EGD_FAULT_UNKNOWN. The driver therefore stays in JSD_EGD_STATE_MACHINE_STATE_FAULT.

I think the solution is to remove the check of state->pub.fault_code from the if clause. That check does not appear to be needed for the code to function properly.
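
To make the sequence concrete, here is a minimal sketch in C of the lockup and of the proposed fix. It is a simplified model, not the actual JSD code: the types, the tick-based timeout, FAULT_TIMEOUT_TICKS, SM_NOT_FAULT (a stand-in for whatever state the driver moves to after the fault state), and all function names are hypothetical. Only the shape of the guarded if clause follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's real enums and state struct. */
typedef enum { FAULT_NONE, FAULT_UNKNOWN } fault_code_t;
typedef enum { SM_NOT_FAULT, SM_FAULT } sm_state_t;

typedef struct {
  sm_state_t   sm_state;
  fault_code_t fault_code;
  int          ticks_in_fault;
} drive_t;

/* Assumed timeout, counted in process cycles rather than wall-clock time. */
enum { FAULT_TIMEOUT_TICKS = 3 };

/* A fault occurs: enter the fault state and restart the timeout counter. */
static void enter_fault(drive_t* d) {
  assert(d != NULL);
  d->sm_state       = SM_FAULT;
  d->ticks_in_fault = 0;
}

/* One process cycle. The fault code is assumed to be unretrievable, so the
 * only way out of the fault state is the timeout branch.
 * check_fault_code = true models the guard described in this issue;
 * check_fault_code = false models the proposed fix (guard removed). */
static void process_fault(drive_t* d, bool check_fault_code) {
  assert(d != NULL);
  if (d->sm_state != SM_FAULT) return;

  d->ticks_in_fault++;
  bool timed_out = d->ticks_in_fault > FAULT_TIMEOUT_TICKS;
  bool guard_ok  = !check_fault_code || d->fault_code != FAULT_UNKNOWN;

  if (timed_out && guard_ok) {
    d->fault_code = FAULT_UNKNOWN; /* give up on retrieving the code */
    d->sm_state   = SM_NOT_FAULT;  /* leave the fault state */
  }
}

/* Two consecutive faults with no reset in between, so fault_code is never
 * cleared. Returns true if the model is still in the fault state afterwards. */
static bool stuck_after_two_faults(bool check_fault_code) {
  drive_t d = {SM_NOT_FAULT, FAULT_NONE, 0};
  for (int fault = 0; fault < 2; fault++) {
    enter_fault(&d);
    for (int tick = 0; tick < 10; tick++) {
      process_fault(&d, check_fault_code);
    }
  }
  return d.sm_state == SM_FAULT;
}
```

In this model, stuck_after_two_faults(true) returns true because the second fault never times out, while stuck_after_two_faults(false) returns false because the timeout branch fires for both faults.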

@alex-brinkman, can you corroborate the above reasoning or clarify why the check on state->pub.fault_code is done?

alex-brinkman commented 1 year ago

I believe you are correct in identifying the potential issue and I agree with your fix to remove the check in the if clause.

Out of curiosity, did you encounter this as a real issue in the wild or just find it via code inspection?

d-loret commented 1 year ago

I found it through code inspection.

But I later saw an issue in EELS where the code was stuck in JSD_EGD_FAULT_UNKNOWN. We had to take down the node in order to get out of the weird state. Even though I didn't look into it in detail, it seemed very much like this issue.