fix: initial commit

preston-rogers commented 1 year ago

It was found that if a fault occurs slightly after commands are sent from fastcat, it is possible that a halt could be sent before the JSD level has time to transition to its operational state.

Error scenario: Motion commands are sent and then a fault immediately occurs resulting in the immediate sending of the halt command.

We see in this graph that the actuator transitions from "1" (HALTED) to "9" (ACTUATOR_SMS_CS) to "0" (FAULT). Within the same cycle, the JSD level transitions from "35" (JSD_ELMO_STATE_MACHINE_STATE_SWITCHED_ON) to "39" (JSD_ELMO_STATE_MACHINE_STATE_OPERATION_ENABLED). It then stays enabled indefinitely, running at the command that was set while the actuator was in ACTUATOR_SMS_CS.

Why this occurs: If the halt command is processed before JSD has a chance to change to state "39" (JSD_ELMO_STATE_MACHINE_STATE_OPERATION_ENABLED), the halt command is simply ignored.

To explain why the graph shows the actuator state changing at the same time as the JSD Elmo state, we need to understand the order of operations within a single loop cycle:

1. Read stage - The current state of JSD is read from the gold drive
2. Write stage - Commands are sent to JSD (such as CST)
3. Process stage - The written commands are processed

After these three steps are taken care of in the cycle, the actuator states and egd/epd states are updated on the telemetry.

Keep in mind that within a single loop cycle, during the JSD read stage, the egd state updates before the actuator state does.

So even though the two states appear to change simultaneously, JSD actually switches states first and the actuator switches states afterwards.
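The ordering described above can be sketched as a trace of events within one cycle. This is only an illustration under stated assumptions: the event names and `run_cycle()` are hypothetical and do not come from fastcat or JSD. It shows why telemetry, which is published once per cycle after all three stages, reports both state changes in the same sample even though the drive state was updated first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names, used only to illustrate ordering. */
typedef enum {
  EV_DRIVE_STATE_READ,       /* read stage: egd/epd state read from the drive */
  EV_ACTUATOR_STATE_UPDATED, /* read stage: actuator state updated afterwards */
  EV_COMMANDS_WRITTEN,       /* write stage: commands (such as CST) sent to JSD */
  EV_COMMANDS_PROCESSED,     /* process stage: written commands are processed */
  EV_TELEMETRY_PUBLISHED     /* both states reported once, at the end */
} cycle_event_t;

#define MAX_EVENTS 8

typedef struct {
  cycle_event_t events[MAX_EVENTS];
  size_t count;
} cycle_trace_t;

static void record(cycle_trace_t *t, cycle_event_t e) {
  assert(t != NULL && t->count < MAX_EVENTS);
  t->events[t->count++] = e;
}

/* One loop cycle in the order described in the comment above. */
void run_cycle(cycle_trace_t *t) {
  record(t, EV_DRIVE_STATE_READ);
  record(t, EV_ACTUATOR_STATE_UPDATED);
  record(t, EV_COMMANDS_WRITTEN);
  record(t, EV_COMMANDS_PROCESSED);
  record(t, EV_TELEMETRY_PUBLISHED);
}
```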

Nevertheless, the important bit is the cycle before JSD switches to Operation enabled. In this cycle, JSD is in JSD_ELMO_STATE_MACHINE_STATE_SWITCHED_ON when Fault() is called on the actuator. After Fault() is called (carried out by ExecuteAllDeviceFaults()), the JSD Process() function is called (calling jsd_egd_process() and jsd_egd_process_state_machine()), which ignores the halt.

The fix: To ensure a halt cannot be ignored, new_halt_command is now set to false only after the halt has been carried out. To avoid complications that could arise if a hardware-level fault occurs between requesting the halt and carrying it out, we also set this flag to false in JSD_ELMO_STATE_MACHINE_STATE_SWITCH_ON_DISABLED (the starting state).

Doing this produced the following behavior:

d-loret commented 1 year ago

@preston-rogers, do we have a solution for the edge case you mentioned last time? Sending a halt command while we are in SWITCHED_ON that is not done while making a transition between states.

preston-rogers commented 1 year ago

> @preston-rogers, do we have a solution for the edge case you mentioned last time? Sending a halt command while we are in SWITCHED_ON that is not done while making a transition between states.

Hey Daniel, my current idea is to keep the halt request persistent for a fixed amount of time (rather than a fixed number of cycles). It takes a human roughly 100 milliseconds to press a button, so it is reasonable to assume that nobody would send a halt command (in the incorrect state) and then a real command within that window.

We see from the graphs that it takes about 10 cycles for JSD to complete the transition back to the halt state (egd state machine), so I believe it is fair to say that given about 100 milliseconds, JSD should be able to change states and receive the halt command.

d-loret commented 1 year ago

@preston-rogers, I agree we should use time instead of number of cycles to keep the halt request on. However, thinking a little more about this, I think we might end up with a cleaner solution if we use a flag that indicates whether a transition to OPERATION_ENABLED has been requested. While the flag is on, we do not reset state->new_halt_command (i.e. we skip setting it to false at the end of jsd_epd_process_state_machine).

We can turn off the flag when we enter OPERATION_ENABLED or FAULT.

The obvious place to turn on the flag would be in SWITCHED_ON when state->new_reset is processed. However, in theory we can send a reset request (enter enabled operation command) in SWITCH_ON_DISABLED and READY_TO_SWITCH_ON, and it will be latched until it is processed in SWITCHED_ON. So we probably also want to turn on the flag if we are in SWITCH_ON_DISABLED or READY_TO_SWITCH_ON and state->new_reset is true.

nasa-jpl / jsd

fix: initial commit #101