spring-projects / spring-statemachine

Spring Statemachine is a framework for application developers to use state machine concepts with Spring.
1.53k stars 599 forks source link

DefaultStateMachineService does not restore statemachine properly with StateMachinePersist and RuntimePersister #1017

Closed zivyou closed 2 years ago

zivyou commented 2 years ago

Hi: I config a EnumStateMachine with StateMachinePersist and RuntimePersister enabled. The config process may like this:

public class MyStateMachineConfig extends EnumStateMachineConfigurerAdapter<States, Events> {
    @Autowired
    private StateMachineRuntimePersister<States, Events, String> myFsmRedisPersister;

    @SuppressWarnings("SpringJavaInjectionPointsAutowiringInspection")
    @Bean
    public DefaultStateMachineService<States, Events> defaultStateMachineService(StateMachineFactory<States, Events> stateMachineFactory,
                                                                                 MyFsmRedisPersist redisPersist
    ) {
        return new DefaultStateMachineService<States, Events>(stateMachineFactory, redisPersist);
    }

    @Override
    public void configure(StateMachineConfigurationConfigurer<States, Events> config)
        throws Exception {
        config.withConfiguration()
            .autoStartup(true)
            .listener(listener())
            .and()
            .withPersistence()
            .runtimePersister(myFsmRedisPersister);
    }
}

With debugging the project, I think the project works like this:

  1. whenever state transition happens, it will call myFsmRedisPersister.write() to persist state machine context into Redis;
  2. DefaultStateMachineService will call redisPersist.read() to load state machine context from Redis when necessary;

But when call DefaultStateMachineService.acquireStateMachine(), the problem happens.

    @Override
    public StateMachine<S, E> acquireStateMachine(String machineId, boolean start) {
        log.info("Acquiring machine with id " + machineId);
        StateMachine<S, E> stateMachine;
        // naive sync to handle concurrency with release
        synchronized (machines) {
            stateMachine = machines.get(machineId);
            if (stateMachine == null) {
                log.info("Getting new machine from factory with id " + machineId);
                stateMachine = stateMachineFactory.getStateMachine(machineId);
                if (stateMachinePersist != null) {
                    try {
                        StateMachineContext<S, E> stateMachineContext = stateMachinePersist.read(machineId);
                        stateMachine = restoreStateMachine(stateMachine, stateMachineContext);
                    } catch (Exception e) {
                        log.error("Error handling context", e);
                        throw new StateMachineException("Unable to read context from store", e);
                    }
                }
                machines.put(machineId, stateMachine);
            }
        }
        // handle start outside of sync as it might take some time and would block other machines acquire
        return handleStart(stateMachine, start);
    }

As we can see, when stateMachine == null, we firstly try to get a stateMachine by stateMachineFactory.getStateMachine(machineId); then try to replace the stateMachine object with then context loaded from stateMachinePersist. The key point is that, when we call stateMachineFactory.getStateMachine(machineId), it comes with state transition and will write data back into Redis. That means the following codes which try to load state machine context will get un-excepted result. I am not sure if it is a design contrain of FSM theory, or just a code issue. Anyway, I am going to extend DefaultStateMachineService class to override these codes.

Thanks.

zivyou commented 2 years ago

After further debugging, I got the real reason. Since I configured the state machine with ".autoStartup(true)" enable, when stateMachineFactory.getStateMachine(machineId) was called, state transition will be triggered and the data will be wrote back to Redis. That is why I got the wrong state machine context. So, it is neither a design constrain of FSM theory, nor a code issue. It is just a configuration issue.