pytransitions / transitions

A lightweight, object-oriented finite state machine implementation in Python with many extensions
MIT License

3.12 Performance #648

Closed · idatsy closed 4 months ago

idatsy commented 6 months ago

**Describe the bug**
Significantly worse performance on Python 3.12.

**Minimal working example**
I still need to line-profile in more detail, but Machine creation appears to be about 5x slower in my use case.

**Additional context**
Highly anecdotal at the moment, but I figured I'd see if anyone else is having a similar issue. For my use case it's probably as expensive to rewrite as to debug.
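
A minimal timing harness along these lines would make the 5x observation reproducible across interpreters (a hypothetical sketch, not part of the original report; the state count is arbitrary):

```python
# Hypothetical reproduction sketch: time bare Machine creation only.
import timeit
from transitions import Machine

states = [f"S{i}" for i in range(20)]  # arbitrary number of states (assumption)
per_call = timeit.timeit(lambda: Machine(states=states, initial=states[0]),
                         number=1000) / 1000
print(f"{per_call * 1000:.3f} ms per Machine()")
```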

aleneum commented 6 months ago

Hello @idatsy,

thanks for the info. If you have an MWE for reproduction, let me know.

philipbjorge commented 5 months ago

Update

The report below relates to using transitions with many models: https://github.com/pytransitions/transitions/issues/146#issuecomment-895107123

We didn't have a clean way to use a global state machine with add_model and remove_model, so we resolved the issue by removing the library.
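
For context, a minimal sketch of that 'global machine' pattern (state names, triggers, and the Evaluation class are illustrative assumptions, not taken from the linked comment):

```python
from transitions import Machine

# One shared machine instead of one Machine per model instance.
SHARED = Machine(model=None, states=["created", "running", "done"],
                 transitions=[["start", "created", "running"],
                              ["finish", "running", "done"]],
                 initial="created", auto_transitions=False)

class Evaluation:
    def __init__(self):
        SHARED.add_model(self)     # decorates this instance with trigger methods

    def close(self):
        SHARED.remove_model(self)  # detach when the model is no longer needed
```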


We're seeing poor performance on Python 3.11 -- I don't have data from earlier Python versions to indicate whether or not this is a regression.

[screenshot: profiling results]

Using...

```python
self._machine = Machine(
    model=self,
    states=EvaluationState,
    initial=self.state or EvaluationState.CREATED,
    transitions=self.transitions(),
    send_event=True,
    auto_transitions=False,
)
```

With 13 state transitions.

aleneum commented 4 months ago

> Significantly worse performance on python 3.12

Unfortunately, I cannot reproduce @idatsy's observation. Machine initialisation actually got slightly faster with Python 3.11 and 3.12.

[chart: Machine initialisation time across Python versions]

There does not seem to be a significant difference between versions of transitions either:

[chart: Machine initialisation time across transitions versions]

There seems to be some increase in latency when auto_transitions=True, but it's nowhere near 400% (more like 20%). Single transitions (trans=True) do not influence the initialisation time much. When trans=True, one state transition with the trigger 'go' is added from state<X> to state<X+1> (N - 1 in total). But as I said, their influence is negligible, especially compared to auto_transitions, which add 'N - 1' transitions to each state. The transitions version order for each bar group is (0.7.2, 0.8.0, 0.8.11, 0.9.0).
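
For reference (my illustration, not part of the benchmark): auto transitions generate one to_<state> event per state, so the per-machine setup work grows with N:

```python
from transitions import Machine

states = ["A", "B", "C"]
m_auto = Machine(states=states, initial="A", auto_transitions=True)
m_plain = Machine(states=states, initial="A", auto_transitions=False)
# one auto-generated 'to_<state>' event per state vs. none at all
print(len(m_auto.events), len(m_plain.events))  # 3 0
```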

[chart: initialisation time with auto_transitions and trans toggled]

I will test whether there has been a regression for transition execution and multiple models. But leads would be highly appreciated.


This is my test code (tested on a 2021 MacBook M1 with Conda):

```python
from transitions import Machine, __version__

state_nr = [1, 5, 10, 20, 50, 75, 100]
trials = 5
n_runs = 2000 // trials

# N states "State_0" ... "State_<N-1>" and a 'go' chain between neighbours
state_dict = {n: [f"State_{i}" for i in range(n)] for n in state_nr}
trans_dict = {n: [["go", a, b] for a, b in zip(state_dict[n][:-1], state_dict[n][1:])]
              for n in state_nr}

def test_init_machine(n, with_auto, with_trans):
    for _ in range(n_runs):
        _ = Machine(states=state_dict[n],
                    transitions=trans_dict[n] if with_trans else None,
                    initial=state_dict[n][0], auto_transitions=with_auto)

if __name__ == '__main__':
    import timeit
    import sys
    pyver = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"{pyver} : {__version__}")
    with open('init_results.csv', 'a') as f:
        for n in state_nr:
            for with_auto in [True, False]:
                for with_trans in [True, False]:
                    samples = [timeit.timeit(f'test_init_machine({n}, {with_auto}, {with_trans})',
                                             globals=globals(), number=1) * 1000 / n_runs
                               for _ in range(trials)]
                    mean = sum(samples) / trials
                    stddev = (sum((x - mean) ** 2 for x in samples) / trials) ** 0.5
                    print(f'"{pyver}","{__version__}","{n}","{with_auto}","{with_trans}","{mean:.2f}","{stddev:.2f}"', file=f)
```
The driver script:

```bash
#!/bin/bash

envs=("test-3.12" "test-3.11" "test-3.10")
versions=("0.7.2" "0.8.0" "0.8.11" "0.9.0")

. /opt/homebrew/Caskroom/miniconda/base/etc/profile.d/conda.sh
for env in "${envs[@]}"; do
    conda activate "${env}"
    for version in "${versions[@]}"; do
        echo "Testing transitions version ${version} in ${env}"
        pip install transitions=="${version}"
        python test_init.py
    done
    conda deactivate
done
```
aleneum commented 4 months ago

Same graphs as above, but with the unreleased 0.9.1 version added. The observations stay the same:

[charts: same three benchmarks as above, now including 0.9.1]

aleneum commented 4 months ago

Test results for 0.9.1 with varying versions of Python:

Python 3.11 and 3.12 seem to result in faster initialisation with 50 states and 50 models, along the lines of:

```python
Machine(model=[Model() for _ in range(m)], states=state_dict[n],
        transitions=trans_dict[n], initial=state_dict[n][0],
        auto_transitions=with_auto)
```

[chart: initialisation time with 50 states and 50 models across Python versions]

There is an increase in initialisation time starting with 0.9.0 when multiple models are involved. I guess this bump is caused by the added "may_" trigger and _add_may_transition_func_for_trigger; may_ is also added for auto_transitions. I'd assume this is not strictly necessary and could be omitted. The increase is notable, but it does not double init time or worse:
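
For illustration, this is the check method family added in 0.9.0 for every trigger (a minimal sketch):

```python
from transitions import Machine

m = Machine(states=["A", "B"], transitions=[["go", "A", "B"]],
            initial="A", auto_transitions=False)
print(m.may_go())  # True: 'go' is valid from 'A' (may_<trigger> since 0.9.0)
m.go()
print(m.may_go())  # False: no 'go' transition is defined from 'B'
```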

[chart: initialisation time with multiple models across transitions versions]

And another test with varying numbers of models, showing a similar correlation between states (and thus auto transitions) and the number of models:

[chart: initialisation time vs. number of models]

aleneum commented 4 months ago

Cycling n models through n states (with one model callback passed by name) also suggests a speed improvement since Python 3.11:

```python
machine = machines_dict[n]  # an already initialised machine with n states
states = state_dict[n]
while machine.models[0].state != states[-1]:
    machine.dispatch("go")  # triggers 'go' on every registered model
machine.dispatch("to_" + states[0])  # reset to initial state via auto transition
```

[chart: cycle time across Python versions]

A change in 0.8.0 resulted in some overhead though:

[chart: cycle time across transitions versions]

aleneum commented 4 months ago

Update: a division issue distorted the calculated times. In the previous diagram, the y-axis ranged up to 4000 ms; THAT would have been a hefty performance hit. With this fixed, the times are now similar to string-based states, yet slightly worse.

A similar picture for enum states and for adding and removing models:

```python
from enum import Enum
from transitions import Machine

# Model is a plain class with a no-op "callback" method (shown further below)

state_dict = {n: Enum("StateEnum", {f"State{i}": i for i in range(n)}) for n in state_nr}

machines_dict = {
    n: Machine(
        model=None,
        states=state_dict[n], initial=state_dict[n].State0,
        after_state_change="callback",
        auto_transitions=False,
        ordered_transitions=True  # adds a 'next_state' trigger cycling states in order
    )
    for n in state_nr
}

def test_enum(n):
    machine = machines_dict[n]
    stack = []
    for _ in range(n):
        model = Model()
        machine.add_model(model)
        stack.append(model)
        machine.dispatch("next_state")
        while model.state != state_dict[n].State0:
            machine.dispatch("next_state")
    while stack:
        machine.remove_model(stack.pop())
```

The machines are already initialised, but doubling n means twice as many models to initialise, to add (and thus to decorate) and to remove (and un-decorate), plus twice as many states to cycle.

Python 3.10 seems to be ~slightly~ notably slower than 3.11 and 3.12

[chart: test_enum run time across Python versions]

Doubling the number of models and states from 50 to 100 increases the run time from 500 ms to 2500 ms, an increase of 400%. But this increase does not look suspicious to me: every time a model is added, all already-added models are cycled once, and each cycle is twice the length, so the total work grows roughly cubically with n.
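
A back-of-the-envelope check of that reasoning (my model of the loop, not measured data):

```python
# When model k of n is added, every dispatch moves all k registered models,
# and one full cycle takes ~n dispatches: total model-transitions ~ n**3 / 2.
def total_transitions(n):
    return sum(k * n for k in range(1, n + 1))

print(total_transitions(100) / total_transitions(50))  # ~7.9
# An up-to-8x theoretical bound makes the observed 5x look unsuspicious.
```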

[chart: test_enum run time vs. number of models and states]

aleneum commented 4 months ago

Last check related to @philipbjorge's observation:

I checked a decorated and undecorated version of:

  1. adding n models
  2. cycling n models through 50 states
  3. deleting n models

with a preinitialised machine and states:

```python
states = Enum("StateEnum", {f"State{i}": i for i in range(50)})

machine = Machine(
    model=None,
    states=states, initial=states.State0,
    after_state_change="callback",
    auto_transitions=False,
    ordered_transitions=True
)
```
**decorated**

```python
class Model:
    def callback(self):
        pass

# ...
for _ in range(n):
    model = Model()
    machine.add_model(model)
    machine.dispatch("next_state")
    while model.state != states.State0:
        machine.dispatch("next_state")
while machine.models:
    machine.remove_model(machine.models[0])
```

**undecorated**

```python
class Model:
    def __init__(self, machine) -> None:
        self.machine = machine
        self.state = self.machine.initial
        self.machine.models.append(self)

    def trigger(self, trigger_name, *args, **kwargs):
        return self.machine.events[trigger_name]\
            .trigger(self, *args, **kwargs)

    def callback(self):
        pass

# ...
for _ in range(n):
    model = Model(machine)
    for mod in machine.models:
        mod.trigger("next_state")
    while model.state != states.State0:
        for mod in machine.models:
            mod.trigger("next_state")
while machine.models:
    _ = machine.models.pop()
```

Adding 1000 models, cycling through 50 states, and removing the models 'the standard way' takes about 350 ms (on my MacBook), while the undecorated version derived from the mentioned comment is roughly 30% faster. I could not see differences other than the ones observed before between Python or transitions versions. The data also shows that the 0.7/0.8 performance gap is independent of model decoration.
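
The remaining gap is plausibly the cost of (un)decorating models: add_model attaches convenience trigger and check methods to each model, which the undecorated variant skips entirely. A minimal sketch of that decoration (assuming transitions 0.9.x defaults):

```python
from transitions import Machine

class Thing:
    pass

t = Thing()
m = Machine(model=None, states=["A", "B"],
            transitions=[["go", "A", "B"]], initial="A")
m.add_model(t)
# add_model decorates the model with trigger and check methods:
print(hasattr(t, "go"), hasattr(t, "to_B"), hasattr(t, "may_go"))  # True True True
m.remove_model(t)  # the 'undo' side of the same bookkeeping
```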

[chart: decorated vs. undecorated run time]

Summary: even though this 'benchmark' was more 'qualitative' than 'accurate', I'd assume that drastic changes in performance would have been observable. Let me know if I overlooked something or screwed up somewhere along the way. As of now, I could not find the performance issues described by @idatsy and @philipbjorge. If anyone else faces such performance issues and could provide some leads, that would be much appreciated.

aleneum commented 4 months ago

For the record (and/or for masochists), I'm attaching my test scripts:

transitions-benchmark.zip

aleneum commented 4 months ago

I will close this issue since there has not been any feedback in the last three weeks and I could not reproduce the observed performance issues. Feel free to comment nevertheless in case you face performance issues and can pinpoint them to a change in transitions. I will reopen the issue if required.