sartography / spiff-arena

SpiffWorkflow is a software development platform for building, running, and monitoring executable diagrams
https://www.spiffworkflow.org/
GNU Lesser General Public License v2.1

dev.mod - maximum recursion depth exceeded error #1530

Open madhurrya opened 4 months ago

madhurrya commented 4 months ago

Noticed the 'maximum recursion depth exceeded' error in these instances:

https://dev.mod.spiff.status.im/i/37138
https://dev.mod.spiff.status.im/i/36091
https://dev.mod.spiff.status.im/i/36229

These instances are quite old, so it is possible a fix landed in the interim, though we don't have a specific hypothesis about what that fix might have been.

Stack trace from one of the errors:

Traceback (most recent call last):
  File "/app/src/spiffworkflow_backend/services/process_instance_queue_service.py", line 133, in dequeued
    yield
  File "/app/src/spiffworkflow_backend/background_processing/celery_tasks/process_instance_task.py", line 42, in celery_task_process_instance_run
    ProcessInstanceService.run_process_instance_with_processor(
  File "/app/src/spiffworkflow_backend/services/process_instance_service.py", line 304, in run_process_instance_with_processor
    task_runnability = processor.do_engine_steps(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/spiffworkflow_backend/services/process_instance_processor.py", line 1498, in do_engine_steps
    return self._do_engine_steps(exit_at, save, execution_strategy_name, execution_strategy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/spiffworkflow_backend/services/process_instance_processor.py", line 1545, in _do_engine_steps
    task_runnability = execution_service.run_and_save(exit_at, save)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/spiffworkflow_backend/services/workflow_execution_service.py", line 447, in run_and_save
    task_runnability = self.execution_strategy.spiff_run(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/spiffworkflow_backend/services/workflow_execution_service.py", line 146, in spiff_run
    spiff_task.run()
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 360, in run
    self.complete()
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 376, in complete
    self.task_spec._on_complete(self)
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/specs/base.py", line 372, in _on_complete
    child.task_spec._update(child)
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/specs/base.py", line 261, in _update
    if self._update_hook(my_task):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/specs/Join.py", line 198, in _update_hook
    self._do_join(my_task)
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/specs/Join.py", line 223, in _do_join
    for task in self._find_tasks(my_task):
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/specs/Join.py", line 214, in _find_tasks
    if self.split_task and task.is_descendant_of(my_task):
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 198, in is_descendant_of
    return self.parent.is_descendant_of(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 198, in is_descendant_of
    return self.parent.is_descendant_of(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 198, in is_descendant_of
    return self.parent.is_descendant_of(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 948 more times]
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 194, in is_descendant_of
    if self.parent is None:
       ^^^^^^^^^^^
  File "/app/venv/lib/python3.12/site-packages/SpiffWorkflow/task.py", line 114, in parent
    return self.workflow.tasks.get(self._parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded

UPDATE: Elizabeth pointed out that this can happen if you have two sequence flows pointing directly to the same task (where you should have an exclusive gateway instead). https://github.com/sartography/SpiffWorkflow/issues/394 is related.
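
For context on why that diagram shape matters: the repeated frames in the trace above are an ancestry check that walks a task's parent chain recursively, one stack frame per ancestor. Below is a minimal, self-contained sketch of that pattern (Node and the chain length are illustrative stand-ins, not SpiffWorkflow's actual classes) showing how a deep enough parent chain exceeds CPython's default recursion limit of 1000:

    import sys

    class Node:
        """Stand-in for a task that keeps a reference to its parent."""
        def __init__(self, parent=None):
            self.parent = parent

        def is_descendant_of(self, other):
            # Recursive ancestry walk, one stack frame per ancestor,
            # mirroring the repeated frames in the stack trace above.
            if self.parent is None:
                return False
            if self.parent is other:
                return True
            return self.parent.is_descendant_of(other)

    node = Node()
    for _ in range(2000):  # build a chain deeper than the default limit
        node = Node(parent=node)

    print(sys.getrecursionlimit())  # 1000 by default in CPython
    try:
        node.is_descendant_of(Node())
    except RecursionError as exc:
        print("RecursionError:", exc)  # maximum recursion depth exceeded

A loop-back that keeps extending the chain of task parents will eventually hit that limit, which is presumably why the modeling change above (an exclusive gateway instead of a second incoming flow) avoids the error.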

madhurrya commented 2 months ago

Noticing this again: https://dev.mod.spiff.status.im/extensions/recent-error-events [screenshot attached]

jasquat commented 2 months ago

This seems to be related to the comment here https://github.com/sartography/spiff-arena/issues/1861#issuecomment-2206467374.

madhurrya commented 2 months ago

@jasquat The above error is in this model - what exactly needs to change there? https://dev.mod.spiff.status.im/process-models/manage-talents:talent-acquisition-from-job-requisition-to-hiring:job-requisition:request-new-role

jasquat commented 2 months ago

@madhurrya I'm not sure. My comment was based on what @essweine was saying in Slack. This issue could be unrelated; it could actually be caused by the instance accumulating too many tasks over time, which eventually causes the recursion in Spiff to topple over.

Unfortunately the diagram for the process instances in question no longer exists, so we can't check it, but the instances are quite old with a lot of tasks - I saw 22,000+ events on one. I think the main takeaway going forward is to make sure tasks do not have multiple inputs.

Also, maybe one of the recent changes in SpiffWorkflow added a recursion path that did not exist before?
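
If the suspicion is simply that these old instances have grown a task tree slightly deeper than the interpreter allows, one low-effort way to test that hypothesis (a diagnostic, not a fix) would be to raise Python's recursion limit before re-running the instance - for example in a local script that loads and runs it - and see whether it then completes. The limit value below is illustrative:

    import sys

    print("current limit:", sys.getrecursionlimit())  # typically 1000 in CPython
    # Diagnostic only: a higher limit trades the RecursionError for more stack
    # usage, so raise it cautiously and just far enough to confirm the hypothesis.
    sys.setrecursionlimit(5000)

If the instance completes with a higher limit, tree depth is the problem; if it still fails, the depth may be effectively unbounded, which points back at the diagram shape discussed above.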

essweine commented 2 months ago

Looping back directly to a task without an intervening exclusive gateway often causes this error. As for the volume of tasks, the relevant metric is the depth of a particular task in the tree (though, of course, the probability of a deeply nested task typically goes up as the number of tasks increases). If you can get a serialized version of one of the failing processes, I can take a look at it.
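
If a serialized instance does turn up, one quick way to check whether depth is the culprit would be to measure the longest parent chain iteratively, so the measurement itself cannot blow the stack. This is only a sketch: Task and its parent attribute are stand-ins for whatever the serialized structure actually exposes:

    class Task:
        """Stand-in for a deserialized task holding a reference to its parent."""
        def __init__(self, parent=None):
            self.parent = parent

    def depth(task):
        # Iterative ancestor count; no recursion, so it is safe on very deep chains.
        count = 0
        while task.parent is not None:
            task = task.parent
            count += 1
        return count

    # Build an artificially deep chain just to demonstrate the check.
    node = Task()
    for _ in range(1200):
        node = Task(parent=node)
    print(depth(node))  # 1200 - already past the ~1000 default recursion limit

Any task whose depth approaches the recursion limit puts recursive walks like is_descendant_of at risk, regardless of how many tasks the instance has in total.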

jasquat commented 2 months ago

I tried to get the serialized JSON, but the process I was running - poetry run python bin/get_bpmn_json_for_process_instance.py 35667 - was killed. I'm not sure we can export the instance in its entirety.