sartography / SpiffWorkflow

A powerful workflow engine implemented in pure Python
GNU Lesser General Public License v3.0
1.69k stars 313 forks source link

Access to Multi Instance task index from task in v2.0.0rc0 with bpmn:loopCardinality #337

Closed asauve2 closed 1 year ago

asauve2 commented 1 year ago

The new Multi Instance code has fixed the task tree issue we found in #331 which is good news!

However we rely on the standard bpmn:loopCardinality attribute in our custom data model and the current task index which is expected to be >= 0 and < loopCardinality does not seem to be accessible from the Task data.

The test ScriptTask used is parsed using

and the spec inherit from

Closer inspection to bpmn/specs/mixins/multiinstance_task.py
does not show any hints about that index being passed. Runtime Task data is also empty and no attributes seem to match this value.

Given that the focus has probably be with good reasons set on DMN handling, is this parameter just missing (and then could it be added?) or have I just overlooked something?

essweine commented 1 year ago

The primary reason for not including the loop counter was to not add a variable that could potentially overwrite something that already exists in the task data. Many of our workflows have a large number of variable and it's already extremely difficult to manage them. We've been bitten in the past with variables unexpectedly getting overwritten.

So if I added it there, I would probably add another field in the bpmn.js panel to allow a user to specify the name of the variable to store the counter in (though I'm not sure that would help you, since I'm not sure you're using any of our extensions).

Spiff actually does keep track of keys and indexes for multiinstance tasks in the task internal data (it's needed to manage the child tasks). It would be easy to add something similar to the task internal data for loop tasks too. Another alternative would be to make the information available through a workflow method (which could return the overall state of all the tasks and would probably be generally useful).

Adding the counter to the internal data is probably the path of least resistance.

asauve2 commented 1 year ago

I totally agree that the loop counter does not belong to Task.data as implicit variables lead generally to subtle debug issues. I was rather expecting it to be present as one of the Task object attributes where it seem intuitively to belong given that these tasks are initiated from the parent multi instance.

You are right that currently we are not using any extensions (except a custom class attribute for Service Task), as we want to loosen dependencies to editors (or be compatible with at least camunda and spiff arena). So the alternate way to get the loop counter would be from API internal state. I tend to think that both access are useful: workflow based named-on-demand variable and API internal state. Our goal is still to implement threaded Parallel Multi Task, but I dont want currently to introduce base library modification, so I'll wait for this part of the library to stabilize while moving to enabling threading of parallel Gateway as first step.

The idea of getting overall state from workflow has some usefull use cases but would be $\mathcal{O}(ntask)$ or $\mathcal{O}(log(ntask))$ using a dict for a specific task access. Still it would be an advantage if we keep a consistent $O(1)$ access to this information.

asauve2 commented 1 year ago

I realized that a graphical support is always welcome in these discussions so here it is :-) multiple_parallel_instance_loop

In the case dicussed above the loop functionality will be used with the loop counter, but also in the context of threaded parallelism following discussion in #336.

One cannot stress enough that those simple graphical representations can lead to very complex implementation issues under the hood. I suppose that to keep all the multi instance options manageable, the loop counter and the key based (DMN | workflow.data collection) should all have a unified API representation even if we end with some duplication like key=value=loop_index for example.

essweine commented 1 year ago

I apologize for the delay in responding to this. I've been pretty busy for the past week trying to finish up something I expected to be done with last week. This is good feedback.

For multiinstance tasks (whether based on a collection or a cardinality), the index (if based on cardinality or a sequence collection type) or the key (if based on a mapping collection type in a variable called key_or_index in its internal data. So for that MI tasks, you can currently get this from task.internal_data.get('key_or_index'). I should have mentioned specifically how to get it earlier.

For loop tasks, it is not stored due to my own short-sightedness, but it could easily be added.

I agree that it would be useful to have both workflow and task based access to this information, methods for obtaining this information should be added to BpmnTaskSpecMixin (for information about specific tasks) and BpmnWorkflow (for information about the bigger picture).

essweine commented 1 year ago

This PR adds a task_info method to the task spec that will return specific information about MI tasks. For the MI task, it provides a map from the input item to the task id and lists of completed/running/future instances. https://github.com/sartography/SpiffWorkflow/pull/347