paed01 / bpmn-elements

Executable workflow elements based on BPMN 2.0
MIT License

Performance issues with bigger bpmn files #42

Open hvlwork opened 3 months ago

hvlwork commented 3 months ago

We are currently experiencing performance issues in two places: transitions between serviceTasks (after the "next" function of the service is called, it takes quite long until the next serviceTask becomes active) and ending a process (from the point where an end event is reached in the process until the 'end' event of the bpmnElements.Definition is triggered). I have found that this problem only occurs when the BPMN file is bigger. The BPMN file I tested with also includes some loops; I am not sure whether that is a factor. Do you have any idea what could cause this issue or how we could try to fix it? Any suggestion would be highly appreciated.

paed01 commented 3 months ago

A couple of things come to mind:

hvlwork commented 3 months ago

Thank you for your quick response.

paed01 commented 3 months ago

Can you give some stats about the process, number of tasks, subprocesses, events, sequence flows, etc?

hvlwork commented 3 months ago

of course:

paed01 commented 2 months ago

What is the execution time? or what is slow in your implementation?

Your process seems rather reasonable, or not even large at all. I have a client with a process of >30 tasks/services, with gateways, subprocesses, etc. It runs in less than 200ms.

hvlwork commented 2 months ago

We will run some further tests and afterwards we will get back to you.

hvlwork commented 2 months ago

For example, ending a process in the current state of our BPMN takes around 2-3 seconds. We found out that reducing the number of sequenceFlows with conditional expressions between the following two serviceTasks (after the one where the process is ended) to one each reduces the time needed to 0.3 to 0.7 seconds. So it seems like the conditional sequenceFlows are an important factor. (We often have several conditional sequence flows between the same two serviceTasks; not sure if that is important.)

What has to be considered here is that these times were measured on a quite powerful laptop, but we are developing our application for quite weak hardware. There it can become really painfully slow.

paed01 commented 2 months ago

[image: issue-42-discard]

Which of the three sequences is similar to your diagram?

hvlwork commented 2 months ago

[image]

That is the BPMN file that I am talking about.

paed01 commented 2 months ago

You will have a massive amount of discard loops with this design.

Each taken or discarded sequence flow will trigger the next task execution. Multiple outbound sequence flows are basically a parallel split. Hence, a discarded flow will discard all subsequent task outbound flows. Since you have multiple loopbacks, the discard sequence will continue until it reaches the first discarded sequence flow again (discard loop detected).

paed01 commented 2 months ago

If the end-event is really the end of execution, what if you make it a terminate end event? Then all other elements will be stopped.
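For readers unfamiliar with the notation: in standard BPMN 2.0 XML, a terminate end event is an ordinary end event with a terminateEventDefinition child. The fragment below is only a minimal sketch; the ids and the incoming flow name are hypothetical and not taken from the diagram discussed in this thread.

```xml
<!-- Hypothetical ids; the terminateEventDefinition child is what makes this a terminate end event -->
<endEvent id="EndEvent_terminate" name="End">
  <incoming>Flow_to_end</incoming>
  <terminateEventDefinition id="TerminateEventDefinition_1" />
</endEvent>
```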

rakaposhi commented 2 months ago

So you would recommend that the beginning of the process looks like this?

[image]

paed01 commented 2 months ago

I guess that would speed up execution. The terminate end event will stop all sequence flows and outstanding tasks (if any).

Did you notice any difference in execution time?

rakaposhi commented 2 months ago

Yes. It's faster now. Thanks. But we still have to do some further testing if the process is now fast enough.

hvlwork commented 2 months ago

As it turns out, ending the process now takes half its time. It is still not as fast as we need it to be, but it is a noticeable improvement. Thanks again 🙂 👍

The thing is that when Flow 2 is taken and we still have more than one sequenceFlow leading from Task 2 to Task 3 and from Task 3 to Task 4, the transition is still as slow as before, with no measurable performance improvement. If we make sure that there is always just one flow going from one task to another, then it is really fast, but that would be a big limitation for our application. Can you think of any alternative?

paed01 commented 2 months ago

What is the reason behind having multiple conditional sequence flows between the tasks? Logging? Logic?

NB! A task that cannot take any outbound conditional flows will throw ActivityError.
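One possible guard against that error, sketched here under assumptions: standard BPMN 2.0 lets an activity name one of its outgoing flows as the default, which is followed when no conditional flow evaluates to true. The ids and the variable name below are hypothetical, and whether the engine honours the default attribute on a service task should be verified before relying on it.

```xml
<!-- Hypothetical ids and variable names, modelled on the snippets in this thread -->
<serviceTask id="Task_2" name="Task2" implementation="${environment.services.task2}" default="Flow_default">
  <outgoing>Flow_conditional</outgoing>
  <outgoing>Flow_default</outgoing>
</serviceTask>
<sequenceFlow id="Flow_conditional" sourceRef="Task_2" targetRef="Task_3">
  <conditionExpression xsi:type="tFormalExpression">${environment.variables.flow5}</conditionExpression>
</sequenceFlow>
<!-- No conditionExpression: in standard BPMN the default flow is followed when no condition is true -->
<sequenceFlow id="Flow_default" sourceRef="Task_2" targetRef="Task_4" />
```

Note that a default flow is one more outbound flow, so it may add to the discard work discussed in this thread rather than reduce it.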

hvlwork commented 2 months ago

With our application we want to enable the user to define a process with a given set of tasks, where every task has a given set of sequence flows. Sometimes a user wants several sequence flows to lead to the same task, and sometimes every sequence flow should lead to a different task (separate handling of cases). That should be up to the user.

hvlwork commented 2 months ago

We have had an interesting finding. The order of a task's conditional sequence flows in the XML (the options for proceeding after a task) has a huge impact on performance. We are talking about 100 to 200 times faster if the option that should be taken is the first one in the XML. For example, if Flow 6 is taken between Task 2 and Task 3 and it is defined after Flow 5 in the XML, it is far slower than if Flow 6 were defined before Flow 5.

hvlwork commented 2 months ago

Can you reproduce this issue?

hvlwork commented 2 months ago

So, assuming that Flow 6 is taken, it is really fast with this BPMN:

    <sequenceFlow id="Flow_0sv2vaj" name="Flow 6" sourceRef="Activity_0pec7dd" targetRef="Activity_1swl04q">
      <conditionExpression xsi:type="tFormalExpression">${environment.variables.flow6}</conditionExpression>
    </sequenceFlow>
    <sequenceFlow id="Flow_1fegb8x" name="Flow 5" sourceRef="Activity_0pec7dd" targetRef="Activity_1swl04q">
      <conditionExpression xsi:type="tFormalExpression">${environment.variables.flow5}</conditionExpression>
    </sequenceFlow>    
    <serviceTask id="Activity_1swl04q" name="Task3" implementation="${environment.services.task3}">
      <incoming>Flow_1fegb8x</incoming>
      <incoming>Flow_0sv2vaj</incoming>
      <outgoing>Flow_0t5bjaq</outgoing>
      <outgoing>Flow_0ts20th</outgoing>
      <outgoing>Flow_1gc65aj</outgoing>
    </serviceTask>

But by just switching the order of the flows in the BPMN like this, following Flow 6 already becomes about 100 times slower:

    <sequenceFlow id="Flow_1fegb8x" name="Flow 5" sourceRef="Activity_0pec7dd" targetRef="Activity_1swl04q">
      <conditionExpression xsi:type="tFormalExpression">${environment.variables.flow5}</conditionExpression>
    </sequenceFlow>
    <sequenceFlow id="Flow_0sv2vaj" name="Flow 6" sourceRef="Activity_0pec7dd" targetRef="Activity_1swl04q">
      <conditionExpression xsi:type="tFormalExpression">${environment.variables.flow6}</conditionExpression>
    </sequenceFlow>
    <serviceTask id="Activity_1swl04q" name="Task3" implementation="${environment.services.task3}">
      <incoming>Flow_1fegb8x</incoming>
      <incoming>Flow_0sv2vaj</incoming>
      <outgoing>Flow_0t5bjaq</outgoing>
      <outgoing>Flow_0ts20th</outgoing>
      <outgoing>Flow_1gc65aj</outgoing>
    </serviceTask>

hvlwork commented 2 months ago

I stumbled over another performance issue: if a sequenceFlow is triggered immediately when a task becomes active, the transition is pretty slow. If the sequenceFlow is triggered after at least a short timeout, it is fast again.

paed01 commented 2 months ago

I haven't encountered designs with multiple sequence flows to the same target. I have to think how to handle that.

paed01 commented 1 month ago

Can you test again with npm i bpmn-elements@rc (v16.2.0)?

It should be a little better, but no promises.

hvlwork commented 1 month ago

It does not look like it improved a lot.

paed01 commented 1 month ago

Sorry about that. The complexity of a flow will have a performance impact, as in programming in general. Loopback is a fantastic thing but comes with a price.

Instead of loopback, could you end execution and start a new execution immediately after?

Or attempt to join as many sequences as possible before doing the loopback:

[image: issue-42-example]

hvlwork commented 1 month ago

Ending the loop will not be an option for us as we want to be able to define a process all together in the bpmn file. Also restarting a process would mean for us that we have to share information between process runs which we would like to avoid.

I am not sure I understand your second suggestion completely. We do not want to execute tasks in parallel: we always want to execute one task, follow the sequence flow with the matching condition, and then execute the next task. I think that with a parallel gateway the process would never finish, as this gateway waits for tokens from all incoming flows.

paed01 commented 1 month ago

In the example above, which is a subset of your diagram, the parallel join gateway will wait for all taken/discarded sequence flows before continuing. Hence, the parallel join gateway's outbound (loopback) sequence flow will be taken once or, if all inbound sequence flows are discarded, discarded once. The effect will be that Task 1 is not bothered more than necessary. Just a thought.
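Since the example image is not reproduced here, the following fragment is only a guess at the shape being described: the conditional flows leaving a task end in a parallel join, and a single flow loops back from the join. All ids and variable names are hypothetical. Continuing on discarded as well as taken inbound flows is the engine behaviour described in the comment above, not something the fragment itself expresses.

```xml
<!-- Hypothetical ids. Both conditional flows from Task_2 end in the join instead of looping back themselves -->
<sequenceFlow id="Flow_a" sourceRef="Task_2" targetRef="Join_before_loopback">
  <conditionExpression xsi:type="tFormalExpression">${environment.variables.flowA}</conditionExpression>
</sequenceFlow>
<sequenceFlow id="Flow_b" sourceRef="Task_2" targetRef="Join_before_loopback">
  <conditionExpression xsi:type="tFormalExpression">${environment.variables.flowB}</conditionExpression>
</sequenceFlow>
<parallelGateway id="Join_before_loopback">
  <incoming>Flow_a</incoming>
  <incoming>Flow_b</incoming>
  <outgoing>Flow_loopback</outgoing>
</parallelGateway>
<!-- A single loopback flow, so Task_1 sees one taken or discarded inbound flow per pass -->
<sequenceFlow id="Flow_loopback" sourceRef="Join_before_loopback" targetRef="Task_1" />
```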