sartography / spiff-arena

SpiffWorkflow is a software development platform for building, running, and monitoring executable diagrams
https://www.spiffworkflow.org/
GNU Lesser General Public License v2.1
48 stars, 36 forks

Parallel gateway and missing task data #1502

Open sashayar13 opened 3 weeks ago

sashayar13 commented 3 weeks ago

There was an issue in prod where task data didn’t go through a merging parallel gateway.

The currency_enum data was produced inside a call activity Get currency list but was not available after the merging parallel gateway. You can see more details in the attached screenshots (https://www.notion.so/Parallel-gateway-and-missing-task-data-b55d50b7d29c422f95de834da9726052?pvs=4)

jasquat commented 3 weeks ago

@madhurrya is this an issue on test.mod as well? I think this is fixed on dev.app and test.app.

madhurrya commented 3 weeks ago

I have not noticed it in any environment other than dev.mod. Do you think the fix you did for the dev.mod issue will fix this as well? It's the same issue, but in dev.mod it happened consistently, whereas here it is random and we never noticed it during our testing.

jasquat commented 3 weeks ago

Yeah that is odd. I do think the same fix would fix it. I'm not sure how this could happen without celery if it doesn't happen in non-prod. How often has this happened in prod?

madhurrya commented 3 weeks ago

I think just once so far. But it has been just one day since we deployed the latest to Prod.

calexh-sar commented 3 weeks ago

On hold pending additional occurrences and/or being able to reproduce.

sashayar13 commented 3 weeks ago

Just reproduced this in prod again (and didn't notice this behaviour in test.app or dev.app). Task data from the parallel paths doesn't come through to the Compare task.

Image

Got the error (also claiming that the task data didn't come through, although it should have) in dev.app after modifying the process: https://dev.app.spiff.status.im/i/6889

Image

process model is https://dev.app.spiff.status.im/process-models/misc:data-sync:bbhr-data-sync:orgstructure-bbhr-iplicit

sashayar13 commented 3 weeks ago

the process without the parallel gateway worked just fine in prod - no errors

madhurrya commented 3 weeks ago

@sashayar13 I was trying to recreate this issue using a simple model, and while creating it I mistakenly added the closing gateway as an exclusive gateway instead of a parallel one, and I got the same error as you did. I later realized my mistake, and when I checked your diagram above I noticed that you have done the same :-) So that must be the reason why you got it in dev.app.

Image

When I corrected my model to use a parallel gateway, it worked fine. https://dev.app.spiff.status.im/process-models/misc:qa:bpmn-model-testing:unit-test-parallel-gateway-2

Which process gives you this error in prod? I will try to create a similar, simpler one to recreate it.
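The difference between the two closing gateway types described above can be illustrated with a toy model. This is only a sketch of the general BPMN join semantics using plain dictionaries; it does not use or reflect SpiffWorkflow's actual implementation, and the function names and sample values are hypothetical.

```python
from typing import Any, Dict, List


def parallel_join(branches: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Toy parallel join: wait for every branch, then merge all branch data."""
    merged: Dict[str, Any] = {}
    for data in branches:
        merged.update(data)
    return merged


def exclusive_join(branches: List[Dict[str, Any]], arrived: int) -> Dict[str, Any]:
    """Toy exclusive join: pass through the one branch that arrived, no merging."""
    return dict(branches[arrived])


# Hypothetical data produced by two parallel paths.
top = {"currency_enum": ["USD", "EUR"]}
bottom = {"supplier_details": {"id": 1}}

after_parallel = parallel_join([top, bottom])
after_exclusive = exclusive_join([top, bottom], arrived=1)

print(sorted(after_parallel))   # both keys are present
print(sorted(after_exclusive))  # only the key from the branch that arrived
```

In this toy model, a task after the exclusive join sees only one branch's data, which matches the symptom of missing task data when the closing gateway is exclusive by mistake.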

jasquat commented 3 weeks ago

Oh, I wonder if it's related to the manual tasks after the call activities / subprocesses. Was that first screenshot taken from prod or dev.app?

sashayar13 commented 3 weeks ago

@jasquat another occurrence with the currency_enum list occurred in prod: https://prod.mod.spiff.status.im/i/6487 (Upload expense evidence - travel process)

sashayar13 commented 3 weeks ago

The process in dev, which is started by a Message event: https://dev.app.spiff.status.im/process-models/manage-finance:accounts-payable:process-expense-reports:upload-expense-evidence-travel

Something I noticed is that in both cases in prod.mod, the issue appeared when I had to manually restart the Get Supplier details call activity because of the Iplicit issues.

https://dev.app.spiff.status.im/editor/process-models/manage-finance:accounts-payable:process-expense-reports:capture-expense-data/files/capture-expense-details.bpmn

The steps I follow to debug Iplicit-related issues:

Image

burnettk commented 3 weeks ago

We created a process model with two call activities run in parallel, like the last screenshot with Get Xe Currencies Enum List and Get Supplier details. We put a service task inside the bottom one that was designed to fail. After it failed, we rewound to the beginning of the bottom call activity and ran it again (making the service task pass this time). The task data generated from the top and bottom call activities was present at the parallel gateway and in subsequent tasks. So we failed to repro, alas.
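The repro attempt above can be summarized as a small simulation of the expected behaviour. This is a minimal sketch with plain Python functions standing in for the call activities; it does not run the SpiffWorkflow engine, it does not reproduce the bug, and all function names and data values are hypothetical.

```python
from typing import Any, Callable, Dict


class ServiceTaskFailed(Exception):
    """Raised by the toy service task on its first, deliberately failing run."""


def get_currencies(data: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the top call activity.
    data["currency_enum"] = ["USD", "EUR"]
    return data


def make_supplier_task() -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    # Stand-in for the bottom call activity: fails once, passes on the re-run.
    calls = {"count": 0}

    def task(data: Dict[str, Any]) -> Dict[str, Any]:
        calls["count"] += 1
        if calls["count"] == 1:
            raise ServiceTaskFailed("designed to fail on the first run")
        data["supplier_details"] = {"id": 1}
        return data

    return task


def run_with_rewind(initial: Dict[str, Any]) -> Dict[str, Any]:
    top_result = get_currencies(dict(initial))
    supplier_task = make_supplier_task()
    try:
        bottom_result = supplier_task(dict(initial))
    except ServiceTaskFailed:
        # "Rewind": restart the bottom branch from the data it had at entry.
        bottom_result = supplier_task(dict(initial))
    # Expected parallel-join behaviour: data from both branches is merged.
    merged = dict(initial)
    merged.update(top_result)
    merged.update(bottom_result)
    return merged


result = run_with_rewind({"request_id": 1})
print(sorted(result))
```

The check of interest is that keys from both branches survive the restart of one branch, which is what was observed in the repro attempt.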

sashayar13 commented 2 weeks ago

Yep, I also tried to reproduce it in test.app but failed

calexh-sar commented 2 weeks ago

Per discussion during the Ticket Grooming meeting, parking this until it can be reproduced.