Open Irvenae opened 5 months ago
Hey @Irvenae - thanks for opening. If you have code that even sometimes repros this, that would be useful.
Also, please try out the newest TS SDK; it has an updated Core that might address this.
Ok, I missed that I was behind 😊 Seems like the last patch update has some fixes which might resolve this ^^ Unfortunately, I can't share this code. I will make a simplified case and try to randomly load it to see if I can reproduce.
What are you really trying to do?
In a workflow I am running a local activity. When I receive a signal, I want to cancel the activity, which runs in a cancellable scope, and start a new activity.
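A minimal sketch of that pattern, assuming the TypeScript SDK's `@temporalio/workflow` API. The activity `processJob`, the signal `replaceJob`, and the workflow name are hypothetical stand-ins, since the real code cannot be shared:

```typescript
import {
  CancellationScope,
  defineSignal,
  isCancellation,
  proxyLocalActivities,
  setHandler,
} from '@temporalio/workflow';

// Hypothetical activity interface; the real activities are not shown in this issue.
interface Activities {
  processJob(jobId: string): Promise<void>;
}

const { processJob } = proxyLocalActivities<Activities>({
  startToCloseTimeout: '1 minute',
});

// Hypothetical signal carrying the id of the job that should replace the current one.
export const replaceJobSignal = defineSignal<[string]>('replaceJob');

export async function replaceActivityOnSignal(initialJobId: string): Promise<void> {
  let nextJobId: string | undefined = initialJobId;
  let currentScope: CancellationScope | undefined;

  setHandler(replaceJobSignal, (jobId: string) => {
    nextJobId = jobId;
    // Cancel whatever is in flight. The scope may already have completed by the
    // time this handler runs, which is the ordering suspected in this issue.
    currentScope?.cancel();
  });

  // Run one local activity at a time; a signal cancels it and queues the next one.
  while (nextJobId !== undefined) {
    const jobId = nextJobId;
    nextJobId = undefined;
    currentScope = new CancellationScope();
    try {
      await currentScope.run(() => processJob(jobId));
    } catch (err) {
      // Swallow only cancellation; any other failure should fail the workflow.
      if (!isCancellation(err)) throw err;
    }
  }
}
```

In this sketch the workflow ends once an activity completes without a newer job having been signalled; the real workflow may behave differently.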
Describe the bug
In some situations when the signal is received I get a
on which the workflow is stuck. I think this happens when the worker receives a signal from the server and my local activity on that worker finishes before I handle the signal. The signal handler then tries to cancel the scope while it is already in a completed state, which perhaps leaves the state machine in an incorrect state?
I don't know exactly how the state machine works with local activities. I thought that while a local activity is running, signals are not handled until the local activity is done. So maybe the question is then: why was the LocalActivityMachine cancelled?
Here local activity markers were received by the server. In some other situations the local activity marker is not there; this probably depends on the timing between the local activity, the workflow task, and the signal.
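The suspected ordering (the local activity completes, then the signal handler cancels its scope) can be illustrated with a toy state machine. This is only a model to make the hypothesis concrete: the class, its states, and both cancel variants are assumptions for illustration and are not taken from the Temporal Core implementation.

```typescript
// Toy model only: states and transitions are assumptions, not Temporal Core code.
type State = 'running' | 'completed' | 'cancelled';

class ToyActivityMachine {
  state: State = 'running';

  complete(): void {
    if (this.state === 'running') this.state = 'completed';
  }

  // Strict variant: a cancel after completion is treated as an invalid transition.
  cancelStrict(): void {
    if (this.state !== 'running') {
      throw new Error(`invalid transition: cancel while ${this.state}`);
    }
    this.state = 'cancelled';
  }

  // Tolerant variant: a cancel that arrives after completion is ignored.
  cancelTolerant(): void {
    if (this.state === 'running') this.state = 'cancelled';
  }
}

// Replays the suspected ordering: the activity finishes first, the cancel arrives late.
function lateCancel(strict: boolean): string {
  const machine = new ToyActivityMachine();
  machine.complete();
  try {
    if (strict) {
      machine.cancelStrict();
    } else {
      machine.cancelTolerant();
    }
    return machine.state;
  } catch (err) {
    return (err as Error).message;
  }
}
```

With the strict variant the late cancel surfaces as an error, while the tolerant variant leaves the machine in `completed`. Which of the two the real machine resembles is exactly the open question here.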
Minimal Reproduction
I tried to reproduce this, but because it is a race condition I did not manage to reproduce it in a simple example...
Environment/Versions
OS and processor: Linux x86_64, AMD EPYC 7B12
Temporal Server Version: 1.22.0
Temporal UI Version: 2.16.2
Temporal TS SDK: 1.9.1
Running on Kubernetes