temporalio / sdk-core

Core Temporal SDK that can be used as a base for language specific Temporal SDKs
MIT License

[Bug] Cancellation when using local activities #705

Open Irvenae opened 5 months ago

Irvenae commented 5 months ago

What are you really trying to do?

In a workflow, I am running a local activity inside a cancellable scope. When I receive a signal, I want to cancel that activity and start a new one.

Describe the bug

In some situations when the signal is received I get a

Fatal("Invalid transition while attempting to cancel LocalActivityMachine in MarkerCommandCreated")

after which the workflow is stuck. I think this happens when the worker receives a signal from the server, but my local activity has already finished by the time I handle that signal. The signal handler then tries to cancel the scope while the activity is already in a completed state, which results in an invalid state machine transition?

I don't know exactly how the state machine works with local activities. I thought that while a local activity is running, signals are not handled until it is done. So maybe the question is why the LocalActivityMachine is being cancelled at all?
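The suspected race can be illustrated with a small toy model. This is only a sketch: `ToyLocalActivityMachine`, its three states, and the `tolerateLateCancel` flag are hypothetical names invented for illustration. They borrow the state name from the error message above but are not the real sdk-core implementation, and whether the actual Core handles a late cancel this way is an assumption.

```typescript
// Toy model only. State names loosely mirror the error message in this issue;
// this is NOT the sdk-core LocalActivityMachine.
type LaState = "Executing" | "MarkerCommandCreated" | "Cancelled";

class ToyLocalActivityMachine {
  state: LaState = "Executing";
  // Hypothetical switch: treat a cancel that arrives after completion as a no-op.
  tolerateLateCancel: boolean;

  constructor(tolerateLateCancel: boolean) {
    this.tolerateLateCancel = tolerateLateCancel;
  }

  // The local activity finished on the worker; a marker command is queued.
  complete(): void {
    if (this.state === "Executing") {
      this.state = "MarkerCommandCreated";
    }
  }

  // The workflow requests cancellation, e.g. from a signal handler.
  cancel(): void {
    switch (this.state) {
      case "Executing":
        this.state = "Cancelled";
        return;
      case "MarkerCommandCreated":
        if (this.tolerateLateCancel) {
          return; // the result already exists, so there is nothing left to cancel
        }
        throw new Error(
          `Invalid transition while attempting to cancel LocalActivityMachine in ${this.state}`
        );
      case "Cancelled":
        return; // cancelling twice is harmless in this model
    }
  }
}

// Replays the suspected ordering: the activity completes first, the cancel arrives second.
function runRace(tolerateLateCancel: boolean): string {
  const machine = new ToyLocalActivityMachine(tolerateLateCancel);
  machine.complete();
  try {
    machine.cancel();
    return machine.state;
  } catch (err) {
    return `fatal: ${(err as Error).message}`;
  }
}
```

In this model, `runRace(false)` ends in the fatal error, while `runRace(true)` leaves the machine in `MarkerCommandCreated` and the late cancel is ignored. Cancelling before completion works in both modes, which would match the observation that the failure only shows up with certain timing.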

Screenshot 2024-03-14 at 09 04 01

Here the local activity markers were received by the server. In some other situations the local activity marker is not there; this probably depends on the timing between the local activity, the workflow task, and the signal.

Minimal Reproduction

I tried to reproduce this, but because it is a race condition I did not manage to trigger it in a simple example...

Environment/Versions

Linux x86_64, AMD EPYC 7B12
Temporal Server Version: 1.22.0
Temporal UI Version: 2.16.2
Temporal TS SDK: 1.9.1
Kubernetes

Sushisource commented 5 months ago

Hey @Irvenae - thanks for opening. If you have code that even sometimes repros this, that would be useful.

Also, please try out the newest TS SDK; it has an updated Core that might address this.

Irvenae commented 5 months ago

Ok, I missed that I was behind 😊 It seems the latest patch update has some fixes which might resolve this ^^ Unfortunately, I can't share this code. I will make a simplified case and run it under random load to see if I can reproduce the issue.