Open jundayin opened 2 years ago
tagging @cgillum for SQL Provider owner and @mws19901118
You might be fighting with the system if you try to use LockExpiration
for pausing since the system will be constantly trying to lock (and unlock) these instances anyways.
One intention I had for supporting this was to introduce a 'Suspended'
runtime status in the database. We actually already check for this status value (which isn't official yet) in at least one place in our stored procedures. The idea is that it will prevent the orchestration from getting picked up and executed by any workers until it's put back into the "Running"
status. You may want to play with this approach instead of messing with LockExpiration
.
BTW, if you find something that works, we'd love to possibly accept it as a contribution to this project. :) Perhaps there could be a stored procedure for pausing/suspending and another for resuming a paused/suspended instance? If you have a generic way of migrating an instance as well, that may also be very interesting to this project.
FYI @davidmrdavid since we're investigating a feature like this (suspend/resume) for the other storage backends as well.
Thanks for the advice and I will take a look at it
We actually did some PoC from our side. From what we observed the current logic of DTF does not block worker scheduling new sub-orchestration/activity when we change the runtime status of one Running
orchestration instance to Suspended
. Is this behavior by design or we are experimenting with wrong approach (we didn't add timer between tasks inside orchestration but just simply set breakpoint there, I assume worker will continue process the activity regardless, right?)
Say Orchestration Test
. It contains activity A and activity B
Test
Test
to Suspended
This matches my expectation and is the same outcome I'd expect if you terminate an orchestration - changing the status of an orchestration instance doesn't have any effect on any already scheduled activity tasks.
If we want to expand the scope of "Suspended" to also halt any previously scheduled activity tasks, then we'd need to update the dt._LockNextTask
stored procedure to do a join on the dt.Instances
table to check for a non-suspended status for the orchestration instance that scheduled the task (and verify that this change doesn't introduce any new deadlocks or substantial performance degradation).
Hi,
Our team is implementing a solution for migrating in-progress orchestration instance from one database to another one and it can be resumed and picked up by the worker after. We believe it would be better to pause the entire orchestration instance before running the SQL command moving the data.
The definition of pausing the migration is quite simple: The state of the orchestration instance will remained still and the worker will not pick it up until we allow it to.
There is no official API available by the SQL Provider for this type of "pause" operation thus we are looking to existing schema and see if we can utilize any of them. For now,
LockExpiration
in[dt].[Instances]
and[dt].[NewTasks]
and theVisibleTime
in[dt].[NewEvents]
seems strong candidates to logically pause the orchestration. We have some PoC that settingLockExpiration
andVisibleTime
forNewEvents
andNewTasks
to some far time could keep the worker from picking them up, but we have not tested if it works forInstances
as well.So my question to the owner of SQL Provider is does this approach lead to any unexpected behavior for worker and do you have any suggestion on the migration?