microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License
87 stars 32 forks source link

Provide a way to correlate orchestrations with their suborchestrations #35

Closed scale-tone closed 3 years ago

scale-tone commented 3 years ago

To show fancy diagrams like this: image DurableFunctionsMonitor needs to be able to fetch ids of suborchestrations that were called by a given orchestration. DurableOrchestrationStatus.History doesn't provide that info, but fortunately the default storage provider stores those ids in XXXHistory table, and DfMon is able to fetch that info by doing a simple direct query against that table. MsSql storage provider doesn't do that, and there seems to be no way to fetch that info from the database.

Can you please add that data, so that execution history can be populated with child orchestration ids?

Ideally, of course, it would be great if DurableOrchestrationStatus.History had that field included (so that DfMon didn't need to make any custom roundtrips), but I guess that's a much longer story...

usemam commented 3 years ago

@scale-tone Looking into DurableFunctionsMonitor's source code, I see that it invokes DurableClient.GetStatusAsync which in its turn calls DurabilityProvider.GetOrchestrationStateWithInputsAsync. Currently MS-SQL's implementation of DurabilityProvider returns only single orchestration state wrapped into array:

public override async Task<IList<OrchestrationState>> GetOrchestrationStateWithInputsAsync(string instanceId, bool showInput = true)
{
    OrchestrationState? state = await this.service.GetOrchestrationStateAsync(instanceId, executionId: null);
    if (state == null)
    {
        return Array.Empty<OrchestrationState>();
    }

    if (!showInput)
    {
        // CONSIDER: It would be more efficient to not load the input at all from the data source.
        state.Input = null;
    }

    return new[] { state };
}

@cgillum Could you please share if current behavior is by design, or it could be enhanced to return not only requested orchestration's state, but also its sub-orchestration's?

cgillum commented 3 years ago

Hi @usemam thanks for the suggestion. I worry that returning state for both the target and sub-orchestrations would be problematic because there could be thousands of sub-orchestrations. It also changes the expected behavior of this existing API, which could be problematic.

That said, I think the base requirement is that we expose parent/child relationship information in the database, as @scale-tone suggested. I'll take a look at that first as a way to resolve this issue.

cgillum commented 3 years ago

@scale-tone The dt.Instances table has a ParentInstanceID column that can correlate parent and child orchestrations. Would that be sufficient for what you need? Mapping sub-orchestration instances to specific rows in the history table is a little more tricky, because it requires you to look at the dt.History.TaskID column for the sub-orchestration history's ExecutionStarted event and map that back to the TaskID for the parent's SubOrchestrationInstanceCreated event. If it would make things easier, I can probably add a TaskID column to the dt.Instances table which will give you a simpler way to do this history correlation.

DECLARE @InstanceID varchar(100) = '43674ea3df9b4379805f7a436952339f'

SELECT [InstanceID], [Name], [CreatedTime], [ParentInstanceID] FROM dt.Instances WHERE InstanceID = @InstanceID
SELECT [InstanceID], [EventType], [TaskID], [Name], [Timestamp] FROM dt.History WHERE InstanceID = @InstanceID

SELECT [InstanceID], [Name], [CreatedTime], [ParentInstanceID] FROM dt.Instances WHERE ParentInstanceID = @InstanceID
SELECT [InstanceID], [EventType], [TaskID], [Name], [Timestamp] FROM dt.History WHERE InstanceID = 'Sub1'

image

scale-tone commented 3 years ago

So, to confirm my understanding, the matching condition should then be

   (ChildOrchestration.ParentInstanceId === ParentOrchestration.InstanceId)
   &&
   (ChildOrchestration.History["ExecutionStarted"].TaskID === ParentOrchestration.History["SubOrchestrationInstanceCreated"].TaskID)

, correct? Then it's good enough already, no need to add anything, thanks!

Strategically, of course, it would be great if DurableOrchestrationStatus.History provided all this data by itself (including TaskScheduledId and what not). So that DfMon is relieved from workarounds...

cgillum commented 3 years ago

Yes, that's correct.

And I agree that it would be great if DurableOrchestrationStatus.History provided all the necessary data. That's a bigger item to tackle, as you mentioned...

cgillum commented 3 years ago

Then it's good enough already, no need to add anything, thanks!

Closing for now. We can reopen if we find this is really needed later.