microsoft / botbuilder-dotnet

Welcome to the Bot Framework SDK for .NET repository, which is the home for the libraries and packages that enable developers to build sophisticated bot applications using .NET.
https://github.com/Microsoft/botframework
MIT License
878 stars 484 forks source link

DialogState lost when resuming a Dialog invoked through a Custom Adapter #896

Closed darrenj closed 6 years ago

darrenj commented 6 years ago

I have a scenario whereby I'm invoking a Bot through a CustomAdapter, i.e. no ASP.NET Core activation to see here. I therefore have to setup the state myself and clearly missing a step. This issue didn't occur in the prior state implementation so potentially a regression or just missing a step.

I'm seeing some very odd behaviour around resuming a dialog after a prompt. When the dialog is resumed ActiveDialog is set to NULL meaning the Activity never makes it's way to the previously Active dialog. If you start a 2nd conversation in the emulator the scenario works every time, only affects the first conversation unless you tear the Bot instance down. A race condition perhaps?

Here are the steps I go through to activate the Bot:

I create the ConversationState object and pass through to my Bot's constructor so it has access. I'm using CosmosDB.

  CosmosDbStorage cosmosDbStorage = new CosmosDbStorage(options);
  ConversationState conversationState = new ConversationState(cosmosDbStorage);
  CalendarSkill = new CalendarSkill(skillType, conversationState, "CalendarState");

The pipeline is configured with ConversationState only

  this.inProcAdapter = new InProcAdapter();
  this.inProcAdapter.Use(conversationState);

I then execute the pipeline and Bot as below

  // Process the activity (pass through middleware) and then perform Skill processing
 this.inProcAdapter.ProcessActivity(activity, async (skillContext, cancellationToken) =>
  {
       await activatedSkill.OnTurnAsync(skillContext);
  }).Wait();

Within the Bot here is the constructor:

public CalendarSkill(BotState botState, string stateName = null)
        {
            // Create the properties and populate the Accessors. 
            this.Accessors = new CalendarBotAccessors();            
            this.Accessors.CalendarSkillState = botState.CreateProperty<CalendarBotState>(stateName ?? nameof(CalendarBotState));
            this.Accessors.ConversationDialogState = botState.CreateProperty<DialogState>("DialogState");

            // Initialise dialogs
            this.Dialogs = new DialogSet(Accessors.ConversationDialogState);
        }

On closer monitoring of CosmosDB it would appear no record in the state store collection hence explaining why on dialog resumption the ActiveDialog isn't available. Therefore perhaps an issue with state serialization (or some handler not being called due to my manual invocation).

drub0y commented 6 years ago

~Hmm... where does the Activity actually come from? State is basically dependent on identifiers that come with the activity to store and later recall related state. For example, conversation state is a combination of ChannelId and Conversation.Id off of the activity. Is your custom adapter works keeping these consistent across related requests?~

Update: Sorry, I somehow glazed over the fact that you're using the emulator so the activities should be following standard protocol for identifiers. So, disregard my question and I'll have to think about this before I can really come up with an explanation. Any chance you can push a repo with your custom adapter into a public repo or gist that we can grab to test???

darrenj commented 6 years ago

I like your train of thought, I thought just the same and carefully checked all the events/messages weren't loosing or changing conversationId or channelId as there are some custom things created but they all checked out. I'll try and extract a repro tomorrow - it's a little involved but know that's key to getting to the bottom of it.

sgellock commented 6 years ago

@Stevenic has a quick fix for this to unblock you.

Stevenic commented 6 years ago

The first question I have is are you starting your conversation within a conversationUpdate event? If so, there's a known issue where the from address is missing/wrong so your state will get dropped. There's some middleware you can write to patch the incoming event. If that's not the case there are a couple of things you can look for:

  1. Make sure the from.id and conversation.id fields are the same for all incoming activities. This is the easiest way to drop state.
  2. Make sure you're not somehow calling dc.endAsync() after starting your prompt. Recent changes we've made are designed to mitigate exactly that issue but it could still happen. Feel free to share your dialogs code too for me to look over.
  3. Check for a missing await somewhere. It's easy to forget these and I've seen it cause weird behavior.
  4. Make sure you're not somehow throwing an error as that will prevent your state from being persisted.
darrenj commented 6 years ago

Thanks Steve. No, I'm not starting the "child" Bot conversation with a conversationUpdateevent. I have a custom event (skillBegin) which I send instead.

I've gone through and double checked the from.id and conversation.id fields to ensure there wasn't a mistake somewhere. See below for my message flow

skillBeginEvent (starts the conversation) image

Message – “Book a meeting” image

Event – “Token Response” image

Message (prompt response) image

Then you can see here how the conversation flows. All of the code up and including the Prompt (What's the title of the meeting) is "in memory" and doesn't trigger state serialization/deserialization hence the back and forward flow works just fine.

The response to the prompt ends up with no ActiveDialog due to this state problem hence it doesn't understand the utterance.

image

No calls to EndAsync in the flow and done a pass on awaits

What I don't understand is why the first conversation fails every time, but starting a new conversation works every time along with subsequent ones :-(

darrenj commented 6 years ago

I'm increasingly of the opinion this is a race condition related to the state not having been persisted in time for the follow-up response. This kind of proves the theory. Re-sending the question after an error (no active dialog) ends up with it finding the state/activedialog and then working.

image

I had ruled this out before by trying to manually call SetAsync on the Accessors and force a state write which didn't help the problem either then or now.

I get the prompt back quickly but due to the delayed state persistence (delayed further by conversation transcripts writing to blog storage) I suspect this is causing the delay to state being serialized meaning a message is sent back in without state being available. There is around 3-4 seconds gap between the activity being sent and the state being written which is backed up by not being able to repro the issue if I wait 5 seconds between responses.

Feels like we need to prioritise state over transcript serialization to avoid this. Having manual control over state persistence is one route but then you could end up with situations where a forgotten call leads to issues.

AppInsights remote dependencies show this in more detail

Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:48.7306400Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.40.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.operation.name":"POST ","ai.location.ip":"127.0.0.1"},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.40.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae304fe86-gnj7ln4j489l.json?comp=blocklist","duration":"00:00:00.1853050","resultCode":"201","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae304fe86-gnj7ln4j489l.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.1069667Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.41.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.location.ip":"127.0.0.1","ai.operation.name":"POST "},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.41.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae304fe86-gnj7ln4j489l.json?comp=metadata","duration":"00:00:00","resultCode":"200","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae304fe86-gnj7ln4j489l.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:48.9365967Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.41.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.operation.name":"POST ","ai.location.ip":"127.0.0.1"},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.41.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae304fe86-gnj7ln4j489l.json?comp=metadata","duration":"00:00:00.1840458","resultCode":"200","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae304fe86-gnj7ln4j489l.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.3099816Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.42.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.location.ip":"127.0.0.1","ai.operation.name":"POST "},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.42.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae3420442-knnkj7jgnnd.json?comp=block&blockid=MTM5OGI4NWQyNzQ0NDgzODg2NjE5YWQxMDlhNmYyZTUtMDAwMDAw","duration":"00:00:00","resultCode":"201","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae3420442-knnkj7jgnnd.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.1420967Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.42.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.operation.name":"POST ","ai.location.ip":"127.0.0.1"},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.42.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae3420442-knnkj7jgnnd.json?comp=block&blockid=MTM5OGI4NWQyNzQ0NDgzODg2NjE5YWQxMDlhNmYyZTUtMDAwMDAw","duration":"00:00:00.1820068","resultCode":"201","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae3420442-knnkj7jgnnd.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.5065055Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.43.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.location.ip":"127.0.0.1","ai.operation.name":"POST "},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.43.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae3420442-knnkj7jgnnd.json?comp=blocklist","duration":"00:00:00","resultCode":"201","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae3420442-knnkj7jgnnd.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.3387905Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.43.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.operation.name":"POST ","ai.location.ip":"127.0.0.1"},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.43.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae3420442-knnkj7jgnnd.json?comp=blocklist","duration":"00:00:00.1825437","resultCode":"201","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae3420442-knnkj7jgnnd.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.7105194Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.44.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.location.ip":"127.0.0.1","ai.operation.name":"POST "},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.44.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae3420442-knnkj7jgnnd.json?comp=metadata","duration":"00:00:00","resultCode":"200","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae3420442-knnkj7jgnnd.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.5423471Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.44.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.operation.name":"POST ","ai.location.ip":"127.0.0.1"},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"PUT skillprototypebot","id":"|b5555fdf-4bb5c884636eed71.44.1.","data":"https://skillprototypebot.blob.core.windows.net:443/skillprototypebot-transcripts/emulator/9idg3b801b6f/8d6119ae3420442-knnkj7jgnnd.json?comp=metadata","duration":"00:00:00.1806474","resultCode":"200","success":true,"type":"Azure blob","target":"skillprototypebot.blob.core.windows.net","properties":{"Container":"skillprototypebot-transcripts/emulator/9idg3b801b6f","AspNetCoreEnvironment":"Development","Blob":"8d6119ae3420442-knnkj7jgnnd.json","DeveloperMode":"true"}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.9287830Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.45.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.location.ip":"127.0.0.1","ai.operation.name":"POST "},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"Create document","id":"|b5555fdf-4bb5c884636eed71.45.1.","data":"https://skillprototypebot-westus.documents.azure.com/dbs/XYEYAA==/colls/XYEYAITLcW0=/docs","duration":"00:00:00","resultCode":"201","success":true,"type":"Azure DocumentDB","target":"skillprototypebot-westus.documents.azure.com","properties":{"AspNetCoreEnvironment":"Development","Collection":"XYEYAITLcW0=","DeveloperMode":"true","Database":"XYEYAA=="}}}}
Application Insights Telemetry (unconfigured): {"name":"Microsoft.ApplicationInsights.Dev.RemoteDependency","time":"2018-09-03T12:43:49.7531813Z","tags":{"ai.application.ver":"1.0.0.0","ai.operation.id":"b5555fdf-4bb5c884636eed71","ai.operation.parentId":"|b5555fdf-4bb5c884636eed71.45.","ai.cloud.roleInstance":"MYMACHINE","ai.internal.nodeName":"MYMACHINE","ai.internal.sdkVersion":"rdddsc:2.4.1-1362","ai.operation.name":"POST ","ai.location.ip":"127.0.0.1"},"data":{"baseType":"RemoteDependencyData","baseData":{"ver":2,"name":"Create document","id":"|b5555fdf-4bb5c884636eed71.45.1.","data":"https://skillprototypebot-westus.documents.azure.com/dbs/XYEYAA==/colls/XYEYAITLcW0=/docs","duration":"00:00:00.1905382","resultCode":"201","success":true,"type":"Azure DocumentDB","target":"skillprototypebot-westus.documents.azure.com","properties":{"AspNetCoreEnvironment":"Development","Collection":"XYEYAITLcW0=","DeveloperMode":"true","Database":"XYEYAA=="}}}}
johnataylor commented 6 years ago

The behavior here is expected.

The code in the 42.ScaleOut is a sample of how to build a state solution that doesn't exhibit the behavior described here. The accompanying documentation linked to from the readme explains the concepts.

@darrenj will open a DCR in BotBuilder repo to be triaged for the next release cycle concerning adding a version of the code from this sample to the NuGet Packages.

@yochay has asked for CosmosDB to be added to the sample with some priority.

We also plan to add an equivalent sample but in JavaScript.