microsoft / botframework-sdk

Bot Framework provides the most comprehensive experience for building conversation applications.

There was an error sending this message to your bot: HTTP status code GatewayTimeout #4559

Closed ItsMeArthur closed 5 years ago

ItsMeArthur commented 6 years ago

Bot Info

I have a chatbot in production, and yesterday I had some issues with exceptions being thrown within the chatbot, resulting in the "Sorry my bot code is having an issue" message.

Weirdly, the chatbot provides the correct response to the user; however, after the expected activity the user gets the "Sorry my bot code is having an issue" message.

I've got the error both on Skype for Business and DirectLine (via WebChat): image

When I open the Skype for Business log I see the following error messages: image

When I go to the Application Insights of this bot I see the following exceptions were raised: image

As you can see the Microsoft.Bot.Schema.BotTimeoutException was raised a bunch of times.

If I inspect this exception within the Application Insights I can see this information: image

Here's the call stack:

Microsoft.Bot.Schema.BotTimeoutException: at Microsoft.Bot.ChannelConnector.BotAPI+d__30.MoveNext (Microsoft.Bot.ChannelConnector, Version=3.2.1.0, Culture=neutral, PublicKeyToken=null)

From what I was able to gather this happens when the bot takes more than 15 seconds to provide a response to the bot connector. However sometimes the bot gives us the expected reply within something like 3 seconds, and we still get this exception.

This is the chatbot Microsoft App Id: 2eb9dadb-2d6f-4cea-a6e0-8cce748722ab

I'm not sure if I can post the bot handle publicly, as it contains the name of our client. But if needed I can provide it to someone via email.

JasonSowers commented 6 years ago

You can send me your code at the email listed in my profile; this is not going to be enough information to diagnose anything.

JasonSowers commented 6 years ago

@ItsMeArthur If you sent me anything, I did not get it. Can you resend?

ItsMeArthur commented 6 years ago

@JasonSowers Just mailed you the source code. Sorry for taking so long, but I had to get the proper permission from my managers. Thank you very much for looking into this.

JasonSowers commented 6 years ago

@ItsMeArthur I did not receive anything. I will make a private repo and invite you to it so you can drop the code in there instead. I'll give you full access to delete it at any time.

ItsMeArthur commented 6 years ago

Hi @JasonSowers. Weird. I just added the files to the repo you created.

JasonSowers commented 6 years ago

Thanks, I will take a look as soon as I get a chance.

JasonSowers commented 6 years ago

@ItsMeArthur I was able to use your bot in webchat and I was not able to see the behavior you are describing. This exception occurs when the bot takes longer than 15 seconds to respond. I see you are making some API calls and have other processes that are adding to the time to respond. It doesn't seem like anything you are doing in your code should make the response time > 15 seconds so I wonder... Is there any chance when you noticed this happening there was just slowness on your network, or maybe even reaching Azure? Is this consistently happening or did it just happen in a small time period?

ItsMeArthur commented 6 years ago

Hi @JasonSowers.

> Is this consistently happening or did it just happen in a small time period?

This exception is kind of random; it happens intermittently. Compared to the total number of messages exchanged between users and the bot, the number is low, but high enough to bother some users, and my clients want it gone.

> I see you are making some API calls and have other processes that are adding to the time to respond.

In the code I've shared with you we get the reply to the user from an external service. I'm using Application Insights to monitor response times, and the response time rarely exceeds one second; when it does, it's by a little.

> Is there any chance when you noticed this happening there was just slowness on your network, or maybe even reaching Azure?

This happens sometimes when I'm using our corporate network, which is really good, and also when using my 4G mobile connection, which is also not bad. I've tried to use a really robust Web App with a lot of RAM and also using CosmosDB to store state data. The problem still happens sometimes (again, enough to bother some users and clients).

This issue was brought up yesterday by other people in a WhatsApp chatbot developers group I'm part of, and many people were facing the same issue. When this happens the bot gives a "Sorry, my bot code is having an issue" message to the users. Some people have stopped trying to figure out the problem and are now just trying to suppress the exception message, which I'm not sure is something we should be doing, but we have bots in production and we need to find a way around this issue.

I've tried sending a lot of messages to a simple bot (not even using LUIS) made from the standard Bot Framework V3 template, and the issue still happened. So we're really at a loss here.

andreluizsecco commented 6 years ago

I have exactly the same problem.

JasonSowers commented 6 years ago

Which data centers are your bots deployed to? I'm wondering if you are both in the same data center(s). This is strange that many people are claiming to see this. Most of my days are spent tinkering with and testing bots and I am not seeing it at all.

How often are you seeing this? When you do see it does it seem to happen in clusters or is it truly random?

nrishoj commented 6 years ago

I do experience the exact same problem.

The published bot is located at Azure/West Europe. It does seem to be random, but in clusters. Today has been critical.

I rarely experience it with my local emulator.

| Time | Message |
| --- | --- |
| 6/14/2018, 11:49:15 AM | There was an error sending this message to your bot: HTTP status code GatewayTimeout |
| 6/14/2018, 11:47:35 AM | There was an error sending this message to your bot: HTTP status code GatewayTimeout |
| 6/14/2018, 11:47:28 AM | There was an error sending this message to your bot: HTTP status code GatewayTimeout |
| 6/14/2018, 11:46:43 AM | There was an error sending this message to your bot: HTTP status code GatewayTimeout |
| 6/14/2018, 11:31:09 AM | There was an error sending this message to your bot: HTTP status code GatewayTimeout |
| ... | ... |

If you guys found a way to get around or somehow mitigate the issue, please let me know.

ItsMeArthur commented 6 years ago

@JasonSowers Sorry for the delayed answer. I always deploy my chatbots either to Central US or South Central US, and I had the problem occur in both regions.

asfyra commented 6 years ago

I am also having the same error occurring randomly. It's a Q&A bot that makes async HTTP posts to an external site to get the answers. The bot is hosted on Azure and I use CosmosDB for the bot state. There are days when the error is more frequent than others.

Thank you

nrishoj commented 6 years ago

I upgraded to the most recent version of the NuGet packages (not preview). This seems to mitigate the issue; I haven't been able to replicate the error since.

asfyra commented 6 years ago

@nrishoj I use the .NET version of the Bot Builder and I am on the latest version. Maybe the next .NET version deals with this problem.

nrishoj commented 6 years ago

> I use the .NET version of the Bot Builder and I am on the latest version. Maybe the next .NET version deals with this problem.

OK - Same for me.

JasonSowers commented 6 years ago

Thank you all for reporting the behavior and keeping us updated; please continue to do so. We will continue to monitor the applications we are testing and using to see if we can find a reproduction and root cause.

JasonSowers commented 6 years ago

A new Bot.Builder package, 3.15.2.3, was released today. I'm not exactly sure what changes were included, as I just came back from being out of the country. I would like to suggest you try it out and see if the problem still persists. I know a 401 issue was addressed, and there is a chance you are seeing the same issue, masked under some error that was causing the timeouts.

JasonSowers commented 6 years ago

Has anyone been able to test with the new packages and see if that helped?

asfyra commented 6 years ago

SDK Platform: .NET
SDK Version: 3.15.3
Active Channels: Direct Line, WebChat
Deployment Environment: Azure App Service and Web App
Bot State: Cosmos DB

I am on the latest version of the packages and today I had three issues, almost one every hour.

timestamp [UTC]: 2018-07-03T11:13:13.397Z ConversationID: 9oTcIXJjCpZI57rGB6rw0f

Microsoft.Bot.Schema.BotTimeoutException at Microsoft.Bot.ChannelConnector.BotAPI+<PostActivityToBotAsync>d__30.MoveNext

[{
    "parsedStack": [{
        "assembly": "Microsoft.Bot.ChannelConnector, Version=3.2.2.1, Culture=neutral, PublicKeyToken=null",
        "method": "Microsoft.Bot.ChannelConnector.BotAPI+<PostActivityToBotAsync>d__30.MoveNext",
        "level": 0,
        "line": 0
    }],
    "outerId": "0",
    "message": "POST to piraeusbot timed out after 15s",
    "type": "Microsoft.Bot.Schema.BotTimeoutException",
    "id": "5076890"
}]
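
For anyone collecting these from Application Insights, here is a rough sketch of pulling the bot name and timeout out of exception details shaped like the JSON above. The message pattern is an assumption inferred from this single sample, and `timeouts` and `SAMPLE` are illustrative names, not part of any SDK.

```python
import json
import re

# Assumption: the message always looks like "POST to <bot> timed out after <n>s".
TIMEOUT_RE = re.compile(r"POST to (?P<bot>\S+) timed out after (?P<seconds>\d+)s")


def timeouts(details_json: str):
    """Yield (bot, seconds) for every BotTimeoutException entry in the details."""
    for entry in json.loads(details_json):
        if entry.get("type") != "Microsoft.Bot.Schema.BotTimeoutException":
            continue  # ignore other exception types
        match = TIMEOUT_RE.search(entry.get("message", ""))
        if match:
            yield match["bot"], int(match["seconds"])


# Trimmed copy of the details posted above.
SAMPLE = (
    '[{"type": "Microsoft.Bot.Schema.BotTimeoutException", '
    '"message": "POST to piraeusbot timed out after 15s"}]'
)
```

Counting the yielded pairs per hour would show whether the timeouts arrive in clusters or are spread evenly.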

jjgriff93 commented 6 years ago

I'm also having the same problem - on the latest packages, with .NET. The channel is Skype for Business and when the timeout occurs the users are getting 'Error happened in Skype for Business when reaching the bot service.'

image

neetgupta7 commented 6 years ago

@JasonSowers Let me also jump into the fray and say I am also facing the same issue with all my bots whose service is located in South Central US. This started happening about 2 hours ago. At least 12-15 hours ago it was working fine. [image: error]

neetgupta7 commented 6 years ago

It's not related to region. Some bots are working fine in the same region and others are not, and the ones that are working sometimes stop working. Quite strange. Until yesterday all was good.

ItsMeArthur commented 6 years ago

@JasonSowers From yesterday to today many of my clients' bots have been facing this issue continuously. The users are unable to interact with most of them. This is causing us many problems, and we are unable to do anything about it. This is happening with bots in different regions. Can you guys check it out and see if there's something that can be done?

image

JasonSowers commented 6 years ago

@ItsMeArthur are you using the default state store or are you using an Azure table/cosmos or something? If you are not using the default state store please post that in the other thread.

ItsMeArthur commented 6 years ago

@JasonSowers I'm using custom storage with Azure Table Storage, and I've clarified that there. The problem seems to be solved for now, but my experience with this issue is that it keeps happening randomly. Sometimes I exchange 50, 100 or 200 messages with one of my bots and everything goes as expected. Then I get one or two GatewayTimeout messages and a "Sorry my bot code is having an issue" exception, and after that the bot goes back to normal again. Really weird, and especially awkward during project presentations.

sayertherebel commented 6 years ago

I'm seeing these errors again today on two of our custom bots. Anyone else? Thanks

wladneto commented 6 years ago

@sayertherebel I received it yesterday in one bot on Telegram... :(

JasonSowers commented 6 years ago

When you see these errors please provide:

time (UTC), channel, and part of a conversation id (just the first or last couple characters should be fine).

That way @vincec-msft can take a look.

sayertherebel commented 6 years ago

@JasonSowers thanks, both bots have been working reliably today. The issues I was facing yesterday were mostly from Teams, where intermittently the messages were seemingly not even hitting my bot services. From web chat, I did get a string of gateway errors noted on the channels view in Azure, but for the most part there was no trace of the missing messages. At one point I sent 'help' to the bot and got 5 responses a few seconds apart :-)

wladneto commented 6 years ago

[image: error] The app id is 83108ba9-f5f3-43da-95a4-40b56a96f6be for this example. @vincec-msft, can you take a look? It's very intermittent... :(

JasonSowers commented 6 years ago

@wladneto are you able to provide time in UTC and a conversation ID please?

wladneto commented 6 years ago

How can I obtain a ConversationId, @JasonSowers? I don't actually control this :( My WEBSITE_TIME_ZONE in the Azure application is "E. South America Standard Time".

vincec-msft commented 6 years ago

Don't post telegram conversation ids. I'm looking at this now.

vincec-msft commented 6 years ago

@wladneto, I looked into several of the errors you saw on 22 Jul. In all cases the problem was that the bot did not respond to an activity within 15 seconds. That is the maximum time the channel will wait for a response.

I recommend that you look at your bot's logs to see where the time is being spent.

If you're seeing slow responses after periods of inactivity you might want to read this:

https://docs.microsoft.com/en-us/azure/bot-service/bot-service-troubleshoot-general-problems?view=azure-bot-service-3.0#my-bot-is-slow-to-respond-to-the-first-message-it-receives-how-can-i-make-it-faster

There was one post to Telegram at 2018-07-22T17:40:45Z that took 10 seconds, but the overall request was still under the 15 seconds so there was no timeout. 95% of posts to Telegram take less than 2 seconds but some will take longer.

To be safe, you should have your bot reply as quickly as possible to the initial activity post. Any requests longer than 10 seconds are getting dangerously close to the 15 second timeout.
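
One way to see where the time goes is to time each handler and flag anything past the 10-second mark. The following is only a sketch in plain Python: `timed`, `classify`, and the `clock`/`report` parameters are hypothetical names rather than Bot Framework APIs, and the two thresholds are the figures quoted in this thread.

```python
import time
from functools import wraps

WARN_AFTER_S = 10.0       # "dangerously close", per the comment above
CHANNEL_TIMEOUT_S = 15.0  # channel timeout as stated in this thread


def classify(elapsed_s: float) -> str:
    """Label a handler duration relative to the two thresholds."""
    if elapsed_s > CHANNEL_TIMEOUT_S:
        return "timeout"
    if elapsed_s > WARN_AFTER_S:
        return "warning"
    return "ok"


def timed(handler, clock=time.monotonic, report=print):
    """Wrap a handler so every call reports its duration and label."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = clock()
        try:
            return handler(*args, **kwargs)
        finally:
            # Runs even if the handler raises, so failures are timed too.
            elapsed = clock() - start
            report(f"{handler.__name__}: {elapsed:.2f}s ({classify(elapsed)})")
    return wrapper
```

Passing a logger method as `report` would send the measurements to the same place as the rest of the bot's logs.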

Also, I noticed that your bot is still using the default State Service. This service has been deprecated so please switch to a custom State Service as soon as possible.

https://docs.microsoft.com/en-us/azure/bot-service/dotnet/bot-builder-dotnet-state?view=azure-bot-service-3.0

To convert between timezones use https://www.worldtimebuddy.com/. Please always post UTC times.
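
If you would rather script the conversion, a minimal sketch follows. It assumes a fixed UTC-3 offset with no daylight saving in effect and a timestamp format like the ones pasted earlier in this thread; `to_utc_iso` and `LOCAL_OFFSET` are illustrative names, and a real conversion should use a proper time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local clock is at UTC-3 and daylight saving is not in effect.
LOCAL_OFFSET = timezone(timedelta(hours=-3))


def to_utc_iso(local_text: str, fmt: str = "%m/%d/%Y, %I:%M:%S %p") -> str:
    """Parse a local timestamp and return it as an ISO 8601 UTC string."""
    local = datetime.strptime(local_text, fmt).replace(tzinfo=LOCAL_OFFSET)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, under that assumed offset, `to_utc_iso("7/23/2018, 9:05:00 AM")` gives `2018-07-23T12:05:00Z`.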

hrumhurum commented 6 years ago

Is there a way to lift the 15-second timeout? It is unrealistically low.

For example, when the ASP.NET bot app is in a cold state, it may easily take up to 30 seconds to boot. As another example, the first cold access to the QnA Maker endpoint takes 18 seconds.

We lose customers with the current 15-second limit. They write 'hi' and do not get anything in return even 60 seconds later, because the bot's answer is discarded when it does not fit into the 15-second limit.

UPDATE: Updating Microsoft.Bot NuGet packages from version 3.15.2.2 to 3.15.3 seems to solve the issue. The bot is adequately responsive now, even when all services are cold.

vincec-msft commented 6 years ago

@hrumhurum, the 15-second timeout cannot be changed. But your bot doesn't have to reply with the answer in 15 seconds; it just has to acknowledge that it received the activity. It can reply with another activity at any time. See #3122 for sample code.
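
The acknowledge-first idea can be sketched roughly as below. This is plain Python `asyncio`, not the Bot Framework API: `handle_activity`, `slow_lookup`, `send_reply`, and the `202` status are hypothetical stand-ins, and in a real bot the later reply would go out as a separate activity through the connector.

```python
import asyncio

sent = []  # stands in for the channel; a real bot would post to the connector


async def slow_lookup(text: str) -> str:
    """Hypothetical slow backend call (for example, a cold QnA endpoint)."""
    await asyncio.sleep(0.2)  # imagine this taking longer than the timeout
    return f"answer to {text!r}"


async def send_reply(conversation_id: str, text: str) -> None:
    """Hypothetical follow-up send; here it only records the message."""
    sent.append((conversation_id, text))


async def answer_later(conversation_id: str, text: str) -> None:
    # Runs after the incoming request has already been acknowledged.
    await send_reply(conversation_id, await slow_lookup(text))


async def handle_activity(conversation_id: str, text: str, background: set) -> int:
    """Acknowledge immediately and do the slow work in a background task."""
    task = asyncio.create_task(answer_later(conversation_id, text))
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return 202  # illustrative "accepted" status, returned right away


async def main() -> int:
    background: set = set()
    status = await handle_activity("conv-1", "hi", background)
    await asyncio.gather(*background)  # only so this demo waits for the reply
    return status
```

The handler returns before `slow_lookup` finishes, so the time the channel waits no longer depends on how slow the backend is.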

Also see this for the startup issue:

https://docs.microsoft.com/en-us/azure/bot-service/bot-service-troubleshoot-general-problems?view=azure-bot-service-3.0#my-bot-is-slow-to-respond-to-the-first-message-it-receives-how-can-i-make-it-faster

hrumhurum commented 6 years ago

@vincec-msft, thank you for the explanation, I didn't know that was the case. I checked once again and you are right, 15 seconds is not a hard limit.

My problem was caused by some kind of miscommunication between Bot App and Bot Connector. Updating Microsoft.Bot NuGet packages from version 3.15.2.2 to 3.15.3 solved the issue.

Thank you. Keep up the good work. Bot Framework and QnA Maker are amazing pieces. I love them.

Prybe commented 6 years ago

I was not able to upgrade to 3.15.3. Currently my project.json contains net46 and "Microsoft.Bot.Builder.Azure": "3.15.2.2". When I update that to 3.15.3 and deploy it, the bot is dead. Any ideas?

hrumhurum commented 6 years ago

@Prybe, please take a look at Web.config and if it has binding redirects for Microsoft.Bot.* assemblies then ensure they point to 3.15.3.0 assemblies and not to 3.15.2.2.
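
A quick way to check for stale redirects is sketched below, assuming the usual `<assemblyBinding>` layout in Web.config. `stale_redirects` and `SAMPLE` are made up for illustration, and the sample config is hypothetical, not taken from any project in this thread.

```python
import xml.etree.ElementTree as ET

ASM = "{urn:schemas-microsoft-com:asm.v1}"  # namespace of <assemblyBinding>


def stale_redirects(web_config_xml: str, expected: str, prefix: str = "Microsoft.Bot."):
    """Return (assembly, newVersion) pairs whose redirect is not `expected`."""
    root = ET.fromstring(web_config_xml)
    stale = []
    for dep in root.iter(f"{ASM}dependentAssembly"):
        identity = dep.find(f"{ASM}assemblyIdentity")
        redirect = dep.find(f"{ASM}bindingRedirect")
        if identity is None or redirect is None:
            continue  # entry without a redirect, nothing to check
        name = identity.get("name", "")
        if name.startswith(prefix) and redirect.get("newVersion") != expected:
            stale.append((name, redirect.get("newVersion")))
    return stale


# Hypothetical Web.config fragment with a redirect still on the old version.
SAMPLE = """<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Microsoft.Bot.Builder" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-3.15.2.2" newVersion="3.15.2.2" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>"""
```

Calling `stale_redirects(SAMPLE, "3.15.3.0")` lists the `Microsoft.Bot.Builder` redirect as still pointing at 3.15.2.2.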

JasonSowers commented 5 years ago

Thank you for opening an issue against the Bot Framework SDK v3. As part of the Bot Framework v4 release, we’ve moved all v3 work to a new repo located at https://github.com/microsoft/botbuilder-v3. We will continue to support and offer maintenance updates to v3 via this new repo.

From now on, https://github.com/microsoft/botbuilder repo will be used as hub, with pointers to all the different SDK languages, tools and samples repos.

As part of this restructuring, we are closing all tickets in this repo.

For defects or feature requests, please create a new issue in the new Bot Framework v3 repo found here: https://github.com/microsoft/botbuilder-v3/issues

For Azure Bot Service Channel specific defects or feature requests (e.g. Facebook, Twilio, Teams, Slack, etc.), please create a new issue in the new Bot Framework Channel repo found here: https://github.com/microsoft/botframework-services/issues

For product behavior, how-to, or general understanding questions, please use Stackoverflow. https://stackoverflow.com/search?q=bot+framework

Thank you.

The Bot Framework Team

i524304 commented 4 years ago

Hi @JasonSowers,

I am also experiencing the same issue with direct line channel.

There was an error sending this message to your bot: HTTP status code GatewayTimeout | |3ec0e04aa45cff4ab634389490730959.91e2440c_

Could you please help me?

Kind regards, Vaishnavi.

mohsenualam commented 3 years ago

The bot seemed to be working for a few days. This issue started last week. Same error:

There was an error sending this message to your bot: HTTP status code GatewayTimeout.