Open nodkrot opened 1 week ago
I'm getting the same error right now.
I'm getting the same error right now.
Getting the same error intermittently too!
we are receiving same error as well
We are seeing the same on our two apps as well.
The OP indicates 4.16, which is quite old. What about the others in this thread?
I'm on 4.22.2
Same issue for us.
First occurrence: 2024-07-03 03:37:56 pm PDT.
Since then 290 further occurrences.
For the time being we're catching this error and simply retrying each request up to 3x and virtually all succeed. However we'd prefer to not have to do this.
we are seeing issue on 4.22.3 as well ..
Since this is happening on older versions, it's probably not related to a recent change in SDK. I'll check for issues with the other end. I'm assuming its failing during the request, especially since retries make it work.
I'm wondering what the HTTP response status code is. Is it being throttled? Though that is a specific status code and handled automatically. We can help mitigate by increasing our retries.
Since this is happening on older versions, it's probably not related to a recent change in SDK. I'll check for issues with the other end. I'm assuming its failing during the request, especially since retries make it work.
I'm wondering what the HTTP response status code is. Is it being throttled? Though that is a specific status code and handled automatically. We can help mitigate by increasing our retries.
We receive 500 if I'm not mistaken. Since it's intermittent, it feels like there is a load balancer pointing to a server with an expired certificate. My guess
The trace provided in this issue (third comment for requests.exceptions.SSLError
) seems to indicate it's occurring with Python as well:
In which case it may have nothing to do with bot-framework, which seems likely given these type of certificate issues generally originate on the server.
I have a support ticket open with Microsoft, however I've been informed that I don't have the correct tier of paid support to have anyone from engineering take a look.
If anyone has premium support I'd appreciate them raising the issue through a ticket.
I'm wondering what the HTTP response status code is. Is it being throttled? Though that is a specific status code and handled automatically. We can help mitigate by increasing our retries.
I think it's actually a success response code. Our 429/5XX logic wasn't catching this error. We had to add some custom logic to catch-and-retry it via matching on e.message.includes("unable to verify the first certificate")
. As far as I can tell, the server believes the request is valid.
Hi,
We started facing this error with our Teams App on 3/Jul/2024 and it has already impacted an important release. Before this, our App has been running fine since Dec'23. We are using botbuilder@4.20.0 on node.js@18.20.3.
The exception has the format: FetchError: request to \<URL> failed, reason: unable to verify the first certificate
The target
We face this intermittently when our Teams app sends or updates an adaptive card using the sendActivity or updateActivity methods of the TurnContext object in the Bot Builder SDK for Node.js.
We were able to replicate this issue in non-production and usually see it occur for 10% of the send/update attempts. Handling the exception and retrying the activity works on the first attempt. I think you should be able to fix this by identifying and fixing the failing certificate chain on the concerned nodes of your distributed infrastructure.
Looking forward to a quick fix! Thanks.
Are these all Teams bots? 4.21.0 is when JS SDK moved to MSAL auth (from ADAL). But 4.16 would still be ADAL of course.
Ours is a Teams Bot.
This is being actively investigated by the Teams group.
Also experiencing this with a teams bot running 4.21.1
There has been at least two Sev 2's raised in that group. That is Microsoft terminology for high impact issue. It also means there will be eyes on it. I'll leave this open and post updates as I get them.
Same issue with two of our application.
Hey team, do you still see the issues? Could you please let us know if you still see the issues?
Hey team, do you still see the issues? Could you please let us know if you still see the issues?
The last instance we observed was at 2024-07-08 04:00:17 pm PDT
(~3 hours ago).
However, since it's intermittent and our app traffic volume is lower in the evening, we'll need to wait longer to be sure
Thank you for sharing @at1as! For others, please do share whether you still see the issues or not with me here :)
One more question to the group: could you share your end point you are targeting like @gitnavneet did? I am mainly interested in the region that the end point contains. ex: "https://smba.trafficmanager.net/**amer**/v3/conversations/.../activities/..."
Hi,
We started facing this error with our Teams App on 3/Jul/2024 and it has already impacted an important release. Before this, our App has been running fine since Dec'23. We are using botbuilder@4.20.0 on node.js@18.20.3.
The exception has the format: FetchError: request to
failed, reason: unable to verify the first certificate
- type: system
- errno, code: UNABLE_TO_VERIFY_LEAF_SIGNATURE
The target is of the form: "https://smba.trafficmanager.net/amer/v3/conversations/.../activities/..."
We face this intermittently when our Teams app sends or updates an adaptive card using the sendActivity or updateActivity methods of the TurnContext object in the Bot Builder SDK for Node.js.
We were able to replicate this issue in non-production and usually see it occur for 10% of the send/update attempts. Handling the exception and retrying the activity works on the first attempt. I think you should be able to fix this by identifying and fixing the failing certificate chain on the concerned nodes of your distributed infrastructure.
Looking forward to a quick fix! Thanks.
Haven't seen the issue in the last few hours, but, will wait and see how things go tomorrow!
Our region endpoint was amer
Hey team, do you still see the issues? Could you please let us know if you still see the issues?
Hey @YunnyChung,
We tried replicating the issue again in non-production today, however, it has not recurred! 😄
In summary,
We will continue monitoring the production logs for a few more days and report back here if the issue recurs.
🤔 Could you please share the root cause analysis? Thanks!
Errors stopped for as well
Thank you so much everyone for sharing the information here! Yes, please do let me know whether errors have stopped or still occurring. We will share the root cause analysis when it is ready.
One another small favor I want to ask to this community: For those who encountered this issue, are you using Linux? Could you share the OS of the machine you used when this issue occurred?
Thank you so much everyone for sharing the information here! Yes, please do let me know whether errors have stopped or still occurring. We will share the root cause analysis when it is ready.
One another small favor I want to ask to this community: For those who encountered this issue, are you using Linux? Could you share the OS of the machine you used when this issue occurred?
Our solution is deployed as an Azure Function app running on Windows.
Thank you so much everyone for sharing the information here! Yes, please do let me know whether errors have stopped or still occurring. We will share the root cause analysis when it is ready.
One another small favor I want to ask to this community: For those who encountered this issue, are you using Linux? Could you share the OS of the machine you used when this issue occurred?
I didn't see any errors yesterday, thanks for that! However, today we received a new one, probably related:
{
"url": "https://smba.trafficmanager.net/amer/api/v2/[redacted]/channel/messages/[redacted]"
"response": "read ECONNRESET"
}
We saw this error once too, today at 6:15:57 UTC.
{
"level": "error",
"message": "An error occurred while handling the event FetchError: request to https://smba.trafficmanager.net/amer/v3/conversations/.../activities/... failed, reason: read ECONNRESET"
}
Thank you so much everyone for sharing the information here! Yes, please do let me know whether errors have stopped or still occurring. We will share the root cause analysis when it is ready.
One another small favor I want to ask to this community: For those who encountered this issue, are you using Linux? Could you share the OS of the machine you used when this issue occurred?
Our bot is deployed as an service in Google Cloud Functions
@FBraz-RMFarma
Our bot is deployed as an service in Google Cloud Functions
No kidding? That's pretty cool.
I saw the ECONNRESET
as well yesterday.
We saw this error once too, today at 6:15:57 UTC.
{ "level": "error", "message": "An error occurred while handling the event FetchError: request to https://smba.trafficmanager.net/amer/v3/conversations/.../activities/... failed, reason: read ECONNRESET" }
Yeah, we got 223 issues in the last hour 😞
Yeah, we got 223 issues in the last hour 😞
@avilabiel , now you have me worried! 😮 How did those 223 new issues impact your app or end users?
For our app, it caused an ephemeral error message, but the action succeeded with a retry.
Thank you everyone for sharing the issue here 🙏🏻 For those who started to see the error, could you help me one more time by sharing the following information?
Latest: 2024-07-10 08:00:42 am PDT (50 mins ago) First: 2024-07-09 08:00:03 pm PDT Occurrences: 37
Our app is heavily skewed to triggering on the hour, for certain hours, so ~50 minutes of no occurrences does not tell us much. It is intermittent, but I don't have immediate counts.
We see it for this endpoint (though it's our most heavily used, so this isn't necessarily an exhaustive list):
https://smba.trafficmanager.net/amer/v3/conversations", "method":"POST"
We are in active process for investigation & mitigation -quick two follow up questions here:
(1) For those failed requests, do you see those requests' responses containing header with the ms-cv? (2) Could you let me now if you see the decrease of the errors starting 07/10/24 8:30 pm in UTC?
@YunnyChung ,
The read ECONNRESET
error occurred multiple times for our app today (10/Jul/24). Some users missed important updates. So, we have postponed the app launch to a wider group.
(1) I did not notice any header with ms-cv
in our logs.
(2) Yes, errors have decreased. They have not occurred since 19:05:49 UTC.
Some error timestamps (UTC): 6:15:57 (first), 14:16:51, 14:19:28, 15:21:01, 16:50:57, 19:05:49 (last).
Target: https://smba.trafficmanager.net/amer/v3/conversations/.../activites/...
Versions
What package version of the SDK are you using. (botbuilder@4.16.0) What nodejs version are you using (v16.20.2) What os are you using (Mac)
Describe the bug
On July 3rd we started receiving "unable to verify the first certificate" error when starting and using botbuilder sdk APIs. It appears to happen to some but not all customers.
To Reproduce
Instantiate and use any operation botbuilder sdk (send or update message for example)
Expected behavior
No error