microsoftgraph / microsoft-graph-comms-samples

Microsoft Graph Communications Samples

Call can't get established - Terminated #721

Open MarceloSchoen opened 7 months ago

MarceloSchoen commented 7 months ago

Hello, I'm trying to get my bot into a meeting. All the status notifications arrive just fine, but when it comes to actually joining, it fails.

The call goes from state "Establishing" to "Terminated" (as seen on my ngrok dashboard at 127.0.0.1:4040) with error code 3002.

There is no documentation or help in this regard. How can I find and fix the source of the issue?

InDieTasten commented 7 months ago

You might want to look at

Sounds like the same problem. The solution provided there might help you as well. Otherwise, you'll have to provide more log details

MarceloSchoen commented 7 months ago

There is no log anywhere other than that Graph error. The other post you mentioned implies a certificate error, which I strongly believe is not the root cause (at least initially). Without a valid certificate the call process wouldn't even start: the certificate is validated and loaded when the MediaChannel is created and attached to the incoming/outgoing call.

InDieTasten commented 7 months ago

You should be able to retrieve Call IDs / Call Chain IDs / Scenario IDs and timestamps as well. Then some of the Microsofties reading the issue can check their logs as well.

Also, the linked issue mentions the error during Establishing and going straight to Terminated, which sounds pretty close to what you are describing, so I wouldn't outright trust the certificate. You could also provide info on your setup, which of the samples you've used and so on.

The more info you provide, the more likely it is someone can look at it and identify the problem.
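To make that easier, here is a minimal Python sketch of how the identifiers could be pulled out of a notification body before posting them. It assumes the payload has the `commsNotifications` shape that Graph posts to the bot's callback endpoint; the function name `summarize_notifications` and the `loggedAtUtc` field are hypothetical. The notification itself carries no timestamp, so the sketch records the local UTC time at which it was parsed.

```python
import json
from datetime import datetime, timezone


def summarize_notifications(payload: str) -> list[dict]:
    """Extract the IDs worth quoting in a bug report from a notification body.

    Assumes a Graph commsNotifications payload: a top-level "value" list whose
    items carry "resource" and "resourceData".
    """
    body = json.loads(payload)
    rows = []
    for note in body.get("value", []):
        data = note.get("resourceData", {})
        result = data.get("resultInfo", {})  # only present on failures
        rows.append({
            # The call ID is the last path segment of the resource URL.
            "callId": note.get("resource", "").rsplit("/", 1)[-1],
            "callChainId": data.get("callChainId"),
            "state": data.get("state"),
            "code": result.get("code"),
            "subcode": result.get("subcode"),
            "message": result.get("message"),
            # Hypothetical field: when this script saw the payload, not a
            # timestamp supplied by Graph.
            "loggedAtUtc": datetime.now(timezone.utc).isoformat(),
        })
    return rows
```

Calling it on each body received at the callback endpoint and logging the result gives one line per state change that can be pasted into an issue.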

MarceloSchoen commented 7 months ago

Thanks for your patience in replying. Maybe the root cause can be found easily if there is a doc, URL, article, or anything else that tells me what kind of certificate I need to use when creating the MediaPlatformInstanceSettings.

If you have that information, kindly share it. Is there a certificate I can generate myself that works for local testing?

Kindly provide all the details you remember on this, if you don't mind. If not, then kindly point me in the right direction. This whole thing is too abstract and poorly documented, and even if you follow a tutorial from start to finish, doing exactly what the presenter does, it does not work.

InDieTasten commented 7 months ago

Requirements:

Note: I'm currently the lead maintainer on one of the public samples soon to be merged into this repo. We are currently working hard on docs (https://github.com/LM-Development/aks-sample/pull/55). If you are struggling, maybe you can find some good info over there as well. Unfortunately, we haven't documented the local setup and usage of Let's Encrypt yet.

Javeria-Arif commented 7 months ago

Hi @MarceloSchoen. I faced the exact same issue some time ago. Try running the same app on another machine.

MarceloSchoen commented 7 months ago

My most detailed info on this:

{
  "@odata.type": "#microsoft.graph.commsNotifications",
  "value": [
    {
      "@odata.type": "#microsoft.graph.commsNotification",
      "changeType": "updated",
      "resource": "/app/calls/811f5100-686a-45c5-a7be-b355b5d3d847",
      "resourceUrl": "/communications/calls/811f5100-686a-45c5-a7be-b355b5d3d847",
      "resourceData": {
        "@odata.type": "#microsoft.graph.call",
        "state": "establishing",
        "chatInfo": {
          "@odata.type": "#microsoft.graph.chatInfo",
          "threadId": "19:meeting_ZWI2MDM0NjMtOTg0My00NGU3LThmOWQtMDYyYzIyZGI0NWQz@thread.v2",
          "messageId": "0"
        },
        "meetingInfo": {
          "@odata.type": "#microsoft.graph.organizerMeetingInfo",
          "organizer": {
            "@odata.type": "#microsoft.graph.identitySet",
            "user": {
              "@odata.type": "#microsoft.graph.identity",
              "id": "ceff7fa8-b5a8-4776-afca-63b07fe42956",
              "tenantId": "d8bde65a-3ded-4346-9518-670204e6e184"
            }
          }
        },
        "callChainId": "e7c65b69-98aa-4563-b300-0ff2fa0de413"
      }
    }
  ]
}

Here the call is 'establishing'.

{
  "@odata.type": "#microsoft.graph.commsNotifications",
  "value": [
    {
      "@odata.type": "#microsoft.graph.commsNotification",
      "changeType": "deleted",
      "resource": "/app/calls/811f5100-686a-45c5-a7be-b355b5d3d847",
      "resourceUrl": "/communications/calls/811f5100-686a-45c5-a7be-b355b5d3d847",
      "resourceData": {
        "@odata.type": "#microsoft.graph.call",
        "state": "terminated",
        "resultInfo": {
          "@odata.type": "#microsoft.graph.resultInfo",
          "code": 500,
          "subcode": 3002,
          "message": "Server Internal Error. DiagCode: 500#3002.@"
        },
        "chatInfo": {
          "@odata.type": "#microsoft.graph.chatInfo",
          "threadId": "19:meeting_ZWI2MDM0NjMtOTg0My00NGU3LThmOWQtMDYyYzIyZGI0NWQz@thread.v2",
          "messageId": "0"
        },
        "meetingInfo": {
          "@odata.type": "#microsoft.graph.organizerMeetingInfo",
          "organizer": {
            "@odata.type": "#microsoft.graph.identitySet",
            "user": {
              "@odata.type": "#microsoft.graph.identity",
              "id": "ceff7fa8-b5a8-4776-afca-63b07fe42956",
              "tenantId": "d8bde65a-3ded-4346-9518-670204e6e184"
            }
          }
        },
        "callChainId": "e7c65b69-98aa-4563-b300-0ff2fa0de413"
      }
    }
  ]
}

Here it is 'terminated'.

On ngrok (127.0.0.1:4040) I see, under All Requests:

POST /api/calling 200 OK 48.86ms (<<< terminated)
POST /api/calling 200 OK 1.83s (establishing)

My certificate:

E = stephie@stefanini.com
CN = *.stephie.com.br
OU = TI
O = Stefanini
L = Sao Paulo
S = Sao Paulo
C = BR

Public key: RSA, 2048 bits.

One thing caught my attention. Going by the information provided here, the certificate should be OK (assuming it is a trusted certificate; tell me if my assumption is wrong). Its current status is: "This CA Root certificate is not trusted because it is not in the Trusted Root Certification Authorities store."

I could not import it into the Trusted Root Certification Authorities store: "The import failed because the store was read-only, the store was full, or the store did not open correctly."

leroypijlman commented 7 months ago

Hey buddy,

I had the EXACT same issue just now. The problem was NOT the certificate. Because it did work with this cert a couple of days ago. I tried regenerating the certificate many different times. Besides, if my cert is incorrect, I can't even initialize the media platform.

Anyway..

There was a mismatch in my DNS <-> ngrok configuration. So make sure your DNS is configured correctly. The TCP forwarding address needs to match the CNAME record in your DNS config.

E.g. my CNAME pointed to tcp://5.tcp.eu.ngrok.io:19416, while ngrok was forwarding from tcp://9.tcp.eu.ngrok.io:19416.

So if you look at the bigger picture, this error is probably related to Microsoft's server not being able to reach your InstancePublicPort. So make sure your DNS is configured, the port is not being blocked by your firewall, etc.

Good luck! :)
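The mismatch described above can be caught with a small check before each test call. This is a sketch in Python, not part of the samples: it assumes that the media FQDN and the ngrok TCP host should resolve to at least one common IP address when the CNAME is correct, which is a heuristic rather than a proof. The function names are hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out in tests.

```python
import socket
from urllib.parse import urlsplit


def resolve_ips(host, resolver=socket.getaddrinfo):
    """Return the set of IP addresses a host name resolves to (empty on failure)."""
    try:
        infos = resolver(host, None)
    except socket.gaierror:
        # Corresponds to a "No such host is known" style failure.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def fqdn_matches_tunnel(media_fqdn, tunnel_url, resolver=socket.getaddrinfo):
    """True when media_fqdn and the ngrok TCP host share at least one IP address."""
    tunnel_host = urlsplit(tunnel_url).hostname  # e.g. "9.tcp.eu.ngrok.io"
    fqdn_ips = resolve_ips(media_fqdn, resolver)
    tunnel_ips = resolve_ips(tunnel_host, resolver)
    return bool(fqdn_ips & tunnel_ips)
```

For example, `fqdn_matches_tunnel("local.stephie.com.br", "tcp://0.tcp.sa.ngrok.io:15931")` would compare the domain from this thread with the forwarding address ngrok printed. A `False` result suggests the CNAME still points at an older ngrok host or has not propagated yet.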

MarceloSchoen commented 7 months ago

Thanks for your reply

But what do you mean by checking my DNS for that ngrok TCP address? I'm testing locally and nothing is blocking anything. Besides, in a real scenario that would imply keeping a never-ending ngrok session running, right? Because if I restart it, the port number it provides changes. If you mean the DNS on my registered domain, I'll have to update the CNAME there (in my case local.stephie.com.br), and then I'll go nuts: this changes all the time and my domain provider takes about 1-2 hours to apply changes.

The config (appsettings) does not use that; it uses the other address. In my case:

"BotConfiguration": {
  "BotName": "*****",
  "AadAppId": "d35df3e1-c974-4b4a-b3cc-828b614d227a",
  "AadAppSecret": "******",
  "ServiceCname": "86e6-200-180-169-254.ngrok-free.app",
  "MediaServiceFQDN": "local.stephie.com.br",
  "ServiceDnsName": "",
  "CertificateThumbprint": "171a574199698646d4be5290f690902db685032a",
  "InstancePublicPort": 15931,
  "CallSignalingPort": 9441,
  "InstanceInternalPort": 8445,
  "PlaceCallEndpointUrl": "https://graph.microsoft.com/v1.0"
},

The Cname matches the current ngrok instance, whose properties are as below:

Session Status   online
Account          Marcelo Schoen (Plan: Free)
Version          3.8.0
Region           South America (sa)
Latency          30ms
Web Interface    http://127.0.0.1:4040
Forwarding       tcp://0.tcp.sa.ngrok.io:15931 -> localhost:8445
Forwarding       https://86e6-200-180-169-254.ngrok-free.app -> https://localhost:9441

Connections      ttl   opn   rt1    rt5    p50     p90
                 8     0     0.00   0.00   90.09   92.02

leroypijlman commented 7 months ago

Assuming you generated your own self-signed certificate, you did this against a domain. The DNS configuration of that domain needs the following two records. (I'm not sure whether no. 2 is mandatory, but this works for me, so it doesn't hurt to have it.)

Record 1: TYPE: CNAME, HOST: local, Value: 0.tcp.sa.ngrok.io (replace 0 with the number shown in ngrok)
Record 2: TYPE: CNAME, HOST: www, Value: [yourDnsName]

And yes, you have to update the CNAME in your DNS records every time your ngrok instance changes. Luckily for me that is instantaneous. But if it really takes two hours for you, there are two approaches you can take.

  1. Keep restarting ngrok until the number in the address it gives you (0.tcp.sa.ngrok.io) [0 = number] matches the one in your CNAME
  2. You can get a paid ngrok account and configure this manually, if I'm not mistaken

Don't forget to update the InstancePublicPort and ServiceCname in the appsettings every time you restart ngrok. This has cost me some headaches as well.

Edit: In a 'real scenario' (e.g. running on production), there are alternatives to ngrok. Ngrok is just one of many proxy servers. But regardless of what you use, you will have to make sure everything is forwarded correctly. Ngrok just happens to constantly assign a new address for the TCP port. But I'm pretty sure you might be able to make this static when configured properly / using the software with a paid account.
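Since the values have to be refreshed on every ngrok restart, a small helper can derive them from the two forwarding URLs that ngrok prints. The following Python sketch is only an illustration. The key names `InstancePublicPort` and `ServiceCname` are taken from the appsettings posted in this thread, while the function name `apply_ngrok_tunnels` and its return shape are hypothetical.

```python
from urllib.parse import urlsplit


def apply_ngrok_tunnels(bot_config: dict, tcp_url: str, https_url: str):
    """Return (updated_config, cname_target) for a fresh pair of ngrok tunnels.

    tcp_url   is the media tunnel, e.g. "tcp://0.tcp.sa.ngrok.io:15931"
    https_url is the signaling tunnel, e.g. "https://<id>.ngrok-free.app"
    """
    tcp = urlsplit(tcp_url)
    https = urlsplit(https_url)
    if tcp.scheme != "tcp" or tcp.port is None:
        raise ValueError(f"not a TCP forwarding address with a port: {tcp_url!r}")
    if https.scheme != "https" or not https.hostname:
        raise ValueError(f"not an HTTPS forwarding address: {https_url!r}")

    updated = dict(bot_config)  # leave the caller's dict untouched
    updated["InstancePublicPort"] = tcp.port
    updated["ServiceCname"] = https.hostname
    # The DNS CNAME record for the media FQDN should point at this host.
    return updated, tcp.hostname
```

The second return value is the host that the CNAME record (Record 1 above) needs to point to, so both the appsettings and the DNS change can be read from one place.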

MarceloSchoen commented 7 months ago

Thanks! That is quite revealing. I had already set up the CNAME on my domain, but in my case the generated certificate had a lot of issues. It had a public key of type ECC, and my machine can't read the private key that way. I regenerated it with type RSA and that solved that part of the issue (the certificate now displays as valid).

I'll keep a perpetual instance of ngrok running and will probably go for the paid service if that helps as well.

MarceloSchoen commented 7 months ago

Tried it.

I also tried opening the given ports on my router (9441, 8445, and whatever ngrok displayed). No luck; same thing. It goes from establishing to terminated with no additional info.

How can a dev solve a problem when there is no explanation of what is happening and the request gets a 200 response code? As in: "I terminated your request with an error, but you got a 200, so all is fine here."

Why is it so hard to get a straight answer? As I said earlier, the whole thing is poorly documented. If you follow Microsoft's guidelines on implementing this from start to finish, you either get stuck on the first few steps or you do things exactly as the 'documentation' tells you to and it does not work, and you have no way of knowing what is wrong or where.

You get a generic error message that has no definition and no place to look it up. Just error 3002, as if we could possibly guess what 3002 is about.

ssulzer commented 6 months ago

@MarceloSchoen Please check that the InstancePublicPort value is correct. For the most recent call attempt on 19 April 2024, I see this error in telemetry (where ServiceFQDN is the value given to MediaPlatformSettings.MediaPlatformInstanceSettings.ServiceFqdn):

Could not verify connectivity to the bot's media platform instance at ServiceFQDN:11703. Please verify that InstancePublicPort 11703 on the load balancer for ServiceFQDN is mapped to InstanceInternalPort port 8445 on the local machine. A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. [::ffff:146.112.211.xx]:11703

In previous call attempts the error indicated that ServiceFQDN was not a valid DNS name ("No such host is known").
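Both failure modes mentioned above (an unresolvable host name and a port that never answers) can be reproduced from outside the bot with a plain TCP probe. The Python sketch below is an assumption about how one might check this, not a Microsoft tool: `can_connect` is a hypothetical helper, it only tests that a TCP connection opens, and a successful probe from your own network does not guarantee that Microsoft's servers can reach the same address.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0):
    """Try to open a TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # Name resolution failed (compare "No such host is known").
        return False, f"DNS lookup failed: {exc}"
    except OSError as exc:
        # Covers timeouts and refused connections.
        return False, f"connection failed: {exc}"
```

For the configuration in this thread the call would be `can_connect("local.stephie.com.br", 15931)`, that is, the ServiceFQDN together with the current InstancePublicPort. It is best run from a machine outside the local network, while the bot is listening on InstanceInternalPort.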

MarceloSchoen commented 6 months ago

Hi Stephen, I'm testing locally, as I mentioned, and I even tried opening my router ports. And yes, the port was the same as the one ngrok displayed. Are you suggesting that I create a load balancer, then? I believe I already have one in place on AWS. For this scenario, must all of the given ports be set on the AWS side? I'm trying to be sure of this, because every time I run ngrok the port changes, and so does the CNAME address I need to set up.

Can you kindly confirm? Must the ports be set on the router? Must the ports be set on the AWS load balancer? The mapping that you mentioned is already performed by ngrok, so I don't have to do anything, right? When it runs, it maps requests on the instance port to my localhost port (8445).

MarceloSchoen commented 6 months ago

Hi ssulzer, I'm testing locally, as I mentioned, and I even tried opening my router ports. And yes, the port was the same as the one ngrok displayed. Are you suggesting that I create a load balancer, then? I believe I already have one in place on AWS. For this scenario, must all of the given ports be set on the AWS side? I'm trying to be sure of this, because every time I run ngrok the port changes, and so does the CNAME address I need to set up.

Can you kindly confirm? Must the ports be set on the router? Must the ports be set on the AWS load balancer? The mapping that you mentioned is already performed by ngrok, so I don't have to do anything, right? When it runs, it maps requests on the instance port to my localhost port (8445).

It persists:

{
  "@odata.type": "#microsoft.graph.commsNotifications",
  "value": [
    {
      "@odata.type": "#microsoft.graph.commsNotification",
      "changeType": "deleted",
      "resource": "/app/calls/431fd200-a430-4fbf-a335-f31ea5725f79",
      "resourceUrl": "/communications/calls/431fd200-a430-4fbf-a335-f31ea5725f79",
      "resourceData": {
        "@odata.type": "#microsoft.graph.call",
        "state": "terminated",
        "resultInfo": {
          "@odata.type": "#microsoft.graph.resultInfo",
          "code": 500,
          "subcode": 3002,
          "message": "Server Internal Error. DiagCode: 500#3002.@"
        },
        "chatInfo": {
          "@odata.type": "#microsoft.graph.chatInfo",
          "threadId": "19:meeting_ZWI2MDM0NjMtOTg0My00NGU3LThmOWQtMDYyYzIyZGI0NWQz@thread.v2",
          "messageId": "0"
        },
        "meetingInfo": {
          "@odata.type": "#microsoft.graph.organizerMeetingInfo",
          "organizer": {
            "@odata.type": "#microsoft.graph.identitySet",
            "user": {
              "@odata.type": "#microsoft.graph.identity",
              "id": "ceff7fa8-b5a8-4776-afca-63b07fe42956",
              "tenantId": "d8bde65a-3ded-4346-9518-670204e6e184"
            }
          }
        },
        "callChainId": "2784a1bd-09fd-412f-a665-48af0271cd60"
      }
    }
  ]
}

How can I review the telemetry you have access to from my end, without needing to rely on someone checking it here?

Javeria-Arif commented 6 months ago

I have started to get this error again for no apparent reason. Everything (certificate, ngrok, local setup) is working, but the bot is not able to join the meeting again.