microsoftgraph / microsoft-graph-comms-samples

Microsoft Graph Communications Samples
MIT License

EchoBot throws exceptions during the initialization and terminates the call on joining. #764

Open genemgh opened 2 months ago

genemgh commented 2 months ago

We try to run the EchoBot example (GitHub: microsoftgraph/microsoft-graph-comms-samples/Samples/PublicSamples/EchoBot/). We have a Microsoft 365 Developer sandbox as a part of Visual Studio subscription, and a tenant in the sandbox. We use Postman to hit the bot.

Versioning:

Ngrok:

Certificates:

Graph Application and Azure Bot registrations:

Local environment:

When we run the bot from Visual Studio in debug mode, the following 12 types of exceptions are logged during bot initialization:

  1. onecore\net\netprofiles\service\src\nsp\dll\namespaceserviceprovider.cpp(613)\nlansp_c.dll!00007FFF45F1F6BD: (caller: 00007FFF619FACF6) LogHr(1) tid(59e8) 8007277C No such service is known. The service cannot be found in the specified name space. (3)
  2. Microsoft C++ exception: std::system_error at memory location 0x00000050203BA850.
  3. Exception thrown at 0x00007FFF5FE1FABC (KernelBase.dll) in EchoBot.exe: WinRT originate error - 0x80040155 : 'Failed to find proxy registration for IID: {79EAC9E4-BAF9-11CE-8C82-00AA004BA90B}.'.
  4. Exception thrown at 0x00007FFF5FE1FABC (KernelBase.dll) in EchoBot.exe: 0x80040155: Interface not registered. (2)
  5. MediaPerf is not registered: no key found at SYSTEM\CurrentControlSet\Services\MediaPerf\Performance
  6. Exception thrown at 0x00007FFF5FE1FABC (KernelBase.dll) in EchoBot.exe: 0x000006D9: There are no more endpoints available from the endpoint mapper.
  7. Exception thrown at 0x00007FFF5FE1FABC (KernelBase.dll) in EchoBot.exe: 0x000006BA: The RPC server is unavailable. (3)
  8. 'System.InvalidOperationException' in System.Diagnostics.PerformanceCounter.dll: Category does not exist.
  9. 'System.IO.IOException' in System.Net.Sockets.dll, System.Net.Security.dll, System.Private.CoreLib.dll
  10. 'System.InvalidOperationException' in Unity.Container.dll. No public constructor is available for type Microsoft.Extensions.Options.IPostConfigureOptions`1[Microsoft.Extensions.Logging.LoggerFilterOptions] (and for other interfaces, about 60 of them). Inner Exception: InvalidRegistrationException: Exception of type 'Unity.Exceptions.InvalidRegistrationException' was thrown.
  11. System.IO.IOException in System.Net.Security.dll, System.Private.CoreLib.dll. Received an unexpected EOF or 0 bytes from the transport stream. (5)
  12. System.IO.IOException: 'Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..'

Despite all the exceptions, the bot starts:

    Microsoft.Hosting.Lifetime: Information: Now listening on: http://[::]:9442
    Microsoft.Extensions.Hosting.Internal.Host: Debug: Hosting started.

The GET health check request from Postman goes through Ngrok, reaches the bot, and returns 200 OK to Postman. During bot initialization, the BotMediaStream is created, the audio socket is initialized, the bot sends status active for media, and the media player is created.

Testing:

The problem:
The bot tries to establish the call:

    17:01:28:954 CallHandler: Call status updated to Establishing.

One second later it terminates the call:

    17:01:29:954 CallHandler: Call status updated to Terminated - Server Internal Error. DiagCode: 500#1203002.@.

It then throws the following exceptions:

    Exception thrown: 'System.Threading.Tasks.TaskCanceledException' in System.Private.CoreLib.dll
    A task was canceled.
    Exception thrown: 'System.IO.IOException' in System.Net.Sockets.dll
    Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request.

The bot does not appear in the meeting, and no media is sent to the bot.
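When comparing runs, it can help to reduce the debug output to the call status transitions and a count of exceptions per type. The following is a minimal Python sketch, assuming the log wording quoted in this issue; the function name `summarize`, the regular expressions, and the labels `code` and `subcode` for the numbers before and after `#` in the DiagCode are illustrative assumptions, not part of the sample or of any SDK.

```python
import re
from collections import Counter

# Patterns assume the log wording quoted in this issue; other builds may differ.
STATUS_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}:\d{3}) CallHandler: Call status updated to "
    r"(?P<status>\w+)"
    r"(?: - (?P<reason>.*?)\. DiagCode: (?P<code>\d+)#(?P<subcode>\d+))?"
)
EXCEPTION_RE = re.compile(
    r"Exception thrown: '(?P<type>[\w.]+)' in (?P<module>[\w.]+\.dll)"
)

# Lines copied from the issue text above.
SAMPLE = """\
17:01:28:954 CallHandler: Call status updated to Establishing.
17:01:29:954 CallHandler: Call status updated to Terminated - Server Internal Error. DiagCode: 500#1203002.@.
Exception thrown: 'System.Threading.Tasks.TaskCanceledException' in System.Private.CoreLib.dll
Exception thrown: 'System.IO.IOException' in System.Net.Sockets.dll
"""


def summarize(log_text):
    """Return (status transitions, exception counts) found in log_text."""
    transitions = [
        (m["time"], m["status"], m["reason"], m["code"], m["subcode"])
        for m in STATUS_RE.finditer(log_text)
    ]
    exceptions = Counter(m["type"] for m in EXCEPTION_RE.finditer(log_text))
    return transitions, exceptions


if __name__ == "__main__":
    transitions, exceptions = summarize(SAMPLE)
    for row in transitions:
        print(row)
    for exc_type, count in exceptions.most_common():
        print(count, exc_type)
```

Fields that a line does not contain (for example, the reason on an `Establishing` line) come back as `None`, so a terminated call is easy to spot by its non-empty reason and code.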

Questions:

  1. All the exceptions are triggered deep in the native libraries. Could this indicate versioning issues?
  2. What can cause the exceptions in the bot during the initialization? Are those exceptions significant?
  3. Can those exceptions prevent the bot from joining the call and cause the media stream issues?
  4. Is it OK to use a self-signed certificate?
  5. What would you recommend to further debug the issue?

Thank you!

1fabi0 commented 2 months ago

I'll start with question 4: it is not OK to use a self-signed certificate. The other issues will probably resolve themselves once you use a certificate that chains to a trusted root. As a tip for ngrok: try to get a connection on the 0.tcp endpoint by restarting ngrok until you are assigned it. On the other endpoints there sometimes seem to be problems with ngrok's firewall settings.

genemgh commented 2 months ago

Thank you for the advice! We'll try a root chain certificate.

Our Ngrok is configured to generate the following two endpoints:

  TCP: tcp://N.tcp.ngrok.io:PORT -> localhost:8445, where N is 0, 2, 4, 6, or 8 and PORT is a random integer, usually between 10000 and 30000.
  HTTP: https://DOMAIN.ngrok-free.app -> http://localhost:9442, where DOMAIN is a random string like d7b9-63-209-137-19.

We generate the certificates for the TCP connection domain, N.tcp.ngrok.io, without a wildcard (*). Is that correct?

=========================================================

Another possible cause of the issue: our application registration and the Azure bot registration technically belong to two different tenants.

We have a sandbox available as a part of our Microsoft 365 Developer Subscription for Visual Studio Professional. So, the application is registered in the sandbox tenant (and it's multi-tenant). The Azure bot registration is a part of our organization tenant (but it's based on the application registration, and it's also multi-tenant). That's because the real Azure subscription is under our organization tenant as a part of 365 Subscription.

We run the Teams client from our sandbox tenant. During the call joining process, the bot does not ask for any admin consent. Could the registration in two different tenants be the cause of the call termination?

=========================================================

And could anybody please confirm that it's supposed to work with .NET 6.0? :-)

Thank you!

1fabi0 commented 2 months ago

I think your tenant situation is not a problem as long as you gave admin permission for the tenant that operates Teams. Regarding the certificate for the ngrok TCP address: since it may be rather complicated to get a valid certificate for the ngrok domain, you can create a CNAME record on a custom domain that points to, e.g., 0.tcp.ngrok.io, and get the certificate for your custom domain. This implies using the custom domain instead of the ngrok domain in the bot.
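To illustrate the CNAME approach, here is a small Python sketch that translates the TCP address ngrok hands out into the custom-domain host and port the bot would advertise instead. Everything named here is a hypothetical illustration: the `example.com` host names, the mapping `CNAME_FOR_NGROK_HOST`, and the function `public_media_endpoint` are assumptions, and the sketch presumes you have already created one CNAME record per ngrok TCP host. The port is carried over unchanged because a CNAME record only aliases the host name.

```python
from urllib.parse import urlsplit

# Hypothetical mapping: each ngrok TCP host -> a CNAME record on a domain
# you control (and for which you hold a certificate from a trusted CA).
CNAME_FOR_NGROK_HOST = {
    "0.tcp.ngrok.io": "0.bot.example.com",
    "2.tcp.ngrok.io": "2.bot.example.com",
    "4.tcp.ngrok.io": "4.bot.example.com",
}


def public_media_endpoint(ngrok_tcp_url, cname_map=CNAME_FOR_NGROK_HOST):
    """Map an ngrok TCP URL to the (custom host, port) pair to configure."""
    parts = urlsplit(ngrok_tcp_url)
    if parts.scheme != "tcp" or parts.hostname is None or parts.port is None:
        raise ValueError(f"not an ngrok TCP URL: {ngrok_tcp_url!r}")
    try:
        custom_host = cname_map[parts.hostname]
    except KeyError:
        raise ValueError(f"no CNAME record set up for {parts.hostname}") from None
    # The CNAME only aliases the name; the ngrok-assigned port stays the same.
    return custom_host, parts.port


if __name__ == "__main__":
    host, port = public_media_endpoint("tcp://0.tcp.ngrok.io:12345")
    print(f"media host: {host}, media port: {port}")
```

The returned pair would go into whatever bot settings hold the public media host name and port, and the certificate would be issued for the custom host name rather than for the ngrok one.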

Yes, .NET 6 works, but hosting with IIS doesn't.

genemgh commented 2 months ago

Thank you for your help! We'll try it.

A deployment question. Does it have to be deployed in Azure VMSS? Is it possible to deploy it on-premises on a Windows Server in Kubernetes?

1fabi0 commented 2 months ago

Yes, it is possible to deploy it on-premises in a Kubernetes cluster, but keep in mind it is not officially supported by Microsoft. It is recommended to run the bot as close as possible to the Azure datacenter location where the Teams meeting is hosted to reduce packet loss and round-trip times.

Regarding Kubernetes, however, you can have a look at the Azure Kubernetes Service Sample that I also worked on. The k8s charts and deployment steps should be compatible/portable.

genemgh commented 2 months ago

In our case, we'll have to redirect the media stream to our on-premises platform for processing and analysis anyway, so the distance is not very important. Anyway, thank you for the advice, we'll consider it.

genemgh commented 2 months ago

BTW, is it possible to use HTTPS for the media stream instead of TCP? WebSockets, for example? Thank you!

1fabi0 commented 2 months ago

No. It seems the media platform is based on WCF (nowadays CoreWCF), so you cannot replace the TCP endpoint with WebSockets. Hopefully Microsoft will refactor the media platform one day, pay off some technical debt, and remove the dependency on WCF.

genemgh commented 1 month ago

We finally have our certificate with a root chain issued and the DNS CNAME records added. It works now (in non-debug mode). Thank you very much for your help!

In debug mode it triggers the same exceptions. Is it supposed to work in debug mode? That would provide a lot of useful information...

Does the audio stream provide participant diarization?

Thank you!