microsoft / PowerPlatform-DataverseServiceClient

Code Replica for Microsoft.PowerPlatform.Dataverse.Client and supporting nuget packages.
MIT License

CDS client doesn't work on kubernetes #93

Open onsvejda opened 3 years ago

onsvejda commented 3 years ago

We tried two different approaches:

1) A "static" CdsServiceClient that is cloned per request (Test1Controller in the attached sample), i.e. static CdsServiceClient client = ... and then using (var clone = client.Clone()) { ... CDS business ... }
2) Allocating the client on the fly (Test2Controller in the attached sample), i.e. using (var client = new CdsServiceClient(...))

Both approaches choked up the cluster pretty quickly under load (the load test hits the cluster with 100 threads per second); after a few seconds it was saturated.

3) The only way we were able to overcome the issue, to a degree, was to create a pool of .Clone() clients and never dispose them. That has its drawbacks: it creates a management burden (handling cases like an expired API token, a connection in a faulted state, etc.), and because of those cases you eventually have to dispose the clients at some point anyway. A minimal sketch of this workaround is below.
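
For reference, this is roughly what the pooling workaround looks like. The pool type, sizing, and renewal logic here are illustrative only and are not part of the attached sample; it assumes the CdsServiceClient type from the early Microsoft.PowerPlatform.Cds.Client package.

/* CdsClientPool.cs -- illustrative sketch, not part of the attached sample */
using System.Collections.Concurrent;
using Microsoft.PowerPlatform.Cds.Client;

public sealed class CdsClientPool
{
    private readonly CdsServiceClient _root;
    private readonly ConcurrentBag<CdsServiceClient> _pool = new ConcurrentBag<CdsServiceClient>();

    public CdsClientPool(string connectionString)
    {
        // Single root client; clones share its authenticated connection.
        _root = new CdsServiceClient(connectionString);
    }

    // Rent a clone from the pool, or clone the root if none are available.
    public CdsServiceClient Rent()
    {
        return _pool.TryTake(out var client) ? client : _root.Clone();
    }

    // Return the clone without disposing it (disposing per request is what
    // choked the cluster). Expired tokens and faulted clients still need
    // handling, which is the management burden described above.
    public void Return(CdsServiceClient client)
    {
        _pool.Add(client);
    }
}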

There is a somewhat similar issue described here: https://github.com/dotnet/wcf/issues/3344. However, in our case it doesn't look like thread congestion (the measured numbers were not that high; it looks more like a leak of some sort). It does not reproduce on Windows.

4) The issue also does not reproduce when going purely through a native HttpClient and calling the OData (Web API) endpoint directly (sketched below).
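
For comparison, the raw-HttpClient path looks roughly like this. The organization URL is a placeholder and token acquisition is assumed to happen elsewhere; the attached sample may differ in detail.

/* RawODataWhoAmI.cs -- illustrative only; org URL and token handling are placeholders */
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class RawODataWhoAmI
{
    private static readonly HttpClient Http = new HttpClient
    {
        BaseAddress = new Uri("https://yourorg.crm.dynamics.com/api/data/v9.1/")
    };

    public static async Task<string> WhoAmIAsync(string accessToken)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, "WhoAmI");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

        using var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Returns a small JSON payload containing UserId / BusinessUnitId / OrganizationId.
        return await response.Content.ReadAsStringAsync();
    }
}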

Sample with full repro here: src.zip

MattB-msft commented 3 years ago

Thanks, we will take a look at this. We are currently in the process of adding async support to this library, which may help in this situation.

BetimBeja commented 3 years ago

Doing some performance tests with the latest released package, v0.4.4, on an Azure Functions v3 app, I get the following inconsistency on a simple WhoAmIRequest (see the attached screenshots).

The implemented code is the following:

/* ServiceClientSingleton.cs */
using Microsoft.PowerPlatform.Dataverse.Client;

namespace AlbanianXrm.Functions
{
    public class ServiceClientSingleton
    {
        public ServiceClientSingleton(string connectionString)
        {
            ServiceClient = new ServiceClient(connectionString);
        }

        public ServiceClient ServiceClient { get; private set; }
    }
}
/* Startup.cs */
using AlbanianXrm.Functions;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;
using System;

[assembly: FunctionsStartup(typeof(Startup))]
namespace AlbanianXrm.Functions
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
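            // Register one root ServiceClient for the whole function app and hand out a
            // lightweight Clone() per scope, so each invocation gets its own client
            // instance while sharing the root's authenticated connection.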
            builder.Services.AddSingleton((s) =>
            {
                return new ServiceClientSingleton(Environment.GetEnvironmentVariable("ConnectionString"));
            });

            builder.Services.AddScoped(sp =>
            {
                return sp.GetService<ServiceClientSingleton>().ServiceClient.Clone();
            });
        }
    }
}
/* WhoAmI.cs */
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Microsoft.PowerPlatform.Dataverse.Client;
using Microsoft.Crm.Sdk.Messages;
using System;
using System.Diagnostics;

namespace AlbanianXrm.Functions
{
    public class WhoAmI
    {
        private readonly ServiceClient _ServiceClient;

        public WhoAmI(ServiceClient serviceClient)
        {
            _ServiceClient = serviceClient;
        }

        [FunctionName("WhoAmI")]
        public async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
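            // Time the call end-to-end with a Stopwatch so connection/auth overhead
            // shows up in the function logs alongside the Dataverse round trip.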
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            log.LogInformation("Starting function {0} ticks", stopwatch.ElapsedTicks);
            try
            {
                var responseMessage = (await _ServiceClient.ExecuteAsync(new WhoAmIRequest())) as WhoAmIResponse;
                log.LogInformation("Response from Dataverse in {0} ticks", stopwatch.ElapsedTicks);
                return new OkObjectResult("Your application user id in Microsoft Dataverse is: " + responseMessage.UserId);
            }
            catch (Exception ex)
            {
                log.LogError(ex.Message);
                return new BadRequestObjectResult(ex);
            }        
        }
    }
}

I did a stress test using Apache JMeter with the following results (see the attached screenshot).

MattB-msft commented 3 years ago

@BetimBeja all the Execute commands run through our longstanding API interface, which the CdsServiceClient uses. Can you tell me the instance/org ID that you're connecting to, so we can look at the other end of this and see where these requests went? And/or can you provide the verbose logs for those requests? You can get them out of the in-memory logger in the client and write them to a file that you can get at after the run, if you need to.

Thanks.

BetimBeja commented 3 years ago

Environment ID: bb8ace81-e434-4238-b254-5bda52e9c5b6 (see the attached screenshot). I will try to update the log collection this weekend and post the logs here. It is a trial environment used for learning purposes 😄

MattB-msft commented 3 years ago

Sorry about the delay getting back to you here; there was a bit of a long discussion around this. We now understand the issue that is causing the inconsistent performance in the API, and the team is looking at how to address it in the longer term. It is not a short-term fix.

tagging @JimDaly here.

BetimBeja commented 3 years ago

Thank you @MattB-msft. I am sorry I wasn't able to provide any more logs; I have been busy lately and have had very little time for this.

MattB-msft commented 3 years ago

@BetimBeja as a heads up... we discovered and fixed a number of issues in the ServiceClient that could have been impacting this, as well as a rather big issue with the way MSAL deals with cache locking, which we are actively working through.

However, many of the updates we have recently provided may substantially improve the performance on Kubernetes. If you get the chance, can you retest your scenario?

BetimBeja commented 3 years ago

@MattB-msft I will try to replicate the test this weekend! I will email you the details of the test!

MattB-msft commented 2 years ago

@BetimBeja were you successful?

BetimBeja commented 2 years ago

@MattB-msft I sent the email with the subject "Stress-Test Azure Function v3 DV ServiceClient" to mattb-msft@hotmail.com, as listed on your GitHub profile. I will try to test again since a lot of time has passed.

MattB-msft commented 2 years ago

Ah, sorry about that. Will check to see if I still have it.

Thanks, MattB

BetimBeja commented 2 years ago

Just sent an update 😄