protobuf-net / protobuf-net.Grpc

GRPC bindings for protobuf-net and grpc-dotnet
Other
857 stars 109 forks source link

MapGrpcService() called over multiple grpc services which are part of same dot net service, these calls cause startup delay #324

Open sghatbale opened 7 months ago

sghatbale commented 7 months ago

Hello Marc,

Below is the extracted code of startup of c# dot net service. As you can see MapGrpcService() is called for each of services file.

There are many grpc services files. MapGrpcService() for all of these services sometimes combinedly takes 20 seconds.

We are starting grpc service as windows service. And windows service timeout is 30 seconds. This slowness of MapGrpcService() is really causing the service startup timeout.

How to speed up MapGrpcService() OR is there any other alternative??

..... builder.Services.AddControllers(); var app = builder.Build();

app.MapGrpcService\<Service1>(); app.MapGrpcService\<Service2>(); app.MapGrpcService\<Service3>(); app.MapGrpcService\<Service4>(); app.MapGrpcService\<Service5>(); app.MapGrpcService\<Service6>(); app.MapGrpcService\<Service7>(); app.MapGrpcService\<Service8>(); ...... app.MapGrpcService\<Service35>();

app.MapGet("/", () => "Communication with gRPC endpoints must be made through a gRPC client. To learn how to create a client, visit: https://go.microsoft.com/fwlink/?linkid=2086909");

app.Run(Url);

mgravell commented 7 months ago

Question: are you using protobuf-net.Grpc here? Meaning: are you using the code-first feature here? It isn't clear from the example, and I want to be 100% sure I understand before I dig.

But what I am hearing: your web app has a lot of gRPC services (can we get a number on the number of services and methods, so we can understand?), and the setup is taking too long. I agree 20+ seconds is way too long here.

sghatbale commented 7 months ago

@mgravell One of the service file looks like below

public class Service1 : IService1 { private readonly Logger logger; public Service1(Logger logger) { this.logger = logger; } public ValueTask getDataAccessNodes(getDataRequest request, CallContext context = default) { try { UInt32 outputSize = request.outputSize; Int32 retVal = request.retVal; return new ValueTask( new getDataResponse { grpcRetVal = foo(request.ccid,request.clid) ,outputSize = outputSize,retVal = retVal }); } catch (Exception ex) { logger.LogError("Exception message - " + ex.Message); logger.LogException(ex);

        return new ValueTask<getDataResponse>(
                new getDataResponse
                {
                    responseErrorCode = -1,
                    responseErrorMessage = ex.Message
                });
    }
}

}

Interface class has below code

[ProtoContract(SkipConstructor = true)] public class getDataRequest { [ProtoMember(1)] public UInt32 ccid { get; set; } [ProtoMember(2)] public UInt32 clid { get; set; } } [ProtoContract(SkipConstructor = true)] public class getDataResponse {
[ProtoMember(1)] public String grpcRetVal { get; set; }
[ProtoMember(2)] public UInt32 outputSize { get; set; }
} [ServiceContract]
public interface IService1 { [OperationContract] public ValueTask getDataAccessNodes (getDataRequest request, CallContext context = default); }

sghatbale commented 7 months ago

Yes @mgravell ,

using protobuf-net.Grpc , protobuf-net.Grpc.AspNetCore

image

Builder has following code

builder.Services.AddCodeFirstGrpc(options => { options.EnableDetailedErrors = true; options.MaxReceiveMessageSize = int.MaxValue; options.MaxSendMessageSize = int.MaxValue;
});

There are 35 services files. On Average each service has 6 to 10 methods. Max method a service has 20 methods. Usually the service having high number of methods take higher time for MapGrpcService()

sghatbale commented 7 months ago

Hi Marc,

Let me know if you need any other info.

mgravell commented 7 months ago

That's interesting, thanks. I'll see if I can setup a repro with some dummy services - then we can see what needs optimizing. This is way, way too slow.

mgravell commented 7 months ago

OK, so: I set up a test rig with 100 services, 6 methods per service, each using different data contracts (albeit dummy generated ones); entire test takes < 450ms; from the logging output, I can see that this does touch the models:

Init: 26ms
ProtoBuf.Grpc.Server.ServicesExtensions.CodeFirstServiceMethodProvider:RPC services being provided by protobuf_net.Grpc.Test.Issues.Issue324+TestService0: 6
...
ProtoBuf.Grpc.Server.ServicesExtensions.CodeFirstServiceMethodProvider:RPC services being provided by protobuf_net.Grpc.Test.Issues.Issue324+TestService99: 6
UseEndpoints: 0ms
Build: 0ms

so: whatever is happening isn't being trivial to discover; I'll need to see if I can set up a real test app and push the same through there.

sghatbale commented 7 months ago

Sure @mgravell , please let me know your findings with real test app.

menaheme commented 7 months ago

Maybe add long dummy delays in the model, service ctors

On Thu, 8 Feb 2024, 13:05 sghatbale, @.***> wrote:

Sure @mgravell https://github.com/mgravell , please let me know your findings with real test app.

— Reply to this email directly, view it on GitHub https://github.com/protobuf-net/protobuf-net.Grpc/issues/324#issuecomment-1933842355, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABLOIHYPSXVXF7HFZ7GQT5TYSSWODAVCNFSM6AAAAABC6N564OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZTHA2DEMZVGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

mgravell commented 7 months ago

Hacked in, with ILogger args - worked just fine, no delays. Do you have a minimal runnable repro of what is required to see this being slow?

sghatbale commented 7 months ago

@mgravell

I've following code in startup

var timer = new System.Diagnostics.Stopwatch(); timer.Start();

app.MapGrpcService\<EvSecurityCheckManagedCppSharpService>(); timer.Stop(); logger.LogError($"time elapsed: [{timer.ElapsedMilliseconds}]ms"); timer.Restart();

app.MapGrpcService\<EvSecurityNewManagedCppSharpService>(); timer.Stop(); logger.LogError($"time elapsed: [{timer.ElapsedMilliseconds}]ms");

And here's logs:

Attempt1 5828 1 02/08 21:01:25 1 ### Main - time elapsed: [410]ms 5828 1 02/08 21:01:25 1 ### Main - time elapsed: [45]ms

Attempt2 9936 1 02/08 21:04:33 1 ### Main - time elapsed: [452]ms 9936 1 02/08 21:04:33 1 ### Main - time elapsed: [42]ms

Attemp3 8988 1 02/08 21:04:59 1 ### Main - time elapsed: [462]ms 8988 1 02/08 21:04:59 1 ### Main - time elapsed: [61]ms

As you can see first service alone takes 450 ms every time for MapGrpcService().

PFA exact service file code. *Contract.cs has interface and request, response object defined.

I hope this data will help further to debug. 450 ms for a service is too high.

EvSecurityCheckManagedCppSharpContract.txt EvSecurityCheckManagedCppSharpService.txt EvSecurityNewManagedCppSharpContract.txt EvSecurityNewManagedCppSharpService.txt

sghatbale commented 7 months ago

@mgravell Did you get a chance to check EvSecurityCheckManagedCppSharpService from above attachment. Essentially for this service, MapGrpcService() takes 450+ ms every time.

I'm looking for either optimization of MapGrpcService() to reduce delay OR is there a way which allows calling MapGrpcService() to be done post startup of the service.

mgravell commented 7 months ago

No, I haven't had chance to look yet - this isn't my full time job, and I'd already spent a bunch of time looking yesterday. I'll try to take another look, but I have 47 other things need attention :)

sghatbale commented 7 months ago

I appreciate your efforts @mgravell . We had interacted in the past as well.

Please let me know your findings.

sghatbale commented 7 months ago

@mgravell FYI, one more thing I've observed is that looks like MapGrpcService() is compute intensive i.e. app.MapGrpcService() takes upto 1.5 seconds if ran on machine with CPU already occupied 40% vs it takes 450 ms on idle systems.

mgravell commented 7 months ago

I have a sample using the 4 files above (I had to delete the internals and re-implement the interfaces to all throw new NotImplementedException();, but that doesn't matter since we're not actually invoking anything!)

Based on my local measurement, I get very different numbers to you:

dbug: Grpc.AspNetCore.Server.Model.Internal.BinderServiceMethodProvider[1]
      Could not find bind method for CVGrpcCppSharpService.EvSecurityCheckManagedCppSharpService.
info: ProtoBuf.Grpc.Server.ServicesExtensions.CodeFirstServiceMethodProvider[0]
      RPC services being provided by CVGrpcCppSharpService.EvSecurityCheckManagedCppSharpService: 60
1: 98ms
dbug: Grpc.AspNetCore.Server.Model.Internal.BinderServiceMethodProvider[1]
      Could not find bind method for CVGrpcCppSharpService.EvSecurityNewManagedCppSharpService.
info: ProtoBuf.Grpc.Server.ServicesExtensions.CodeFirstServiceMethodProvider[0]
      RPC services being provided by CVGrpcCppSharpService.EvSecurityNewManagedCppSharpService: 12
2: 18ms

So: <100ms for the first service, and <20ms for the second. I'm still of the opinion that this is too slow, and I'm happy to try to make it faster, but: so that I understand, can I get more information on your setup? In particular: what framework / version, operating system and CPU are you using? Obviously if you're on a low power machine, it will be slower, but I'm not trying to argue "this is OK".

One of the options I'm particularly interested in: is it time that we used AOT to generate the binding code here? if we did that, there might be nothing huge for the service code to do. I'll see if I can generate that manually for now, to see what happens.

mgravell commented 7 months ago

AOT results:

dbug: Grpc.AspNetCore.Server.Model.Internal.BinderServiceMethodProvider[1]
      Could not find bind method for CVGrpcCppSharpService.EvSecurityCheckManagedCppSharpService.
info: ProtoBuf.Grpc.Server.ServicesExtensions.CodeFirstServiceMethodProvider[0]
      RPC services being provided by CVGrpcCppSharpService.EvSecurityCheckManagedCppSharpService: 999
1: 61ms
dbug: Grpc.AspNetCore.Server.Model.Internal.BinderServiceMethodProvider[1]
      Could not find bind method for CVGrpcCppSharpService.EvSecurityNewManagedCppSharpService.
info: ProtoBuf.Grpc.Server.ServicesExtensions.CodeFirstServiceMethodProvider[0]
      RPC services being provided by CVGrpcCppSharpService.EvSecurityNewManagedCppSharpService: 999
2: 11ms

(ignore the 999 - that's placeholder; this is manually implemented "what we would do with AOT" - the actual AOT work needs many hours of effort)

So: that doubles the performance, and that removes all of the protobuf-net reflection code - everything else is most likely not protobuf-net.Grpc overheads. To say more, I'd need to get a repro of it actually running horribly slowly.

sghatbale commented 7 months ago

@mgravell "what framework / version, operating system and CPU are you using" We have dot net core 8.0 Windows 2022 Server with 16 GB RAM & 4 cores.

Are you looking for any additional info from me? I did not understand when you say "To say more, I'd need to get a repro of it actually running horribly slowly."

Can we get the update which is reducing from 98 ms to 61 ms?

mgravell commented 7 months ago

I mean: I can't seem to make it run as slowly as you're seeing - are these relatively low performance CPUs? (Nothing wrong with that - modest scale is often fine for servers, just: my mid-range laptop is performing very differently)

The 98ms to 61ms is an experimental branch that depends on a lot of code that doesn't exist yet - basically, the idea is that a Roslyn generator spits out a ton of code at build time. To see how it performs, I've done that manually in a branch, but the "real" version requires me to actually write the generator, not just the "what would we generate?"

I can start on that, but it is going to take a little time.

On Tue, 13 Feb 2024, 15:41 sghatbale, @.***> wrote:

@mgravell https://github.com/mgravell "what framework / version, operating system and CPU are you using" We have dot net core 8.0 Windows 2022 Server with 16 GB RAM & 4 cores.

Are you looking for any additional info from me? I did not understand when you say "To say more, I'd need to get a repro of it actually running horribly slowly."

Can we get the update which is reducing from 98 ms to 61 ms?

— Reply to this email directly, view it on GitHub https://github.com/protobuf-net/protobuf-net.Grpc/issues/324#issuecomment-1941816947 or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMB6PL2PUUG5NS76SSTYTOCS3BFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFVEYTSMRSGM2DQOJXQKSHI6LQMWSWS43TOVS2K5TBNR2WLKRSGEZDGNZXGM2DGNFHORZGSZ3HMVZKMY3SMVQXIZI . You are receiving this email because you were mentioned.

Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub .

sghatbale commented 7 months ago

Thanks @mgravell !!! Really appreciate your efforts.

Can you please update once the experimental code is official. I guess that will help us.

mgravell commented 7 months ago

It would very much help to know what patterns your APIs use. In the code I've seen, everything was (IIRC) "takes TRequest and CancellationToken, returns ValueTask<TResponse>" - does that cover all of your usage? Basically, it changes how quickly I can get a preview that covers your specific usage, even if it doesn't handle every other possible allowed usage.

sghatbale commented 7 months ago

Yes @mgravell ,

All the services use same pattern, i.e. takes TRequest and CallContext [We do no take CancellationToken, we take CallContext], returns ValueTask

Posting one such example method =>


public ValueTask < canAdminLibraryResponse > canAdminLibrary(canAdminLibraryRequest request, CallContext context = default) { // Some business logic code try { return new ValueTask < canAdminLibraryResponse > ( new canAdminLibraryResponse { grpcRetVal = obj.canAdminLibrary(request.userId,request.commCellId,request.libraryId)

            ,m_EvSecurityCheckExtern = obj.getObjectPtr()
            ,isDisposed = obj.getIsDisposed()
                });
    }
    catch (Exception ex)
    {
       // Some business logic code
        return new ValueTask<canAdminLibraryResponse>(
                new canAdminLibraryResponse
                {
                    isDisposed = true,
                    responseErrorCode = -1,
                    responseErrorMessage = ex.Message
                });
    }
}
</code> 
sghatbale commented 7 months ago

Hi @mgravell

Is your experimental code is official (which improves the performance by reducing time taken by MapGrpcService() ) now?

mgravell commented 7 months ago

No, I would have said.

I have lots of demands on my time, and ultimately this is stuff I do that comes out of my own/family time. It is on my list of things.

sghatbale commented 5 months ago

Hi @mgravell ,

Just checking if you were able to submit the changes.

shivkumarghatbale commented 3 months ago

hi @mgravell ,

It would be really helpful if you can put your experimental code is official (which improves the performance by reducing time taken by MapGrpcService() ) .

Please update this thread.