microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Stateless Service startup life-cycle isn't the same as described in documentation. #709

Open OlegKarasik opened 6 years ago

OlegKarasik commented 6 years ago

Good day,

Description

I am currently working on a library project that provides a simple way to hook up custom handlers to service events. During this work I discovered that the stateless service life-cycle doesn't match the one described on the documentation page (this may be an issue in the docs, a bug, or my mistake).

In the documentation there is a statement:

  1. The service is constructed.
  2. Then, in parallel, two things happen:
    • StatelessService.CreateServiceInstanceListeners() is invoked and any returned listeners are opened.
    • ICommunicationListener.OpenAsync() is called on each listener.
  3. The service's StatelessService.RunAsync() method is called.
  4. If present, the service's StatelessService.OnOpenAsync() method is called. This call is an uncommon override, but it is available. Extended service initialization tasks can be started at this time.

Keep in mind that there is no ordering between the calls to create and open the listeners and RunAsync. The listeners can open before RunAsync is started. Similarly, you can invoke RunAsync before the communication listeners are open or even constructed.

Consider the following example: create a new Service Fabric application with an ASP.NET Core stateless service.

Then, in the service code files, add a class WebHostWrapper that emulates an ICommunicationListener.OpenAsync call that never completes:

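using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http.Features;

// Delegates everything to the wrapped host except Start(), which blocks forever.
public class WebHostWrapper : IWebHost
{
    private readonly IWebHost webHostImplementation;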

    public IFeatureCollection ServerFeatures 
        => this.webHostImplementation.ServerFeatures;

    public IServiceProvider Services 
        => this.webHostImplementation.Services;

    public WebHostWrapper(
        IWebHost webHostImplementation)
    {
        this.webHostImplementation = webHostImplementation;
    }

    public void Dispose()
    {
        this.webHostImplementation.Dispose();
    }

    public void Start()
    {
        // Called while the Kestrel listener is opening; blocking here
        // emulates a listener OpenAsync that never completes.
        Thread.Sleep(Timeout.Infinite);
    }

    public Task StartAsync(
        CancellationToken cancellationToken = new CancellationToken())
    {
        return this.webHostImplementation.StartAsync(cancellationToken);
    }

    public Task StopAsync(
        CancellationToken cancellationToken = new CancellationToken())
    {
        return this.webHostImplementation.StopAsync(cancellationToken);
    }
}

Now use this class to wrap the IWebHost pre-configured by the project template.

internal sealed class Web1 : StatelessService
{
    public Web1(
        StatelessServiceContext context)
        : base(context)
    {
    }

    protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners()
    {
        return new ServiceInstanceListener[]
        {
            new ServiceInstanceListener(
                serviceContext =>
                    new KestrelCommunicationListener(
                        serviceContext,
                        "ServiceEndpoint",
                        (
                            url,
                            listener) =>
                        {
                            var w = new WebHostBuilder()
                               .UseKestrel()
                               .ConfigureServices(
                                    services => services
                                       .AddSingleton<StatelessServiceContext>(serviceContext))
                               .UseContentRoot(Directory.GetCurrentDirectory())
                               .UseStartup<Startup>()
                               .UseServiceFabricIntegration(
                                   listener, 
                                   ServiceFabricIntegrationOptions.None)
                               .UseUrls(url)
                               .Build();

                            // Comment out this line to test the other scenarios.
                            return new WebHostWrapper(w);
                            // Uncomment this line to test the other scenarios.
                            // return w;
                        }))
        };
    }

    protected override Task RunAsync(
        CancellationToken cancellationToken)
    {
        return base.RunAsync(cancellationToken);
    }

    protected override Task OnOpenAsync(
        CancellationToken cancellationToken)
    {
        return base.OnOpenAsync(cancellationToken);
    }
}

Now, ideally, a breakpoint set in the RunAsync method should be hit, because according to the documentation ICommunicationListener.OpenAsync and RunAsync happen in parallel. But if you run the service, the breakpoint in RunAsync is never hit.
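The observation above can be pictured with a small model. This is only a sketch and not Service Fabric code: `StartupModel`, `StartAsync`, and the log strings are hypothetical names, and the model assumes that the listener is opened and awaited before RunAsync is started, which is what the breakpoint experiment suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical model, not Service Fabric code: it only mimics the ordering
// suggested by this issue (listener opened and awaited, then RunAsync started).
public static class StartupModel
{
    public static async Task<List<string>> StartAsync(
        Func<Task> openListener,
        TimeSpan timeout)
    {
        var log = new List<string>();

        log.Add("listener.OpenAsync started");
        Task open = openListener();

        // Wait for the listener, but give up after the timeout so the demo ends.
        Task finished = await Task.WhenAny(open, Task.Delay(timeout));
        if (finished != open)
        {
            log.Add("listener.OpenAsync still pending; RunAsync not started");
            return log;
        }

        await open; // Propagate any exception thrown while opening.

        log.Add("listener.OpenAsync completed");
        log.Add("RunAsync started");
        return log;
    }

    public static async Task Main()
    {
        // A listener that opens immediately: RunAsync is reached.
        var ok = await StartAsync(
            () => Task.CompletedTask, TimeSpan.FromSeconds(1));
        Console.WriteLine(string.Join(Environment.NewLine, ok));

        // A listener that never finishes opening, like WebHostWrapper.Start().
        var never = new TaskCompletionSource<object>();
        var stuck = await StartAsync(
            () => never.Task, TimeSpan.FromMilliseconds(200));
        Console.WriteLine(string.Join(Environment.NewLine, stuck));
    }
}
```

In this model the second call never logs "RunAsync started", which matches the breakpoint that is never hit when WebHostWrapper is in use.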

Update 2018/12/05

I have added more information about the shutdown and abort cycles.

The documentation says:

For shutting down a stateless service, the same pattern is followed, just in reverse:

  1. In parallel:
    • Any open listeners are closed. ICommunicationListener.CloseAsync() is called on each listener.
    • The cancellation token passed to RunAsync() is canceled. A check of the cancellation token's IsCancellationRequested property returns true, and if called, the token's ThrowIfCancellationRequested method throws an OperationCanceledException.
  2. After CloseAsync() finishes on each listener and RunAsync() also finishes, the service's StatelessService.OnCloseAsync() method is called, if present. OnCloseAsync is called when the stateless service instance is going to be gracefully shut down. This can occur when the service's code is being upgraded, the service instance is being moved due to load balancing, or a transient fault is detected. It is uncommon to override StatelessService.OnCloseAsync(), but it can be used to safely close resources, stop background processing, finish saving external state, or close down existing connections.
  3. After StatelessService.OnCloseAsync() finishes, the service object is destructed.

The actual shutdown sequence (see Current state) involves no parallelism. The current behavior can be easily reproduced using the gist.

Current state

Startup cycle:

  1. Service's CreateServiceInstanceListeners is called.
  2. All listeners have their OpenAsync called and awaited.
  3. Then in parallel:
    • Service's RunAsync is called.
    • Service's OnOpenAsync is called.

Note: The runtime waits for OpenAsync to finish before marking the replica as ready

This behavior can be easily tested by adding Thread.Sleep(Timeout.Infinite); to the overrides. Also, please note that to test it with the code above you must comment out the WebHostWrapper usage.
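A minimal sketch of such a test, assuming the standard Microsoft.ServiceFabric.Services package: the class name `LifecycleProbe`, the trace messages, and the 30-second delay are hypothetical. It uses a cancellable delay instead of Thread.Sleep(Timeout.Infinite) so that the instance can still be shut down cleanly.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;

// Hypothetical probe service; the name and trace messages are illustrative.
internal sealed class LifecycleProbe : StatelessService
{
    public LifecycleProbe(StatelessServiceContext context)
        : base(context)
    {
    }

    protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners()
    {
        Trace.WriteLine("CreateServiceInstanceListeners");
        // No listeners, so only the service-level callbacks are observed.
        return new ServiceInstanceListener[0];
    }

    protected override async Task OnOpenAsync(CancellationToken cancellationToken)
    {
        Trace.WriteLine("OnOpenAsync: enter");
        // Hold OnOpenAsync open for a while to see whether RunAsync starts meanwhile.
        await Task.Delay(TimeSpan.FromSeconds(30), cancellationToken);
        Trace.WriteLine("OnOpenAsync: exit");
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        Trace.WriteLine("RunAsync: enter");
        // Stay alive until the runtime cancels the token.
        await Task.Delay(Timeout.Infinite, cancellationToken);
    }
}
```

If the startup cycle listed above holds, "RunAsync: enter" should appear in the trace output before "OnOpenAsync: exit".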

Shutdown cycle:

  1. All listeners have their CloseAsync method called and awaited.
  2. CancellationToken passed to Service’s RunAsync method is canceled.
  3. Service’s RunAsync method is awaited.
  4. Service’s OnCloseAsync method is called and awaited.

Abort cycle:

  1. CancellationToken passed to Service’s RunAsync method is canceled. All previously created ICommunicationListener instances have their Abort method executed, one after another.
  2. Service’s Abort method is executed.
  3. Service’s implementation object is destroyed.
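The shutdown and abort cycles above can be observed with a listener that does nothing but trace its own life-cycle calls. This is a sketch, assuming the ICommunicationListener interface from Microsoft.ServiceFabric.Services; the class name `TracingListener`, the trace messages, and the placeholder address are hypothetical.

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Communication.Runtime;

// Hypothetical listener: it opens no endpoint and only traces life-cycle calls.
internal sealed class TracingListener : ICommunicationListener
{
    public Task<string> OpenAsync(CancellationToken cancellationToken)
    {
        Trace.WriteLine("listener.OpenAsync");
        // The returned string is the published address; this value is a placeholder.
        return Task.FromResult("tracing://placeholder");
    }

    public Task CloseAsync(CancellationToken cancellationToken)
    {
        // Expected during a graceful shutdown.
        Trace.WriteLine("listener.CloseAsync");
        return Task.CompletedTask;
    }

    public void Abort()
    {
        // Expected during an abort instead of CloseAsync.
        Trace.WriteLine("listener.Abort");
    }
}
```

To use it, return `new ServiceInstanceListener(context => new TracingListener())` from CreateServiceInstanceListeners and add similar trace calls to the service's RunAsync, OnCloseAsync, and OnAbort overrides. The order of the trace lines can then be compared with the cycles listed above.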

OlegKarasik commented 6 years ago

Added more information about what the current startup life-cycle is.

mikkelhegn commented 6 years ago

@rwike77

amanbha commented 5 years ago

@OlegKarasik The documentation is incorrect; this is a documentation issue. As per the code, the communication listener is opened first: https://github.com/Microsoft/service-fabric-services-and-actors-dotnet/blob/0ee6e7b8e1db55dff8f610f991a46b32237b18c2/src/Microsoft.ServiceFabric.Services/Runtime/StatelessServiceInstanceAdapter.cs#L79

OlegKarasik commented 5 years ago

@amanbha Thanks for the confirmation and for source link!

I have updated the issue to reflect that RunAsync isn't called and that, as you pointed out, listener.OpenAsync is called first.

OlegKarasik commented 5 years ago

~I have updated the comment with the shutdown sequence information and blogged about the stateless service life-cycle in detail (leaving a link here to avoid duplicating information).~

I have updated the issue with more details about shutdown and abort routines.