moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0
3.37k stars 616 forks source link

Synchronous service removal #2497

Open nishanttotla opened 6 years ago

nishanttotla commented 6 years ago

Since we support synchronous service create and update (https://github.com/moby/moby/pull/31144), it would be worthwhile to think about synchronous service removals.

One of the reasons for this is the following: Currently the service object is deleted right away, while tasks are set to desired state REMOVE, and eventually removed when the containers are stopped. During the period when shutdown is happening, it is impossible to inspect these tasks because the service object doesn't exist, which makes for bad UX and issues that are extremely difficult for the user to debug.

There are two ways to do this:

  1. (easy way): Keep the service object around when a service delete event comes in. The task reaper then while removing tasks with desired state REMOVE and actual state >=SHUTDOWN also ensures that once all tasks are gone, the service object is removed.
  2. (harder way): End to end API change, so docker service rm would actually wait for the removals to go through completely (with a possible --detached mode). This would require the above change, and more.

Either way, let's discuss whether this makes sense and what the right approach is.

cc @anshulpundir @dperny @tiborvass @stevvooe @aluzzardi @thaJeztah @marcusmartins

anshulpundir commented 6 years ago

1 Keeping the service object in the store around until the remove is complete makes sense and allows the service tasks to be inspected while the remove is in progress.

It is not clear to me what we gain by doing #2. Does this enable new use-cases ?

nishanttotla commented 6 years ago

@anshulpundir what we gain by doing 2 is that the end to end API behavior becomes more consistent with service create/update. Either way, I am guessing that the first requirement might be to do 1 anyway, whether or not we decide to change it at the API level.

Also cc @aaronlehmann

aaronlehmann commented 6 years ago

(harder way): End to end API change, so docker service rm would actually wait for the removals to go through completely (with a possible --detached mode). This would require the above change, and more.

That wouldn't really be an API change. It would be a CLI change.

nishanttotla commented 6 years ago

Here's a proposal for this change:

  1. Add a Removed field to the Service object.
  2. Instead of deleting the service when RemoveService is called in the controlapi, we set the Removed field to true. Something like
        var service *api.Service
    err := s.store.Update(func(tx store.Tx) error {
                service = store.GetService(tx, request.ServiceID)
                if service == nil {
                        return grpc.Errorf(codes.NotFound, "service %s not found", request.ServiceID)
                }
                service.Removed = true
        return store.UpdateService(tx, service)
    })
  1. In the orchestrator, instead of listening for api.EventDeleteService, we use the case for api.EventUpdateService and check for the Removed flag being set to initiate removal actions.

  2. The task reaper is responsible for finally removing the service object. If all tasks marked with desired state REMOVE and current state >=SHUTDOWN of a service marked for removal are gone, then the service object can be deleted.

Looking forward to critiques.

cc @anshulpundir @dperny @aaronlehmann @cyli @stevvooe @tiborvass

dperny commented 6 years ago

SGTM. for backward-compat, i'm pretty sure proto will consider no value to be "false", which means old services will still work on upgrade.

tiborvass commented 6 years ago

IANAM but sounds good to me

stevvooe commented 6 years ago

This looks okay.