opencontainers / runtime-spec

OCI Runtime Specification
http://www.opencontainers.org
Apache License 2.0

Replace stop with signal/kill and delete #356

Closed mikebrow closed 8 years ago

mikebrow commented 8 years ago

The spec currently states in runtime.md:

stop <container-id>

This operation MUST generate an error if it is not provided the container ID. This operation MUST stop
and delete a running container. 

Stopping a container MUST stop all of the processes running within the scope of the container. 

Deleting a container MUST delete the associated namespaces and resources associated with the container. 

Once a container is deleted, its id MAY be used by subsequent containers. 

Attempting to stop a container that is not running MUST have no effect on the container and MUST generate an error.

In a discussion with @crosbymichael (https://github.com/opencontainers/runc/issues/681), it was pointed out that the spec does not explain how to "stop" a container or what stopping means. I suggest removing stop from the spec and replacing it with two required commands: a signal (or kill) command and a delete command.
Thoughts?

wking commented 8 years ago

On Wed, Mar 23, 2016 at 11:44:38AM -0700, Mike Brown wrote:

In a discussion with @crosbymichael it was pointed out that the spec does not explain "how" or what it means to "stop." Suggest changing the spec to remove stop and replace it with the requirement for a signal or kill command, and a delete command.

There is previous discussion of what stop would mean in [1,2]. Should we move this to the list (see also [3])?

[1]: Subject: Definining "container" and "container processes" on Linux
     Date: Thu, 8 Oct 2015 09:47:08 -0700
     Message-ID: <20151008164708.GM28418@odin.tremily.us>

[2]: Subject: Clarify distinction between ‘stop’ and ‘delete’
     Date: Wed, 23 Mar 2016 11:49:35 -0700
     Message-ID: <20160323184935.GB23066@odin.tremily.us>

wking commented 8 years ago

There was some discussion in today's meeting [1], with the main concern being “is a signal-based approach portable?” It looks like basic signal handling is part of ISO C [2]:

The ISO C standard only requires the signal names SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM to be defined.

so an OCI signal command that sends those should be portable to other systems. However, signal(2) says [3]:

The signals SIGKILL and SIGSTOP cannot be caught or ignored.

so I think we want the spec to have a ‘signal’ command (possibly called ‘kill’ to match POSIX [4]) that also supports an uncatchable kill (which may be SIGKILL on Linux, but some other action for VM-based containers or other platforms that don't support an uncatchable SIGKILL). The uncatchable kill is @julz's “forceful stop” [5], which seems like a cross-platform idea.

I'm not sure how VM-based platforms would handle requests for SIGTERM, but I'm ok with spec wording that requires the requested ISO C signal be sent if possible, or an unsupported error be raised if impossible.
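
A minimal Python sketch of the "send if possible, otherwise raise an unsupported error" wording. The names `resolve_signal`, `UnsupportedSignalError`, and the two platform tables are hypothetical and are not part of the spec or of any runtime; the VM-like table assumes a platform that supports only the forceful kill.

```python
import signal

# The six signal names that ISO C requires, per the quote above.
ISO_C_SIGNALS = ("SIGABRT", "SIGFPE", "SIGILL", "SIGINT", "SIGSEGV", "SIGTERM")


class UnsupportedSignalError(Exception):
    """Hypothetical error for a signal the platform cannot deliver."""


def resolve_signal(name, platform_signals):
    """Return the platform's number for `name`, or raise UnsupportedSignalError.

    `platform_signals` maps signal names to numbers for the signals a
    (hypothetical) runtime is able to deliver into the container.
    """
    if name not in platform_signals:
        raise UnsupportedSignalError(f"{name} is not supported on this platform")
    return platform_signals[name]


# Assumed Linux-like platform: the ISO C signals plus the uncatchable SIGKILL.
linux_like = {name: int(getattr(signal, name)) for name in ISO_C_SIGNALS}
linux_like["SIGKILL"] = int(signal.SIGKILL)

# Assumed VM-based platform: only the forceful stop is available.
# The number is arbitrary here; the runtime would map it to a VM-level action.
vm_like = {"SIGKILL": 9}
```

With these tables, `resolve_signal("SIGTERM", linux_like)` returns the local SIGTERM number, while `resolve_signal("SIGTERM", vm_like)` raises `UnsupportedSignalError`.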

I'm not clear enough on Windows containers to know whether they also support sending SIGTERM, etc. into the main container process. But the worst-case scenario (VM-level support, erroring on anything except SIGKILL requests) seems like an acceptable lower bar. @RobDolinMS may be able to add more clarity there.

I think we need to explicitly say which runtime calls trigger which signals because the non-KILL signals are a mechanism for runtime callers to communicate with their container process (e.g. SIGTERM says “please stop gracefully”). If we don't specify which signals a runtime call sends, it's hard to write portable container images (What do you listen for to trigger a graceful shutdown? Which container processes might receive that signal from the runtime?).
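
To illustrate the image author's side of this, here is a minimal Python sketch of a container process that treats SIGTERM as “please stop gracefully”. It assumes the runtime's kill call delivers SIGTERM to this process; `GracefulShutdown` and `serve` are hypothetical names, and the self-signal in the demo only stands in for the runtime.

```python
import os
import signal
import time


class GracefulShutdown:
    """Records that SIGTERM arrived so the main loop can exit cleanly."""

    def __init__(self):
        self.requested = False

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.requested = True


def serve(shutdown, max_ticks=50):
    """Stand-in work loop; returns the number of ticks completed."""
    ticks = 0
    while not shutdown.requested and ticks < max_ticks:
        time.sleep(0.01)
        ticks += 1
    return ticks


if __name__ == "__main__":
    shutdown = GracefulShutdown()
    shutdown.install()
    # Stand-in for the runtime's kill operation targeting this process.
    os.kill(os.getpid(), signal.SIGTERM)
    serve(shutdown)
    print("cleaned up, exiting")
```

If the spec does not say which signal a runtime call sends, the image author cannot know that SIGTERM is the one to register here.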

wking commented 8 years ago

Done in #384. I think we can close this.

mikebrow commented 8 years ago

Issue has been satisfactorily resolved by removing the stop operation and adding kill (with optional signal values for platforms that have signal support) and delete.
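
As an illustration of how a caller might combine the two operations, the Python sketch below sends TERM, falls back to KILL after a grace period, and then deletes. It assumes a runc binary on PATH with the kill, state, and delete subcommands and a "status" field in the state JSON; `stop_and_delete`, `wait_stopped`, and the timeout values are hypothetical.

```python
import json
import subprocess
import time


def runc(*args):
    """Run a runc subcommand and return its stdout (assumes runc is on PATH)."""
    result = subprocess.run(
        ["runc", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def wait_stopped(container_id, run, timeout, poll_interval):
    """Poll the container state until it is stopped; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        state = json.loads(run("state", container_id))
        if state["status"] == "stopped":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def stop_and_delete(container_id, run=runc, grace_seconds=10.0, poll_interval=0.5):
    """Graceful kill, forceful kill if needed, then delete."""
    run("kill", container_id, "TERM")
    if not wait_stopped(container_id, run, grace_seconds, poll_interval):
        # The process ignored or outlived TERM: use the uncatchable kill.
        run("kill", container_id, "KILL")
        wait_stopped(container_id, run, grace_seconds, poll_interval)
    run("delete", container_id)
```

The `run` parameter exists only so the sequence can be exercised with a stub instead of a real runtime.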

NetEase-FuXi commented 10 months ago

The instructions below may also work:

runc pause <containerId>
kill <runc container PID>  # PID column from runc list
runc resume <containerId>
# at this point, runc list shows the container as stopped
runc delete <containerId>