kubernetes-retired / rktlet

[EOL] The rkt implementation of the Kubernetes Container Runtime Interface
Apache License 2.0
137 stars 43 forks source link

rktlet/cli/init: set `--service-type=notify` #119

Closed schu closed 7 years ago

schu commented 7 years ago

Otherwise, an error in the run command is not detected and systemd-run returns 0 in any case. (systemd-run false ; echo $? versus systemd-run --service=notify false ; echo $?)

With this change, an error in a transient unit (as described in https://github.com/kubernetes-incubator/rktlet/issues/108) would be easier to debug, as RunPodSandbox now returns the actual error to the kubelet.

schu commented 7 years ago

It does rely on the fact that run programs actually report ready to systemd-run, which seems to be true for rkt app sandbox (or rather systemd, as run by rkt app sandbox).

Currently, StartProcess is only used from RunPodSandbox.

alban commented 7 years ago

Yes, and support for sd_notify was improved in rkt v1.15.0 so rkt handles this correctly now.

lucab commented 7 years ago

This looks a bit fishy. You are getting a generic error exit code (NOT the child one), because the process exits before signaling (violating the sd_notify protocol):

$ systemd-run --service=notify /bin/bash -c "exit 2" ; echo $?
1
alban commented 7 years ago

This looks a bit fishy. You are getting a generic error exit code (NOT the child one), because the process exits before signaling (violating the sd_notify protocol):

I am not sure what you are suggesting. rkt app sandbox should follow the sd_notify protocol, unless it fails before starting systemd-nspawn.

rkt app add probably does not follow the fd_notify protocol but do we use systemd-run for thoses? If it is the case, I guess we will notice it:

$ sudo systemd-run --service=notify /bin/bash -c "exit 0" ; echo $?
Job for run-rf6c75562d5304b1db59655ef65bf8e10.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status run-rf6c75562d5304b1db59655ef65bf8e10.service" and "journalctl -xe" for details.
1
schu commented 7 years ago

I am not sure what you are suggesting. rkt app sandbox should follow the sd_notify protocol, unless it fails before starting systemd-nspawn.

rkt app sandbox could sd_notify an ERRNO, I guess.

You are getting a generic error exit code (NOT the child one), because the process exits before signaling (violating the sd_notify protocol)

Yes. While it's a generic error code, it makes StartProcess return an error message for a transient unit that fails. It could be argued that StartProcess should only care about the sucess / failure of systemd-run, but in the current use case waiting for READY=1 seems more appropriate to me (make sure the actual error is returned to the kubelet; avoid waiting for uuid file unnecessarily).

iaguis commented 7 years ago

rkt app sandbox could sd_notify an ERRNO, I guess.

Let's track this in #123 and we can merge this PR as-is.