Open pschiffe opened 7 years ago
It might be a mismatch between the version of runc and the version of oci-systemd-hook? Did those two versions ship together? How did you call out to oci-systemd-hook?
Those two packages should align correctly, as both are from the latest version of rhel atomic host (7.3.6):
# rpm -q oci-systemd-hook runc
oci-systemd-hook-0.1.7-4.gite533efa.el7.x86_64
runc-1.0.0-6.gite800860.el7.x86_64
This is how I'm calling hooks in the config.json.template:
"hooks": {
"prestart": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook"
},
{
"path": "/usr/libexec/oci/hooks.d/oci-register-machine"
}
],
"poststop": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook"
},
{
"path": "/usr/libexec/oci/hooks.d/oci-register-machine"
}
]
},
The hooks expect their first argument to be the stage name (e.g. prestart or poststart). Add those arguments and it should work.
Hooks cannot tell which phase they are running in unless you pass it in as argv[1].
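For illustration, a hook that relies on argv[1] might pick its stage roughly like this. This is a minimal Python sketch, not the actual oci-systemd-hook code (which is not shown in this thread); `stage_from_argv` and `VALID_STAGES` are hypothetical names.

```python
import sys

# The three hook stages mentioned in this thread.
VALID_STAGES = ("prestart", "poststart", "poststop")


def stage_from_argv(argv):
    """Return the stage named in argv[1], or None if it is missing or unknown."""
    if len(argv) > 1 and argv[1] in VALID_STAGES:
        return argv[1]
    return None


if __name__ == "__main__":
    stage = stage_from_argv(sys.argv)
    if stage is None:
        # Without argv[1] the hook has no direct way to know its stage.
        print("stage not given in argv[1]", file=sys.stderr)
    else:
        print("running as", stage)
```

Note that the stage has to be the second element of the argument vector; a lone `"prestart"` would occupy argv[0] and be ignored by this check.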
Unfortunately, even with the args I see the same error:
runc[16416]: container_linux.go:259: starting container process caused "process_linux.go:345: container init caused \"process_linux.go:328: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""
I've updated the config.json hooks to look like this:
"hooks": {
"prestart": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
"args": [ "prestart" ]
},
{
"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
"args": [ "prestart" ]
}
],
"poststop": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
"args": [ "poststop" ]
},
{
"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
"args": [ "poststop" ]
}
]
},
BTW, my previous configuration (without the args) worked fine on rhel atomic host 7.3.2.
I've tried the same image on centos atomic host continuous:
# atomic host status
State: idle
Deployments:
● centos-atomic-continuous:centos-atomic-host/7/x86_64/devel/continuous
Version: 7.2017.490 (2017-07-19 16:05:22)
Commit: a948637e77018755831659c791a8cf8595f6d267d4fddba24a6b42f0ec6f1bd7
# rpm -q oci-systemd-hook runc
oci-systemd-hook-0.1.7-4.gite533efa.el7.x86_64
runc-1.0.0-9.git6394544.el7.x86_64
But I see the same error:
runc[12188]: container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"process_linux.go:322: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""
@mrunalp WDYT?
@pschiffe Are you seeing anything in the journal? Are you seeing the same thing in Fedora?
Hmm, I see something more in journal:
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Starting Linux Guest Environment for Google Compute Engine...
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12188-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12188-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Created slice libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Removed slice libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Started libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal prestart[12201]: systemdhook <error>: root not found in state: Success
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Stopped libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal poststop[12204]: systemdhook <error>: root not found in state: Success
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal runc[12188]: container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"process_linux.go:322: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service: main process exited, code=exited, status=1/FAILURE
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Unit gce-agents.service entered failed state.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service failed.
@pschiffe you should specify args as:
"prestart": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
"args": [ "oci-systemd-hook", "prestart" ]
},
because the args are passed to exec unchanged. args[0] can really be anything, since it is not used.
It would be nice if it still worked without any args, as it did before. I am taking a look right now.
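A rough sketch of why the three hook configurations tried in this thread differ, assuming the runtime passes `args` to exec unchanged and falls back to the hook path as argv[0] when `args` is absent. `effective_argv` is a hypothetical helper for illustration, not part of runc.

```python
def effective_argv(hook):
    """Return the argv a hook process would see for one config.json hook entry."""
    args = hook.get("args")
    # Assumption: with no args, the hook path itself becomes argv[0].
    return list(args) if args else [hook["path"]]


PATH = "/usr/libexec/oci/hooks.d/oci-systemd-hook"

no_args = effective_argv({"path": PATH})
stage_only = effective_argv({"path": PATH, "args": ["prestart"]})
name_and_stage = effective_argv(
    {"path": PATH, "args": ["oci-systemd-hook", "prestart"]}
)

for argv in (no_args, stage_only, name_and_stage):
    stage = argv[1] if len(argv) > 1 else None
    print(argv, "-> argv[1] =", stage)
```

Only the last form puts the stage in argv[1]. With `"args": [ "prestart" ]` the stage lands in argv[0], which would be consistent with the earlier journal lines being tagged `prestart[12201]` instead of the hook's name.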
Didn't help, still the same error:
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Starting Linux Guest Environment for Google Compute Engine...
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12731-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12731-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Created slice libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Removed slice libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Started libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal oci-systemd-hook[12744]: systemdhook <error>: root not found in state: Success
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Stopped libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal oci-systemd-hook[12747]: systemdhook <error>: root not found in state: Success
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal runc[12731]: container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"process_linux.go:322: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service: main process exited, code=exited, status=1/FAILURE
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Unit gce-agents.service entered failed state.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service failed.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service holdoff time over, scheduling restart.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: start request repeated too quickly for gce-agents.service
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Failed to start Linux Guest Environment for Google Compute Engine.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Unit gce-agents.service entered failed state.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service failed.
"hooks": {
"prestart": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
"args": [ "oci-systemd-hook", "prestart" ]
},
{
"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
"args": [ "oci-register-machine", "prestart" ]
}
],
"poststop": [
{
"path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
"args": [ "oci-systemd-hook", "poststop" ]
},
{
"path": "/usr/libexec/oci/hooks.d/oci-register-machine",
"args": [ "oci-register-machine", "poststop" ]
}
]
},
This means that the config file you generated does not indicate what the "root" is. oci-systemd-hook and oci-umount need to go to the root of the container to mount or umount content. Can you add this to your config?
@giuseppe we cannot rely on whether the pid is 0 or not, since runc specifies three ways to run a hook: prestart, poststart, and poststop. We can't tell the difference between a prestart and a poststart call. Since we are building this feature into cri-o, we don't want to ignore the specification.
@rhatdan do you have an example of how to specify the "root"? I can't find what you mean. I have the following in the config:
"root": {
"path": "rootfs",
"readonly": true
},
and in the process section, there is "cwd": "/",
We can decide whether to be backward compatible and handle only prestart and poststop when the type is not specified. On the other hand, there are probably not many users out there, so we can just enforce it and not worry about supporting this additional case in the future.
Anyway, the current development version supports the case where the hook type is not specified. The issue reported here is caused by a hook version that is too old. It is fixed upstream with:
commit 69858facb2597d4bc0df3ed405876d6c167b943f
Author: Daniel J Walsh <dwalsh@redhat.com>
Date: Fri Jun 23 19:31:05 2017 +0000
Needs this change to work directly with runc and cri-o
The path bundle path passed to a container is called bundle, not bundlepath.
Also root is not in the state file, but can be retrieved from the config.json.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Also this other commit is required, otherwise oci-systemd-hook will just segfault:
commit 40ab578cc3550b25c52e64e43d82a65d23e3365e
Author: Jason Wessel <jason.wessel@windriver.com>
Date: Wed Jul 12 09:21:44 2017 -0700
Allow container definitions where rootfs is not an absolute path
In earlier versions of the runc frame work the rootfs path was passed
as a key with the initial json that was passed on the stdin and it was
automatically computed to be an absolute path.
This translation to an absolute path must be done in the hook based on
the bundlePath. This allows the config.json to be relocated by the
container hosting system storage without modifying the config.json.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
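Taken together, the two commit messages describe a lookup that could be sketched as follows: take the bundle directory from the state JSON passed on stdin, load config.json from it, and make a relative root path absolute against the bundle. This is a Python illustration under those assumptions, not the hook's own implementation. `resolve_rootfs` is a hypothetical name, and the fallback to a `bundlePath` key is an assumption for runtimes that still use the older name.

```python
import json
import os


def resolve_rootfs(state):
    """Return an absolute rootfs path for the container described by `state`.

    `state` is the already-parsed state JSON that the runtime writes to the
    hook's stdin.
    """
    # Assumption: newer runtimes use "bundle", older ones "bundlePath".
    bundle = state.get("bundle") or state.get("bundlePath")
    if not bundle:
        raise ValueError("no bundle path in state")

    # The root is not in the state, so read it from the bundle's config.json.
    with open(os.path.join(bundle, "config.json")) as f:
        config = json.load(f)

    root = config["root"]["path"]
    if not os.path.isabs(root):
        # A relative path such as "rootfs" is relative to the bundle.
        root = os.path.join(bundle, root)
    return os.path.normpath(root)
```

Inside a hook this would be called as `resolve_rootfs(json.load(sys.stdin))`. With the `"path": "rootfs"` entry quoted earlier in the thread, the result is the `rootfs` directory inside the bundle.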
So we need to get this updated in the RHEL7 package. @lsm5 Can you create a new oci-systemd-hook package for RHEL?
rebased to 1e84754 in rhel 7.4 branch
Would it be possible to also get it to the rhel atomic host 7.3?
There is only one stream of extras, so if you install oci-umount after it gets shipped, it will be placed into RHEL7.3.
Some of the discussion here is about how the hook decides which stage it's running in. The most portable approach to that is to use status in stdin's state JSON (kubernetes-incubator/cri-o#1360). I have an open PR for that against the very-similar oci-umount: projectatomic/oci-umount#35. If/when that lands I'll cherry-pick it over here.
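A minimal sketch of the status-based approach, in Python for illustration only. The mapping from `status` to stage below is an assumption based on a reading of the OCI runtime lifecycle (prestart hooks run while the container is `created`, poststart once it is `running`, poststop once it is `stopped`). It is not taken from the linked PR, and `stage_from_state` is a hypothetical name.

```python
import json

# Assumed mapping from the state's "status" field to the hook stage.
STATUS_TO_STAGE = {
    "created": "prestart",
    "running": "poststart",
    "stopped": "poststop",
}


def stage_from_state(state_json):
    """Infer the hook stage from the state JSON text read from stdin.

    Returns None when the status is missing or not one of the assumed values.
    """
    state = json.loads(state_json)
    return STATUS_TO_STAGE.get(state.get("status"))
```

A hook would call this as `stage_from_state(sys.stdin.read())` and could fall back to argv[1] when it returns `None`.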
Something that may be useful: https://github.com/intel/cc-oci-runtime/issues/270
runc is passing root rather than bundlePath to the hooks. oci-systemd-hook is compounding the problem by requiring the erroneous root value.
Hello,
oci-systemd-hook on latest RHEL atomic host doesn't work with runc.
This is the error I'm getting:
Config.json template is here: https://github.com/pschiffe/gce-system-container/blob/master/image/config.json.template
Can you help? How to debug?