Closed: cheuschober closed this issue 6 years ago.
You are totally wrong:
(The seeding is meant to be used from the lxc runner) ...
@kiorky, Ok. I get that -- what about the steps to reproduce? If the error is misidentified, that makes sense -- still can't init a container.
I think you may also have been misled by false-positive debug output.
For example, the one you are quoting comes from here:
https://github.com/saltstack/salt/blob/develop/salt/modules/lxc.py#L2853
This tests whether the container is attachable yet; it is not a failure, even though it writes something to your logs.
@kiorky Still failed...
lxc.sls
lxc.container_profile:
  ubuntu:
    template: ubuntu
    options:
      release: trusty
      arch: amd64
The runner in the above scenario can't even find the profile in pillar. I noticed, however, that the documentation diverges in how to name the profiles between the example at:
http://docs.saltstack.com/en/latest/topics/tutorials/lxc.html#creating-a-container-on-the-cli
and
http://docs.saltstack.com/en/latest/topics/tutorials/lxc.html#tutorial-lxc-profiles-container
Minion error log below:
2015-05-15 17:57:59,315 [salt.loaded.int.module.cmdmod][ERROR ][28404] Command 'lxc-attach --clear-env -n testbuntu-01 -- /usr/bin/env' failed with return code: 1
2015-05-15 17:57:59,316 [salt.loaded.int.module.cmdmod][ERROR ][28404] output: lxc-attach: attach.c: lxc_attach: 635 failed to get the init pid
2015-05-15 17:58:00,148 [salt.loaded.int.module.cmdmod][ERROR ][28404] Command 'lxc-attach --clear-env --set-var PATH=/bin:/usr/bin:/sbin:/usr/sbin:/opt/bin:/usr/local/bin:/usr/local/sbin -n testbuntu-01 -- command -v salt-minion' failed with return code: 255
2015-05-15 17:58:00,149 [salt.loaded.int.module.cmdmod][ERROR ][28404] stderr: lxc_container: attach.c: lxc_attach_run_command: 1094 No such file or directory - failed to exec 'command'
2015-05-15 17:58:00,149 [salt.loaded.int.module.cmdmod][ERROR ][28404] retcode: 255
2015-05-15 17:58:00,170 [salt.loaded.int.module.cmdmod][ERROR ][28404] Command "lxc-attach --clear-env --set-var PATH=/bin:/usr/bin:/sbin:/usr/sbin:/opt/bin:/usr/local/bin:/usr/local/sbin -n testbuntu-01 -- test -e '/lxc.initial_seed'" failed with return code: 1
2015-05-15 17:58:00,170 [salt.loaded.int.module.cmdmod][ERROR ][28404] retcode: 1
@kiorky If the attachable() issue is safely ignorable, should I assume the command -v call is really the problem here?
I did a manual attach and ran that command and it executed flawlessly, but it won't run as above.
Just paste a full execution log somewhere if you are not able to interpret it.
I doubt there is a bug; I use it daily (salt/develop branch), but we also made sure with @terminalmage to have something stabilized in stable/2015.*.
Run with -lall (log level: all).
I personally call the execution module's lxc.init from the runner's lxc.cloud_init function. Let me check (that's not how I use it personally; I have another internal interface to set up settings, which I feed to the function).
removed comment: useless
Forget what I said, I'm a bit tired :).
You can use both, but the correct scheme is lxc.container_profile.
Yep, lxc.container_profile is checked first, and if not found then lxc.profile is checked.
@kiorky, Thank you for the documentation update. The profile has remained unchanged.
I called it as the following (from an Ubuntu 14.04 instance):
$ sudo salt 'salt' lxc.get_container_profile ubuntu
salt:
----------
options:
----------
arch:
amd64
release:
trusty
template:
ubuntu
Running:
$ sudo salt-run lxc.init -lall --out yaml --out-file /vagrant/run-log.txt testbuntu-01 host=salt profile=ubuntu
Output
event:
message:
comment: ''
done: []
errors:
salt:
testbuntu-01:
- changes:
init:
- create: Container created
- config: Container configuration updated
- state:
new: running
old: null
comment: Bootstrap failed, see minion log for more information
name: testbuntu-01
result: false
ping_status: false
result: false
salt: []
suffix: progress
comment: ''
done: []
errors:
salt:
testbuntu-01:
- changes:
init:
- create: Container created
- config: Container configuration updated
- state:
new: running
old: null
comment: Bootstrap failed, see minion log for more information
name: testbuntu-01
result: false
ping_status: false
result: false
salt: []
Relevant minion log:
2015-05-15 19:00:41,351 [salt.loaded.int.module.cmdmod][ERROR ][796] Command 'lxc-attach --clear-env -n testbuntu-01 -- /usr/bin/env' failed with return code: 1
2015-05-15 19:00:41,352 [salt.loaded.int.module.cmdmod][ERROR ][796] output: lxc-attach: attach.c: lxc_attach: 635 failed to get the init pid
2015-05-15 19:00:42,174 [salt.loaded.int.module.cmdmod][ERROR ][796] Command 'lxc-attach --clear-env --set-var PATH=/bin:/usr/bin:/sbin:/usr/sbin:/opt/bin:/usr/local/bin:/usr/local/sbin -n testbuntu-01 -- command -v salt-minion' failed with return code: 255
2015-05-15 19:00:42,174 [salt.loaded.int.module.cmdmod][ERROR ][796] stderr: lxc_container: attach.c: lxc_attach_run_command: 1094 No such file or directory - failed to exec 'command'
2015-05-15 19:00:42,175 [salt.loaded.int.module.cmdmod][ERROR ][796] retcode: 255
2015-05-15 19:00:42,198 [salt.loaded.int.module.cmdmod][ERROR ][796] Command "lxc-attach --clear-env --set-var PATH=/bin:/usr/bin:/sbin:/usr/sbin:/opt/bin:/usr/local/bin:/usr/local/sbin -n testbuntu-01 -- test -e '/lxc.initial_seed'" failed with return code: 1
2015-05-15 19:00:42,198 [salt.loaded.int.module.cmdmod][ERROR ][796] retcode: 1
For the pillar data, keep in mind that it is only compiled in certain cases (minion start, when state.highstate or saltutil.refresh_pillar is called, and maybe a couple of others I am missing). If you change your pillar and the next thing you do is an lxc.init that needs that pillar data, you must refresh manually with saltutil.refresh_pillar.
Regarding your error, lxc-attach: attach.c: lxc_attach: 635 failed to get the init pid is the error you would get if you tried to attach to a container that is not running. However, the runner says that the container is running. Perhaps it is not yet running when lxc.init moves on to perform the bootstrap, though I can't see how that would happen, since I believe the lxc.start function (which is used to start the container) blocks until the container is up. Try adding bootstrap_delay=5 to your lxc.init command and see if that changes anything.
@terminalmage
Ok, so I just found this little note in the documentation:
Warning
Many shell builtins do not work, failing with stderr similar to the following:
lxc_container: No such file or directory - failed to exec 'command'
The same error will be displayed in stderr if the command being run does not exist. If the retcode is nonzero and not what was expected, try using lxc.run_stderr or lxc.run_all.
http://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.lxc.html#salt.modules.lxc.retcode
That is, however, what appears to be called at:
https://github.com/saltstack/salt/blob/develop/salt/modules/lxc.py#L2644
Bug?
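Incidentally, that warning is easy to reproduce outside of LXC. An exec-style call (which is what lxc-attach performs) can only run programs that exist on disk, while command is a shell builtin that, on typical Linux systems, has no binary behind it. A minimal sketch of the distinction, assuming a POSIX sh is installed:

```python
import subprocess

# Exec-ing the argv directly, as lxc-attach does, fails for shell
# builtins: on most Linux systems there is no on-disk "command" binary.
try:
    proc = subprocess.run(["command", "-v", "sh"], capture_output=True)
    direct_exec_failed = proc.returncode != 0
except FileNotFoundError:
    direct_exec_failed = True
print(direct_exec_failed)  # usually True on Linux

# Wrapping the same string in a shell makes the builtin available.
shell_exec = subprocess.run(["sh", "-c", "command -v sh"],
                            capture_output=True, text=True)
print(shell_exec.returncode)  # 0
```

The same string succeeds or fails depending purely on whether a shell is interposed, which matches the behavior seen in the logs above.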
@terminalmage, I tried it with the delay but had the same result.
@terminalmage, I still think the first retcode 1 was from:
https://github.com/saltstack/salt/blob/develop/salt/modules/lxc.py#L1230-L1259
I did a little log-peppering before I first reported this, and when run in the above scenario it hit the run statement at L1231 before it hit the start at L1259. Since run() (and _run()) call attachable(), the first error appears to be related to that. @kiorky has corrected me to note that it does have a fallback to chroot, so it appears to be relatively benign (though spurious ERROR messages like that do complicate debugging a bit). Why the rest of it fails is a bit confusing.
@cheuschober It does look like command -v does not work in lxc-attach; I've replaced it with which.
# salt saltmine lxc.run ubuntu1404 'command -v salt-minion'
saltmine:
lxc-attach: attach.c: lxc_attach_run_command: 1107 No such file or directory - failed to exec 'command'
# salt saltmine lxc.run ubuntu1404 'which salt-minion'
saltmine:
/usr/bin/salt-minion
Pull request for that change is https://github.com/saltstack/salt/pull/23782
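The same distinction can be sanity-checked locally with Python's shutil.which, which only resolves real on-disk executables (this is just an illustration, not Salt code):

```python
import shutil

# A real executable resolves to an on-disk path that exec can run.
print(shutil.which("sh"))       # e.g. /bin/sh or /usr/bin/sh
# A shell builtin like `command` usually has no binary to resolve,
# which is why lxc-attach could not exec it.
print(shutil.which("command"))  # usually None on Linux
```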
@terminalmage, Thanks Erik.
Still can't build a container successfully, though:
2015-05-15 19:46:04,493 [salt.loaded.int.module.cmdmod][ERROR ][4312] Command 'lxc-attach --clear-env -n testbuntu-01 -- /usr/bin/env' failed with return code: 1
2015-05-15 19:46:04,494 [salt.loaded.int.module.cmdmod][ERROR ][4312] output: lxc-attach: attach.c: lxc_attach: 635 failed to get the init pid
2015-05-15 19:46:09,820 [salt.loaded.int.module.cmdmod][ERROR ][4312] Command 'lxc-attach --clear-env --set-var PATH=/bin:/usr/bin:/sbin:/usr/sbin:/opt/bin:/usr/local/bin:/usr/local/sbin -n testbuntu-01 -- which salt-minion' failed with return code: 1
2015-05-15 19:46:09,821 [salt.loaded.int.module.cmdmod][ERROR ][4312] retcode: 1
2015-05-15 19:46:09,842 [salt.loaded.int.module.cmdmod][ERROR ][4312] Command "lxc-attach --clear-env --set-var PATH=/bin:/usr/bin:/sbin:/usr/sbin:/opt/bin:/usr/local/bin:/usr/local/sbin -n testbuntu-01 -- test -e '/lxc.initial_seed'" failed with return code: 1
2015-05-15 19:46:09,842 [salt.loaded.int.module.cmdmod][ERROR ][4312] retcode: 1
I can confirm the container is running and accepts the attach commands directly as above.
@cheuschober I added a commit to that pull request which will get rid of the spurious "failed to get the init pid" message. I'll keep looking here.
@terminalmage Thank you so much!
@cheuschober It looks like all the log messages you posted above are spurious (in other words, a nonzero return code does not necessarily mean there is a problem); we can use ignore_retcode=True to suppress these and keep them from being seen as errors.
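A toy sketch of what ignore_retcode accomplishes (illustrative only; this is not Salt's actual cmdmod implementation, and run_cmd is an invented name): an expected nonzero retcode is logged at a quiet level instead of ERROR.

```python
import logging
import subprocess

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("cmdmod-sketch")

def run_cmd(argv, ignore_retcode=False):
    """Run argv; log nonzero retcodes as ERROR unless the caller
    declared the retcode merely informational."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        level = logging.DEBUG if ignore_retcode else logging.ERROR
        log.log(level, "Command %r returned %d", argv, proc.returncode)
    return proc.returncode

# Existence probe: retcode 1 just means "not present", not an error.
rc = run_cmd(["test", "-e", "/no/such/file"], ignore_retcode=True)
print(rc)  # 1
```

This is exactly the shape of the test -e '/lxc.initial_seed' probe in the logs above: retcode 1 is a meaningful answer, not a failure.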
@terminalmage Thanks. I'm just realizing that now. There's a lot more interesting output to be had with salt-call lxc.init -- I want to look into something I'm seeing that could indicate a networking problem. I didn't realize this called out to the saltstack website to bootstrap; I (incorrectly) assumed the bootstrapper was shipped from the master to the cloud target.
Still, it's good to get those spurious messages out of the way, since they clearly sent me down a path chasing the wrong goose.
Yeah, really sorry about that. I thought we caught all of these, but some of them slipped through. Thanks for helping us to track down the remaining ones.
No worries! It's how software gets better! :)
I totally agree! :smile:
@cheuschober OK, I've gotten rid of the remaining spurious log messages you posted.
lxc.init does rely on networking, because it executes salt-bootstrap.
@terminalmage: Awesome. So here's the networking thing: I would have expected the above profile to just work out of the box, since Ubuntu creates an lxcbr by default. Interestingly, lxc.create ubutest-01 profile=ubuntu behaves as expected, but lxc.init ubutest-02 profile=ubuntu appears to blow away the networking config. Similarly, if I init a pre-created container, it also blows away the default networking config.
Is that intentional?
Here are some sample configuration differences between the two:
Created via salt-call lxc.init testbuntu-01 profile=ubuntu:
# Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
# Parameters passed to the template: --release trusty --arch amd64
# For additional config options, please look at lxc.container.conf(5)
# Common configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
# Container specific configuration
lxc.rootfs = /var/lib/lxc/testbuntu-01/rootfs
lxc.mount = /var/lib/lxc/testbuntu-01/fstab
lxc.utsname = testbuntu-01
lxc.arch = amd64
# Network configuration
lxc.start.auto = 1
Created via salt-call lxc.create testbuntu-02 profile=ubuntu:
# Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
# Parameters passed to the template: --release trusty --arch amd64
# For additional config options, please look at lxc.container.conf(5)
# Common configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
# Container specific configuration
lxc.rootfs = /var/lib/lxc/testbuntu-02/rootfs
lxc.mount = /var/lib/lxc/testbuntu-02/fstab
lxc.utsname = testbuntu-02
lxc.arch = amd64
# Network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.hwaddr = 00:16:3e:2c:88:c7
What do you mean by blowing away the profile? In the lxc config? In the output you get from the logs?
For the networking, be sure to configure something to use 'lxcbr0', which is the default bridge on Ubuntu. Also take a look at your template's /etc/network/interfaces to check whether it uses DHCP, and verify that there is a DHCP server serving your bridge; normally on Ubuntu, the lxc-net service gives you a dnsmasq configured for that purpose.
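The /etc/network/interfaces check can be scripted. Here is a rough, hypothetical helper (iface_uses_dhcp is invented for illustration and is not part of Salt) for Debian-style configs:

```python
def iface_uses_dhcp(interfaces_text, iface="eth0"):
    """Return True if the given interface is configured for DHCP in a
    Debian-style /etc/network/interfaces. Hypothetical helper."""
    for line in interfaces_text.splitlines():
        parts = line.split()
        # A DHCP stanza looks like: iface eth0 inet dhcp
        if parts[:2] == ["iface", iface] and "dhcp" in parts:
            return True
    return False

sample = """\
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp
"""
print(iface_uses_dhcp(sample))  # True
```

On a real host you would feed it the contents of the template's /etc/network/interfaces inside the container rootfs.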
Seems our comments crossed; you'll need to attach a network profile. Did you configure one? See https://github.com/terminalmage/salt/blob/0f6f239052074eed083202acf9db77d72a77c6f2/salt/modules/lxc.py#L446
We had a discussion with @terminalmage about whether to configure the network profile by default, but it is impossible, as it is distribution- and user-specific: DHCP? Which bridge? Which gateway? Etc.
I just realized that you are using the runner from one of the comments above (I didn't read back through the whole discussion at first).
This should be enough:
lxc.network_profile.ubuntu:
  eth0:
    link: lxcbr0
    type: veth
    flags: up
Then use:
salt-run lxc.init -lall testbuntu-xx host=salt profile=ubuntu network_profile=ubuntu
@kiorky, That worked. Thank you so much for your direction and time. (I had just tested that independently.) I guess what threw me was that lxc.create does not have the same behavior with regard to network profiles as lxc.init. Would it be reasonable to add an explicit warning of that fact to the documentation (that network_profile is required for init and not for create)?
Well, that is a hard question to answer; lxc.create is a different beast and much more raw than lxc.init, so we can take some shortcuts.
lxc.init is more the assembly point for tools to build their container from configuration options, and yes, it is a bit tough the first time you use it.
It is versatile and usable in many ways; e.g., I do not use it like you do :).
I think we should at least complete the documentation to state clearly that lxc.init should really be used from the runner, and to indicate that a profile and a network profile must be configured as prerequisites.
Indeed. Thank you again.
Network profiles are not required for lxc.create either, it's just that if one isn't provided, Salt uses a fallback profile. The reasoning for this is that many LXC images come with networking set to "empty".
@terminalmage Yup. It was just the difference in how lxc.create would use that fallback and lxc.init would not that threw me. This was all a long road of trying to debug an lxc-cloud issue, and as I decomposed the problem into what I thought of as its component parts, I realized there were some fairly magical things happening in init.
If init is not using the fallback, we should make it do so. What do you think, @kiorky?
I'm not sure at all that it can be done easily without breaking compatibility and behavior.
init is made to work with explicit settings and overwrites the lxc config with its own set of lxc settings, as said in another comment.
On the contrary, lxc.create only applies a network profile on top of what's created by the stock lxc utils.
Thus, what we could do is try to find a bridge to attach to (search for lxcbr, then br) and set that up if no profile was selected, instead of assuming no network.
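That fallback search could look something like the sketch below (hypothetical; guess_bridge is an invented name, not the eventual implementation): scan the host's network devices and prefer an lxcbr* bridge, then any br* device.

```python
import os

def guess_bridge(netdir="/sys/class/net"):
    """Pick a likely bridge to attach to: lxcbr* first, then br*,
    else None. Hypothetical sketch of the proposed fallback."""
    try:
        names = sorted(os.listdir(netdir))
    except OSError:
        return None
    for prefix in ("lxcbr", "br"):
        for name in names:
            if name.startswith(prefix):
                return name
    return None
```

Under that scheme, a host with lxcbr0 present would get it by default, while a host with no bridge at all would still fall back to assuming no network.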
@kiorky, So I was still having some trouble with my cloud init and came back to the networking issue.
When I last stated it as working, I must have called:
salt-call lxc.init testbuntu-01 profile=ubuntu network_profile=ubuntu
This worked, as hoped. But I just tried the runner version and found that network_profile appeared to be ignored:
salt-run lxc.init testbuntu-01 host=salt profile=ubuntu network_profile=ubuntu
In the above case, the network profile was not applied. I tried the network profile in both the format you suggested as well as the hierarchical form below:
lxc.network_profile:
  ubuntu:
    eth0:
      link: lxcbr0
      type: veth
      flags: up
lxc.container_profile:
  ubuntu:
    template: ubuntu
    options:
      release: trusty
      arch: amd64
When you have an error, please attach the FULL log from the start of your run; without it, we can't help you much.
@terminalmage The patches you made are on 2015.5, on a portion which conflicts with develop since it was migrated to container_resources; I'm checking whether we also need to make the same patches in container_resources.
@cheuschober I'm currently testing your use case, end to end, to see what's going on.
Happy to be proven wrong here, but from my best understanding, calls to lxc.init will always fail as long as it attempts to remove the seed marker prior to starting the container.
remove_seed_marker is set to True at: https://github.com/saltstack/salt/blob/develop/salt/modules/lxc.py#L1197
The bit that looks like it could use some reordering is: https://github.com/saltstack/salt/blob/develop/salt/modules/lxc.py#L1230-L1259
Steps to reproduce:
First failure is in attachable().
Versions report: