jpetazzo / pipework

Software-Defined Networking tools for LXC (LinuX Containers)
Apache License 2.0

/sbin/pipework: 274: kill: Illegal number: #150

Closed dreamcat4 closed 9 years ago

dreamcat4 commented 9 years ago

Not sure what is happening here...

The first time pipework is called, it returns exit code 2. After waiting 5 seconds, the same pipework cmd is retried, and then it works.

https://gist.github.com/dreamcat4/34975af1714b6f916c3d

+ ip netns exec 29121 dhclient -pf /var/run/dhclient.29121.pid eth0
mv: cannot move '/etc/resolv.conf.dhclient-new.29301' to '/etc/resolv.conf': Device or resource busy
+ cat /var/run/dhclient.29121.pid
cat: /var/run/dhclient.29121.pid: No such file or directory
+ kill 
/sbin/pipework: 274: kill: Illegal number: 
++ '[' 2 '!=' 0 ']'
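
The trace suggests that `kill` ends up running with an empty argument because the pid file was never written. A minimal sketch of a guard for that step follows; `kill_from_pidfile` is a hypothetical helper for illustration, not something taken from the pipework script. It would only silence the secondary "Illegal number" error; the failed move of `/etc/resolv.conf` would remain.

```shell
#!/bin/sh
# Hypothetical helper (not part of pipework): kill the process named in a
# pid file only if the file exists and holds a plain number.
kill_from_pidfile() {
    pidfile="$1"
    # No readable pid file: the client never got far enough to write one.
    [ -r "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric contents so kill never sees a bad argument.
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;
    esac
    # Ignore the case where the process has already exited.
    kill "$pid" 2>/dev/null || true
}
```

Assumed usage, with the pid file path from the trace: `kill_from_pidfile /var/run/dhclient.29121.pid`.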

In shell, the commands were:

id@emachines-e520:~/docker-images$ docker start pipework ; sleep 15 ; docker start tvh
pipework
tvh

dreamcat4 commented 9 years ago

Hmm. [EDIT] Still getting the illegal number error message on the first invocation.

jpetazzo commented 9 years ago

Do you think you could try with a different DHCP client?

I wonder if this might be caused by the fact that dhclient tries to clobber /etc/resolv.conf, which is actually a bind-mount (and has to be re-written instead of re-created).
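
A minimal sketch of the difference, assuming the new contents sit in a temporary file next to the target. `update_in_place` is a hypothetical helper for illustration, not dhclient's or pipework's actual code. Renaming a file onto a bind-mounted path fails with "Device or resource busy", while overwriting the existing file keeps the same inode and leaves the mount alone.

```shell
#!/bin/sh
# Hypothetical helper: update a file by overwriting its contents in place,
# instead of renaming a new file over it (which is what mv does).
update_in_place() {
    new="$1"
    target="$2"
    # The redirection truncates and rewrites the existing file, so its inode
    # (and any bind-mount on top of it) stays the same.
    cat "$new" > "$target" && rm -f "$new"
}
```

Assumed usage, with the file names from the error message: `update_in_place /etc/resolv.conf.dhclient-new.29301 /etc/resolv.conf`.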

dreamcat4 commented 9 years ago

@jpetazzo Yeah, sure! It was a long time ago, but I did try the others once. As far as I remember, only one client actually worked for me. Can't remember which one.

I'll try to retest it and get back to you with some better clues. Maybe I should also edit your script to strace the relevant command, to see what it's doing on the failing runs vs the 'working times' (when the same cmd is run but there is no error).

This is good!

dreamcat4 commented 9 years ago

@jpetazzo I have now had the chance to get back to this. Updated to the latest pipework today, with your new default busybox udhcpc client. From my limited testing (about 1 hour), it seems fantastic. Very happy. My docker image is updated to the v1.10 tag.

It seems to solve this issue entirely. I can't fault your analysis of the problem (dhclient trying to move the resolv.conf). Since there were other, worse issues / hangs with dhclient, there was little point going back to confirm it, as we can't do anything about it anyway.

jpetazzo commented 9 years ago

Thanks! Sorry that I couldn't check earlier. I have slated some time next week to go over open pipework issues; I hope I can crunch through the backlog :-)

dreamcat4 commented 9 years ago

> I wonder if this might be caused by the fact that dhclient tries to clobber /etc/resolv.conf, which is actually a bind-mount (and has to be re-written instead of re-created).

It seems there will be some change in behaviour for docker v1.9.1

https://github.com/docker/docker/pull/14965

^^ Not sure if that helps dhclient, does it? But at least dhclient would then be able to see that the file is read-only... and maybe not try to move / update / replace it.