BTW:
Stopping these containers manually, re-running pipework afterward, and then restarting each affected client container can temporarily work around the problem. You have to do it manually though.
I have my suspicions:
Perhaps this happens when multiple containers are started all together. If the previous busybox container did not have enough time to complete / exit before the next one starts, then somehow the problem is perpetuated.
Since I start my containers in a loop, it immediately runs the next pipework command. Does the pipework script wait until the busybox container has exited before exiting itself? I don't think so. But that may be a desirable behaviour (for me).
@jpetazzo This problem hasn't recurred for me recently, so I'm not sure how widespread it actually is. But anyway, here are 3 suggestions for how we might attempt to avoid such a problem occurring:
1) Run the busybox container in the foreground, so that the busybox udhcpc command must itself exit before the next pipework command can be run.
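A minimal sketch of that idea, reusing the flags that appear in the trace later in this thread (`${GUEST_CONTAINER}` is a placeholder):

```bash
# Hypothetical: without -d, docker run blocks until udhcpc exits;
# -q makes udhcpc quit once it has obtained a lease.
docker run --rm --net "container:${GUEST_CONTAINER}" --cap-add NET_ADMIN \
  busybox udhcpc -i eth0 -q
```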
2) Else apply a unique label identifier to it, for checking on successive invocations. For example, we may add an extra argument to docker run, like --label="jpetazzo/pipework" or some other appropriate identifier, so that pipework can later query it using docker ps --filter="label=jpetazzo/pipework", to clean up any matching hung containers before executing the next busybox dhcp instance (on successive invocations).
However, if the pipework command is being run in quick succession, the previously running container may not be a hung container; it may simply not have finished / exited yet. In which case it is not clear whether pipework should wait for the container, or instead kill it.
Using the same docker ps --filter=label trick, we may also remove spent / exited containers which were previously run, to stop them from accumulating too much.
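A hedged sketch of that cleanup (the label name here is just the example above, not anything pipework currently sets):

```bash
# Tag each DHCP helper with a label when starting it:
docker run -d --label="jpetazzo/pipework" --net "container:${GUEST_CONTAINER}" \
  --cap-add NET_ADMIN busybox udhcpc -i eth0
# Later, remove any leftover helpers (hung or exited) before starting the next:
docker ps -aq --filter="label=jpetazzo/pipework" | xargs -r docker rm -f
```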
Note: a drawback is that this approach requires docker ps --filter, which is only available on the most recent versions of docker. Of course, that is an issue that will solve itself in due time.
3) An alternative approach (instead of labels) can be to set the helper's container name to some grep'able string, such as pipework-busybox-$CONTAINER_NAME or pipework-busybox-$RANDOM_UUID. Then the output of docker ps may be grepped without such compatibility issues.
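For illustration, a hypothetical sketch of that variant (the naming scheme is just the example above):

```bash
docker run -d --name "pipework-busybox-${GUEST_CONTAINER}" \
  --net "container:${GUEST_CONTAINER}" --cap-add NET_ADMIN busybox udhcpc -i eth0
# Find and remove stale helpers by name, with no --filter support required:
docker ps -a | grep 'pipework-busybox-' | awk '{print $1}' | xargs -r docker rm -f
```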
Sorry for the lag!
I think there are two issues.
1) Waiting for the DHCP client to do its job before continuing. It's possible, but hackish. A few ideas:
2) Tagging the DHCP "sidekicks" appropriately. Labels are cool. Probably a "pipework" label to indicate the container "belongs" to pipework, then "pipework.dhcp=ID" to indicate the ID of the other container.
(In theory we should use some reverse FQDN like com.pipework.etc, but I'm not in the mood to specify 42-mile-long labels right now, nor to purchase a domain just for that :smirk:)
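Presumably something along these lines (a sketch of the proposed labels, not existing pipework behaviour):

```bash
# Hypothetical tagging of a DHCP sidekick; ${GUEST_ID} is the other container's ID.
docker run -d \
  --label "pipework" \
  --label "pipework.dhcp=${GUEST_ID}" \
  --net "container:${GUEST_ID}" --cap-add NET_ADMIN \
  busybox udhcpc -i eth0
```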
WDYT?
I honestly don't mind how it's done. Anything you feel would be an acceptable solution is fine with me.
After reporting the issue, it has not recurred for me. I suspect it only actually happens at certain times, for example if the DHCP server happens to be slow to respond to requests, or may be offline, but also at the same time as pipework is being run multiple times in sequence, like at system startup.
This happened again today (from a fresh reboot). 2nd time, same situation / conditions. All containers starting at once.
udhcpc will work if it's the version from Ubuntu 15.10. That is not the default dhcp provider for pipework, so users will have to specify that option explicitly in their pipework cmds to get such a workaround, and only if they are affected. As I haven't yet heard other reports of this issue, just from myself, it's maybe worth some errata or a troubleshooting FAQ mention somewhere in the docs.
Previously udhcpc was not working for me... but many thanks, that got solved by following the tips in the related issue https://github.com/jpetazzo/pipework/issues/47#issuecomment-144525962 (credit to @stoopsj for that one).
OK, hacked in a sleep 2 before launching the busybox container. Unfortunately it had no effect (same error).
+ [ phys = ipoib ]
+ ip link set ph21243eth0 netns 21243
+ ip netns exec 21243 ip link set ph21243eth0 name eth0
+ [ 0a:00:00:03:00:17 ]
+ ip netns exec 21243 ip link set dev eth0 address 0a:00:00:03:00:17
+ sleep 2
+ docker run -d --net container:smb.kodi --cap-add NET_ADMIN busybox udhcpc -i eth0 -x hostname:smb.kodi
+ installed arping
+ command -v arping
+ cut -d/ -f1
+ echo dhcp
+ IPADDR=dhcp
+ ip netns exec 21243 arping -c 1 -A -I eth0 dhcp
+ true
+ rm -f /var/run/netns/21243
Can't really see what's wrong with @jpetazzo's code here though. It looks like it ought to do the right things.
id@emachines-e520:~/docker-images$ docker logs admiring_brown 2>&1 | head
udhcpc (v1.23.2) started
Sending discover...
Read error: Network is down, reopening socket
udhcpc: sendto: Network is down
Sending discover...
udhcpc: sendto: Network is down
Read error: Network is down, reopening socket
Sending discover...
udhcpc: sendto: Network is down
Read error: Network is down, reopening socket
id@emachines-e520:~/docker-images$
Ah. Now I see that busybox ifconfig needs the -a (show all) flag. And indeed the network interface is present; before, it didn't show up in the cmd output. So that's not the issue after all...
id@emachines-e520:~/docker-images$ docker exec admiring_brown ifconfig -a
eth0 Link encap:Ethernet HWaddr 0A:00:00:03:00:17
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:480 (480.0 B) TX bytes:480 (480.0 B)
OK, the busybox container's eth0 is not in the up state.
> the busybox's eth0 is not in the up state
Doing ifconfig eth0 up inside the busybox container causes some improvement, in the sense that it's no longer hung: the udhcpc completes, and the container exits.
id@emachines-e520:~/dev$ docker start jackett.id
jackett.id
id@emachines-e520:~/dev$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bcfef8293694 busybox "udhcpc -i eth0 -x ho" 1 seconds ago Up 1 seconds hungry_bhabha
0cdd44694800 dreamcat4/pipework "/entrypoint.sh --hel" 12 hours ago Up 12 hours pipework
61eebce2f692 dreamcat4/jackett "/init /entrypoint.sh" 8 weeks ago Up 4 seconds jackett.id
id@emachines-e520:~/dev$ ping -c1 jackett.id
PING jackett.id (192.168.5.6) 56(84) bytes of data.
From emachines-e520.lan (192.168.1.33) icmp_seq=1 Destination Host Unreachable
--- jackett.id ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
id@emachines-e520:~/dev$ docker exec hungry_bhabha ifconfig eth0 up
id@emachines-e520:~/dev$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0cdd44694800 dreamcat4/pipework "/entrypoint.sh --hel" 12 hours ago Up 12 hours pipework
61eebce2f692 dreamcat4/jackett "/init /entrypoint.sh" 8 weeks ago Up About a minute jackett.id
... however for some reason the ping still failed afterwards:
id@emachines-e520:~/dev$ ping -c1 jackett.id
PING jackett.id (192.168.5.6) 56(84) bytes of data.
From emachines-e520.lan (192.168.1.33) icmp_seq=1 Destination Host Unreachable
--- jackett.id ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
id@emachines-e520:~/dev$
I will try again.
Unfortunately, although the busybox udhcpc is reporting getting a lease, that success is not reflected in the linked container, nor by the ping probe either.
Not sure why that is, or why the container's pipework interface was in the DOWN state to begin with. I have been trying certain other things, but unfortunately no improvement. I'm going to leave the progress there for the time being, and just switch to the Ubuntu 15.10 udhcpc for myself...
But if any others have a similar issue, please report it.
I have the same issue. Also running Ubuntu 14.04 with udhcpc, and starting multiple containers at once. Recently updated the pipework script from the repo, and ran into this.
UPD: Managed to solve the issue. Here is a bit more info and a workaround for those who are going to suffer from it.
LONG STORY: This morning I rebooted the server and faced this issue. The issue was arising consistently for me, even after reverting all dockerfiles, pipework, and /etc/ to earlier revisions. Moreover, starting services one by one did not help, nor did waiting 20 secs before running pipework. I was a bit wrong in my previous post saying that I'm using udhcpc. The first thing I did was upgrade the udhcpc package to the Ubuntu 15.04 version, but it did not help. Finally, I decided to try another dhcp client, and noticed that despite the busybox containers being stuck with udhcpc, I'm starting pipework with
```bash
pipework int-br ${NAME} dhcp ${MAC}
```
That means that the udhcpc program being executed in the container is not the same as the one installed on the host. I changed dhcp to udhcpc (so pipework runs the host version of udhcpc in the container's network namespace), and the issue was gone.
SUSPECTED REASON: I updated the busybox container this morning. The official Docker Busybox image was updated 2 days ago: https://github.com/docker-library/busybox/commits/master . Maybe they changed the dhcp client program or its version, so it cannot understand the options passed by pipework. The process list inside a stuck helper container looks like this:
PID USER TIME COMMAND
1 root 0:00 udhcpc -i eth1 -x hostname squid-proxy
13 root 0:00 /bin/sh
20 root 0:00 ps aux
WORKAROUND: Run pipework with option udhcpc, and not with dhcp:
```bash
pipework int-br ${NAME} udhcpc ${MAC}
```
Of course, udhcpc should be installed on the physical host.
UPD2: Looks like with the workaround I posted, I could not access the docker container from other machines attached to the bridge. That is because I did not disable the default Docker network on eth0, and it was serving as the default route. I had to add --net=none to all docker run commands in my scripts.
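For example, something like this (image and variable names are placeholders, not the exact commands from my scripts):

```bash
# Start the container with no default Docker network, so the pipework
# interface becomes the only (and default) route:
docker run -d --net=none --name "${NAME}" some/image
# Then attach it to the bridge, using the host's udhcpc:
pipework int-br "${NAME}" udhcpc "${MAC}"
```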
@Dmitriusan +1, I am in agreement. Using the newest version of Ubuntu's udhcpc on the host machine (15.04 or newer), and declaring udhcpc in your pipework commands, feels like the easiest workaround for the time being. Probably only needed for those users who are experiencing ^^ these problems.
A better long term solution would be to make the busybox project aware of the problem, in the hope that they are in a position to fix it. However, equally, they might not be very interested in docker and such related matters concerning the dhcp client. I'm not sure what they might feel inclined to do about it. Haven't asked.
Another alternative would be some other small docker image instead of the busybox docker image. Well, I did look around, but unfortunately could not find anything suitable as a replacement, just for the task of running a simple dhcp client. Maybe I missed / overlooked something. It really surprised me not to find anything else.
Just to make sure I understand correctly: with the latest busybox, does the problem happen always now, or only when starting a bunch of containers at the same time? (Which would hint at some race condition)
To reproduce, it seems to initially require the 2nd situation: starting multiple containers at the same time. Once the problem starts to occur, it can continue happening thereafter when starting individual containers.
To clear the problem I reboot the whole computer (or perhaps it can be cleared with less than a full system reboot, I don't know).
I don't use the default busybox method anymore now, or the other ones. Only the Ubuntu 15.04+ (recent / newest) version of udhcpc. That is the only one of them that works for me without issues.
Looks like udhcpc tries to run a client script each time a relevant event occurs, and does not do any interface-related changes by itself; see dhcpc.c and the manpage section for udhcpc. The default script, /usr/share/udhcpc/default.script, is not present in the helper container.
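To make the mechanism concrete, here is a minimal sketch of what such a client script does, modelled loosely on busybox's simple.script (not the exact shipped file). udhcpc invokes the script with the event name as $1 and exports variables such as $interface, $ip, $subnet and $router:

```sh
#!/bin/sh
# Minimal udhcpc client script sketch (assumed behaviour, not the shipped file).
case "$1" in
  deconfig)
    # Bring the interface up with no address while a lease is requested;
    # without a script, nothing ever sets the interface to the up state.
    ifconfig "$interface" 0.0.0.0 up
    ;;
  bound|renew)
    # Apply the lease: address, netmask, then default route(s).
    ifconfig "$interface" "$ip" netmask "$subnet"
    for gw in $router; do
      route add default gw "$gw" dev "$interface"
    done
    ;;
esac
```

That would also explain why the helper's eth0 was observed in the DOWN state earlier in this thread: with no script present, nothing brings the interface up.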
Thanks @pppq!
What does this mean then? Perhaps we could mount the missing script into the busybox container with -v host/script:/path/script?
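Hypothetically, using the Ubuntu example-script path mentioned below, that might look like (paths and details assumed, untested):

```bash
docker run -d --net "container:${GUEST_CONTAINER}" --cap-add NET_ADMIN \
  -v /usr/share/doc/busybox-static/examples/udhcp/simple.script:/usr/share/udhcpc/default.script:ro \
  busybox udhcpc -i eth0
```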
Or is this more like some general bug in the official busybox image, whereby the missing file should really always be built right into it? (i.e. to also benefit the many other users of the Busybox image).
Yes, if I exec into the helper, I'm seeing:
/ # ls -al /usr
total 24
drwxr-xr-x 3 root root 4096 Jan 2 16:51 .
drwxr-xr-x 1 root root 4096 Jan 2 22:05 ..
drwxr-xr-x 2 daemon daemon 4096 Dec 8 16:44 sbin
Ubuntu keeps the example scripts in /usr/share/doc/busybox-static/examples/udhcp (see the filelist), and in Jérôme's rootfs.tar it's present in the expected location. So the image building process needs to copy it from the appropriate location.
Well, it would be easiest if the official image included the script, but it will not be of great use unless the container is running in privileged mode, or one adds the NET_ADMIN capability, I think. So I'm not sure if it is useful for the wider audience of the image. On the other hand, the script is not too big either. :smile:
Ping @jpetazzo ^^
I see! Let me summon the Powers That Be.
@tianon: the busybox image contains the udhcpc client, but this client depends on a couple of scripts to work correctly. The scripts are invoked by udhcpc once it has obtained a lease, and the scripts are responsible for configuring the network interface. The scripts are currently not included in the busybox image. Do you think we should include them, or should we just tell people to build their own busybox image if they need to? (Which is not too hard, since that'd just be FROM busybox and a COPY ./them-scripts/ /to/dat/path/tho/)
The experimentally inclined can also include it in the image builder. :smile: https://github.com/pppq/docker-busybox/commit/5c200e53e6ead5c2a5ecc7a0895faa2257ad4938
But I agree, it is easier to enable and/or customize this functionality with an additional layer. Also, a regular container instance not started via pipework will not have the required level of access to its veth interface to change the IP address, netmask and default gateway to the received values.
@jpetazzo ah interesting -- there is an "example" configuration referenced as part of the BusyBox source, so that would be really trivial to include (https://git.busybox.net/busybox/tree/examples/udhcp?h=1_24_stable)
My question here would be whether there's a recommendation from BusyBox upstream one way or the other on what the default should be for a generic environment like the one we provide? Do they have any documentation about this script/applet and the recommended usage? (I've done some searching and can't seem to find any. :disappointed:)
Looking at https://git.busybox.net/busybox/log/examples/udhcp?h=1_24_stable is not terribly encouraging (those example scripts haven't been touched since 2014, which likely either means they're unmaintained, or that they're rock-solid).
The only hint I could find is in Config.src. The scripts also don't try to do too much – it looks like simple.script is a one-file combination of all the sample.* scripts that handle dhcpc events individually.
With that said, there are people who come up with alternative implementations, see http://lists.busybox.net/pipermail/busybox/2007-January/059859.html for an example. I don't know if there is a definitive script that should be placed in the default location.
Hi again. It seems the new busybox udhcpc method is improved greatly over the previous dhclient. However, there is some issue I encountered today: sometimes it has a problem and the busybox container doesn't exit. This in turn somehow causes an issue of no WAN connectivity inside the container (but OK LAN connectivity). I have an example to show here:
And the docker logs of 1 such container is here (at the bottom of the page): https://gist.github.com/dreamcat4/d0655834dc358191a979