Open emfrias opened 3 years ago
I ran into the same bug today. Thank you for the report; it pointed me to a quick fix.
The root issue with weave is that it still vendors an old version of miekg/dns (1.0.4). I updated miekg/dns to v1.0.5 (you can change it in the go.mod file), which already has the bug fixed, built new weave images, and then used the newly built images on my hosts. Hope that helps.
Thanks @Cybernisk, that helped. I just built new images as you described, and they've been working well so far. I'm new to this build system, so I got it wrong a few times before producing a build that actually included the new version of the dns module. I ended up with:
git clone https://github.com/weaveworks/weave.git
cd weave
go get github.com/miekg/dns@v1.0.5
go mod vendor
make
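Since it's easy to end up with a build that still vendors the old module, here is a small Go sketch for checking which miekg/dns version actually landed in vendor/. It assumes vendor/modules.txt uses the "# <module> <version>" header lines that go mod vendor writes, and it does not handle replace directives; vendoredVersion is a hypothetical helper name, not part of weave.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// vendoredVersion scans the contents of a vendor/modules.txt file and returns
// the version recorded for module. It assumes module header lines of the form
// "# <module path> <version>", as written by `go mod vendor`.
func vendoredVersion(modulesTxt, module string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(modulesTxt))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Header lines start with a single "#"; "## explicit" lines have "##"
		// as their first field and are skipped by this check.
		if len(fields) >= 3 && fields[0] == "#" && fields[1] == module {
			return fields[2], true
		}
	}
	return "", false
}

func main() {
	// Run from the root of the weave checkout, after `go mod vendor`.
	data, err := os.ReadFile("vendor/modules.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if v, ok := vendoredVersion(string(data), "github.com/miekg/dns"); ok {
		fmt.Println("vendored miekg/dns:", v)
	} else {
		fmt.Println("miekg/dns not found in vendor/modules.txt")
	}
}
```

If this still reports v1.0.4, the go get / go mod vendor step didn't take, and make will build the old code.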
Is it likely that this simple dependency update will find its way into the next release of the binary?
Unfortunately the described workaround does not work here as expected:
Step 9/15 : RUN go get github.com/weaveworks/build-tools/cover github.com/mattn/goveralls golang.org/x/lint/golint github.com/fzipp/gocyclo github.com/fatih/hclfmt github.com/client9/misspell/cmd/misspell
---> Running in 8700989776c1
cannot find package "github.com/hashicorp/hcl/hcl/printer" in any of:
/usr/local/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOROOT)
/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOPATH)
The command '/bin/sh -c go get github.com/weaveworks/build-tools/cover github.com/mattn/goveralls golang.org/x/lint/golint github.com/fzipp/gocyclo github.com/fatih/hclfmt github.com/client9/misspell/cmd/misspell' returned a non-zero code: 1
make: *** [Makefile:255: .build.uptodate] Error 1
Building succeeded with the following patch:
diff --git a/build/Dockerfile b/build/Dockerfile
index ae6a677..e47913e 100644
--- a/build/Dockerfile
+++ b/build/Dockerfile
@@ -49,6 +49,9 @@ RUN curl -fsSLo shfmt https://github.com/mvdan/sh/releases/download/v1.3.0/shfmt
mv shfmt /usr/bin
# Install common Go tools
+RUN GO111MODULE=on go get github.com/hashicorp/hcl@v1.0.0; \
+ mkdir -p /go/src/github.com/hashicorp; \
+ ln -s $PWD/pkg/mod/github.com/hashicorp/hcl@v1.0.0 $PWD/src/github.com/hashicorp/hcl
RUN go get \
github.com/weaveworks/build-tools/cover \
github.com/mattn/goveralls \
Next, the golang package in Ubuntu 20.04 is not up to date, and the build then fails with another error about modules being out of sync. My workaround to get make to run to completion was to update Go to the latest version:
apt remove golang --purge --autoremove
curl -LO https://get.golang.org/$(uname)/go_installer && chmod +x go_installer && ./go_installer && rm go_installer
Unfortunately, this still didn't produce a weave executable that doesn't crash on weave status.
It's a pity to see this break on a very common platform.
You're right. I can't explain why, but my steps no longer work; the changes @almereyda mentions do get it building again for me, though.
From a clean ubuntu:20.04 machine:
sudo apt -y install build-essential git docker.io
curl -LO https://get.golang.org/$(uname)/go_installer && chmod +x go_installer && ./go_installer && rm go_installer
. ~/.bash_profile
git clone https://github.com/weaveworks/weave.git
cd weave
# patch build/Dockerfile using almereyda's patch above
go get github.com/miekg/dns@v1.0.5
go mod vendor
make
I didn't mention these steps earlier because I figured they'd be a bit different depending on your setup.
We've just built images for weaveworks/weave:latest and its helpers on the local system. I run weave using the script you get from sudo curl -L git.io/weave -o /usr/local/bin/weave. That script will try to run weaveworks/weave:2.8.1 by default, and since we didn't build that tag, it will download it from Docker Hub and ignore the custom version we built. The simplest change is to edit the weave script:
--- weave.orig 2021-02-11 17:40:59.835349520 +0000
+++ /usr/local/bin/weave 2021-02-11 17:43:42.022209305 +0000
@@ -3,7 +3,7 @@
[ -n "$WEAVE_DEBUG" ] && set -x
-SCRIPT_VERSION="2.8.1"
+SCRIPT_VERSION="unreleased"
IMAGE_VERSION=latest
[ "$SCRIPT_VERSION" = "unreleased" ] || IMAGE_VERSION=$SCRIPT_VERSION
IMAGE_VERSION=${WEAVE_VERSION:-$IMAGE_VERSION}
and it will stick to using the latest tag we built.
This should give you a version that works on this one machine.
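The tag-selection logic in that diff can be modeled in a few lines of Go, which shows why the stock script pulls weaveworks/weave:2.8.1 and why WEAVE_VERSION also works as an override. This is a sketch of the three shell lines in the diff, assuming they are the only place IMAGE_VERSION is set; imageVersion is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
)

// imageVersion mirrors the tag-selection lines of the weave script:
//   IMAGE_VERSION=latest
//   [ "$SCRIPT_VERSION" = "unreleased" ] || IMAGE_VERSION=$SCRIPT_VERSION
//   IMAGE_VERSION=${WEAVE_VERSION:-$IMAGE_VERSION}
// weaveVersionEnv is the value of WEAVE_VERSION ("" if unset).
func imageVersion(scriptVersion, weaveVersionEnv string) string {
	v := "latest"
	// Release scripts pin the image tag to their own version.
	if scriptVersion != "unreleased" {
		v = scriptVersion
	}
	// ${WEAVE_VERSION:-...} falls back to the default when the variable is
	// unset or empty, so only a non-empty value overrides.
	if weaveVersionEnv != "" {
		v = weaveVersionEnv
	}
	return v
}

func main() {
	env := os.Getenv("WEAVE_VERSION")
	for _, sv := range []string{"2.8.1", "unreleased"} {
		fmt.Printf("SCRIPT_VERSION=%s WEAVE_VERSION=%q -> weaveworks/weave:%s\n",
			sv, env, imageVersion(sv, env))
	}
}
```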
I went a step further and pushed the new images to my private Docker registry:
MY_DOCKER_REGISTRY=docker-registry.me.com
for image in weave weaveexec weave-kube weave-npc weavedb network-tester; do
sudo docker tag weaveworks/$image:latest $MY_DOCKER_REGISTRY/weaveworks/$image:latest
sudo docker push $MY_DOCKER_REGISTRY/weaveworks/$image:latest
done
and then made one more edit to /usr/local/bin/weave:
--- weave.new 2021-02-11 18:08:12.349300226 +0000
+++ /usr/local/bin/weave 2021-02-11 18:09:46.034718622 +0000
@@ -12,7 +12,7 @@
MIN_DOCKER_VERSION=1.10.0
# These are needed for remote execs, hence we introduce them here
-DOCKERHUB_USER=${DOCKERHUB_USER:-weaveworks}
+DOCKERHUB_USER=${DOCKERHUB_USER:-docker-registry.me.com/weaveworks}
BASE_EXEC_IMAGE=$DOCKERHUB_USER/weaveexec
EXEC_IMAGE=$BASE_EXEC_IMAGE:$IMAGE_VERSION
WEAVEDB_IMAGE=$DOCKERHUB_USER/weavedb:latest
Now I can distribute this patched version of /usr/local/bin/weave to all my servers, and they'll get the patched version of weave. If you don't have a private registry set up, you could manually load your patched images on each of your other systems (I guess using something like docker load < weave.tar.gz), and also copy over the patched /usr/local/bin/weave.
It looks like you could also just set environment variables (WEAVE_VERSION, DOCKERHUB_USER) rather than patching the weave script, if that's easier for you.
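For the environment-variable route, here is a Go sketch of how the DOCKERHUB_USER lines in the second diff build the exec image name. It assumes those script lines are the whole story (execImage is a hypothetical name, and I haven't checked every image the script references). With DOCKERHUB_USER pointing at a private registry and WEAVE_VERSION=latest, the exec image should resolve to the registry copy without editing the script.

```go
package main

import "fmt"

// execImage mirrors these lines of the weave script:
//   DOCKERHUB_USER=${DOCKERHUB_USER:-weaveworks}
//   BASE_EXEC_IMAGE=$DOCKERHUB_USER/weaveexec
//   EXEC_IMAGE=$BASE_EXEC_IMAGE:$IMAGE_VERSION
// dockerhubUserEnv is the value of DOCKERHUB_USER ("" if unset).
func execImage(dockerhubUserEnv, imageVersion string) string {
	user := dockerhubUserEnv
	if user == "" {
		user = "weaveworks" // default: pull from Docker Hub
	}
	return user + "/weaveexec:" + imageVersion
}

func main() {
	// Stock behaviour of a 2.8.1 script with no overrides.
	fmt.Println(execImage("", "2.8.1"))
	// Example private registry from the post above, with WEAVE_VERSION=latest.
	fmt.Println(execImage("docker-registry.me.com/weaveworks", "latest"))
}
```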
This product is becoming increasingly difficult to justify when it doesn't run on Ubuntu 20.04. This issue has been open for six months. Does Weaveworks actively monitor this forum?
Hi, I'm the Weaveworks CEO. We do keep an eye on these forums. At present we work on Weave Net for paying customers or as part of other commercial work.
Any progress on this issue? It's a really disappointing situation: almost a year has passed since the issue was opened.
At present we work on Weave Net for paying customers or as part of other commercial work.
Why would someone pay for something that is broken?
It looks like the actual bug is in the vendored miekg/dns code. I'm not too experienced in Go, but it looks like it checks that a string is at least 8 characters long and then tries to slice 9 characters from it.
Ubuntu's default resolv.conf now includes the option "trust-ad", which is 8 characters long, and (I guess) line 94 breaks on it: https://github.com/weaveworks/weave/blob/master/vendor/github.com/miekg/dns/clientconfig.go
The upstream code seems to be fixed, so I think upgrading the dependency would be enough to solve this problem: https://github.com/miekg/dns/blob/master/clientconfig.go
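To make the failure concrete, here is a minimal Go sketch of the pattern described above. It assumes the vendored check has the shape len(s) >= 8 && s[:9] == "attempts:" (my reading of the old clientconfig.go, not copied from it; the function names are hypothetical). Slicing a string past its length panics at runtime, so an 8-character option such as trust-ad crashes the parser, while shorter options like edns0 never reach the slice. strings.HasPrefix does the length check itself; the actual upstream fix may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// buggyIsAttempts reproduces the suspected bug: the guard admits 8-byte
// strings, but the slice expression needs 9 bytes.
func buggyIsAttempts(s string) bool {
	return len(s) >= 8 && s[:9] == "attempts:"
}

// fixedIsAttempts is a safe equivalent; HasPrefix checks the length first.
func fixedIsAttempts(s string) bool {
	return strings.HasPrefix(s, "attempts:")
}

// panics reports whether f panics, recovering so the demo keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	for _, opt := range []string{"edns0", "trust-ad", "attempts:2"} {
		p := panics(func() { buggyIsAttempts(opt) })
		fmt.Printf("%-10s buggy panics=%v fixed=%v\n", opt, p, fixedIsAttempts(opt))
	}
}
```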
I checked my hosts: weave works on hosts that have no 8-character entry in the options line of resolv.conf, but breaks on hosts that do. Weave can be launched with the --no-dns option, which gives me a working "weave status", but weave isn't really usable that way.
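A rough Go sketch for checking whether a host is likely affected, following the heuristic above: it flags any entry on an options line of resolv.conf that is exactly 8 characters long. riskyOptions is a hypothetical helper, and this is a heuristic based on the analysis in this thread, not a guarantee.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// riskyOptions returns the entries on "options" lines of a resolv.conf that
// are exactly 8 bytes long (e.g. "trust-ad"), the length reported to crash
// the old vendored miekg/dns parser. Comment lines are skipped because their
// first field is "#" or ";", not "options".
func riskyOptions(resolvConf string) []string {
	var risky []string
	sc := bufio.NewScanner(strings.NewReader(resolvConf))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 0 || f[0] != "options" {
			continue
		}
		for _, opt := range f[1:] {
			if len(opt) == 8 {
				risky = append(risky, opt)
			}
		}
	}
	return risky
}

func main() {
	// os.ReadFile follows the /etc/resolv.conf symlink used by systemd-resolved.
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if r := riskyOptions(string(data)); len(r) > 0 {
		fmt.Println("likely affected, 8-character options:", r)
	} else {
		fmt.Println("no 8-character options found")
	}
}
```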
Any idea for a workaround without messing up the automatic resolv.conf?
I figured out a workaround that only needs an edit to the weave script: remove the options weave ignores anyway from a copy of resolv.conf, and mount that file instead.
+++ b/weave
@@ -136,6 +136,7 @@ exec_remote() {
$(docker_run_options) \
--pid host \
$(exec_options "$@") \
+ -v /usr/local/bin/weave:/home/weave/weave \
-e DOCKERHUB_USER="$DOCKERHUB_USER" \
-e WEAVE_VERSION \
-e WEAVE_DEBUG \
@@ -1167,14 +1168,7 @@ launch() {
# Figure out the location of the actual resolv.conf file because
# we want to bind mount its directory into the container.
- if [ -L ${HOST_ROOT:-/}/etc/resolv.conf ]; then # symlink
- # This assumes a host with readlink in FHS directories...
- # Ideally, this would resolve the symlink manually, without
- # using host commands.
- RESOLV_CONF=$(chroot ${HOST_ROOT:-/} readlink -f /etc/resolv.conf)
- else
- RESOLV_CONF=/etc/resolv.conf
- fi
+ RESOLV_CONF=/etc/resolv.weave.conf
RESOLV_CONF_DIR=$(dirname "$RESOLV_CONF")
RESOLV_CONF_BASE=$(basename "$RESOLV_CONF")
It uses the file /etc/resolv.weave.conf. In my case, I just used sed on the original resolv.conf to remove the trust-ad option, generating the file at boot time with systemd.
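If you'd rather not rely on sed, the same cleanup can be sketched in Go. This assumes the paths used in the script patch above (/etc/resolv.conf in, /etc/resolv.weave.conf out) and that trust-ad is the only option you need to drop; stripOptions is a hypothetical helper, and the program needs write access to /etc.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// stripOptions returns a copy of a resolv.conf with the named options removed
// from every "options" line; an options line left empty is dropped entirely.
// All other lines (nameserver, search, comments) are kept unchanged.
func stripOptions(resolvConf string, drop map[string]bool) string {
	var out []string
	for _, line := range strings.Split(resolvConf, "\n") {
		f := strings.Fields(line)
		if len(f) == 0 || f[0] != "options" {
			out = append(out, line)
			continue
		}
		kept := []string{"options"}
		for _, opt := range f[1:] {
			if !drop[opt] {
				kept = append(kept, opt)
			}
		}
		if len(kept) > 1 {
			out = append(out, strings.Join(kept, " "))
		}
	}
	return strings.Join(out, "\n")
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	cleaned := stripOptions(string(data), map[string]bool{"trust-ad": true})
	if err := os.WriteFile("/etc/resolv.weave.conf", []byte(cleaned), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Like the sed step, this would need to run at boot (after systemd has written resolv.conf) and before weave launch.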
I encountered this issue on Ubuntu 22.04 because of options edns0 trust-ad in my /etc/resolv.conf.
What's weird is that on Ubuntu 20.04 it was options edns0, and weave was working well.
We're facing the same issue with weave on Ubuntu Server 20.04. Is there any workaround that doesn't involve modifying resolv.conf?
Switch to Calico? You get all the same features and more.
@withinboredom We need this for a Nomad cluster; we currently use weave for our jobs. We couldn't find any docs on using Calico with a Nomad cluster.
Hi, are there any updates to this?
What you expected to happen?
Weave shouldn't crash when a container tries to resolve a hostname.
What happened?
Containers connected by weave are unable to communicate after applying Ubuntu OS updates. When a container using weave tries to access the network (probably only DNS), the weave container crashes.
How to reproduce it?
apt-get update && apt-get -y upgrade && apt-get install docker.io
curl -L git.io/weave -o /usr/local/bin/weave && sudo chmod a+x /usr/local/bin/weave
weave launch
eval $(weave env)
And I'm kicked out of my container. This worked fine before the apt-get upgrade, and it still works if weave isn't involved (if I omit eval $(weave env)).
Anything else we need to know?
The problem seems to be tied to an update to the Ubuntu systemd package 245.4-4ubuntu3.3 that was published on 2020-11-04. I've experienced it on Ubuntu 20.04 and 20.10. This version of systemd generates a line in /etc/resolv.conf which reads options edns0 trust-ad. The previous version (245.4-4ubuntu3) only generated the line options edns0, without the trust-ad option. The new option triggers a bug in miekg/dns that was fixed a few years back: https://github.com/miekg/dns/commit/906238edc6eb0ddface4a1923f6d41ef2a5ca59b
I've tried removing trust-ad from resolv.conf, and it does fix the crash on a simple test VM. On my "real" VMs where I was using weave, containers were still unable to ping each other after getting rid of the crash, but that may be an unrelated problem.
Versions:
Logs: