zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

1.4.2 static binary builds don't work in some container environments (LXD, OpenVZ), new build in progress #1007

Closed fastcat closed 5 years ago

fastcat commented 5 years ago

Describe the bug: zerotier-one 1.4.2 dies with SIGSEGV at startup in Debian Stretch builds. The 1.4.0 Debian Stretch build is not affected. The 1.4.2 Debian Buster build is not affected.

To Reproduce (steps to reproduce the behavior):

  1. Upgrade a Debian stretch system from 1.4.0 to 1.4.2
  2. systemctl status zerotier-one ⇒ died with SEGV
  3. Try to start it by hand: sudo zerotier-one ⇒ immediately dies with Segmentation fault
  4. Try to be helpful and get a backtrace with gdb ⇒ Statically linked with no symbols, GDB just says During startup program terminated with signal SIGSEGV, Segmentation fault.

Expected behavior: It should run :)


glimberg commented 5 years ago

Unfortunately, it seems to run just fine in a regular stretch VM. Just spun up a Hyper-V VM and no segfault.

Unfortunately, we don't have a ChromeOS machine. And today is the first I've ever heard of "Crostini".


mianos commented 5 years ago

I have this same issue just now on Ubuntu 18.04 inside an LXD container. It seems to run OK as root on the host. It had been running fine for a few years until today.

Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

zerotier-one 1.4.2 amd64

systemctl status zerotier-one
zerotier-one.service - ZeroTier One
   Loaded: loaded (/lib/systemd/system/zerotier-one.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Tue 2019-08-13 04:52:06 UTC; 10min ago
  Process: 776 ExecStart=/usr/sbin/zerotier-one (code=killed, signal=SEGV)
 Main PID: 776 (code=killed, signal=SEGV)

glimberg commented 5 years ago

🤷‍♂️

Just for kicks, rm -rf /var/lib/zerotier-one/peers.d/*.peer and then systemctl start zerotier-one

No idea how this is happening for you guys right now

fastcat commented 5 years ago

@glimberg Does your build system perhaps save aside debug symbols for the release builds? If I could load that into GDB I'd happily get a stack trace for you.

Are the different packages for different Debian-ish releases actually different, or are they all the same statically linked binary inside? I'm wondering if there could, perhaps, be an issue with having statically linked a newer libc running on an older host, or vice versa.

Any particular reason why the binaries are statically linked?

As far as cleaning up /var/lib/zerotier-one -- I tried (with a backup handy) removing that entire directory and re-installing the package, and it still crashes at startup before writing any files into the dir at all (beyond the three symlinks to the zt binaries that I think the package install scripts put there).
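For anyone who wants to check the static-linking question on their own install, here is a minimal sketch. It assumes the `file` utility is installed and that the binary lives at /usr/sbin/zerotier-one (the path shown in the systemd output in this thread). `classify_linkage` is a hypothetical helper name, not something shipped with ZeroTier.

```shell
# Hypothetical helper: read the output of `file` on stdin and report
# whether it describes a statically or dynamically linked ELF binary.
classify_linkage() {
    case "$(cat)" in
        *"statically linked"*)  echo "static" ;;
        *"dynamically linked"*) echo "dynamic" ;;
        *)                      echo "unknown" ;;
    esac
}

# Usage (the path is an assumption; adjust it to your install):
#   file -L /usr/sbin/zerotier-one | classify_linkage
```

If the result is "dynamic", `ldd /usr/sbin/zerotier-one` lists the shared libraries the binary expects from the host.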

mianos commented 5 years ago

It seemed to start up on a host system that is not a container. I just built the previous official 1.4.0.1 release from source and put it in the container. It all works again. It seems this is a regression to do with some permission changes.

mianos commented 5 years ago

ps: https://github.com/zerotier/ZeroTierOne/issues/1006 is probably the same.

mianos commented 5 years ago

I just built the 1.4.2 release from here https://github.com/zerotier/ZeroTierOne/releases (With symbols) and copied the executable image to my container and it works fine. Maybe a packaging issue?

cwichura commented 5 years ago

I am also getting the segvs on startup. CentOS 7 (fully patched to current) on both bare metal and VM guests.

Removing all the cached peer files as suggested in an earlier post did not fix this.

5aaee9 commented 5 years ago

CentOS 7 gets a segmentation fault since I upgraded to the latest version. It happens with both zerotier-one and zerotier-cli.

Running in GCP

wouterh-dev commented 5 years ago

Zerotier 1.4.2 on CentOS 7 on GCP is segfaulting on startup for us.

daviehh commented 5 years ago

It runs great natively on Mac and Raspberry Pi 4, but it looks like it can fail in a VM. To make this more reproducible, here is the Vagrantfile for VirtualBox (guest is Fedora 30, host is macOS 10.14.6):

# -*- mode: ruby -*-
# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://vagrantcloud.com/search.
  config.vm.box = "fedora/30-cloud-base"
  config.vm.box_version = "30.20190425.0"

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # NOTE: This will enable public access to the opened port
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine and only allow access
  # via 127.0.0.1 to disable public access
  config.vm.network "forwarded_port", guest: 51280, host: 8080, host_ip: "127.0.0.1"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  # config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which generally matched to bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  #
  config.vm.provider "virtualbox" do |vb|
  #   # Display the VirtualBox GUI when booting the machine
  #   vb.gui = true
  #
  #   # Customize the amount of memory on the VM:
    vb.memory = "2048"
  end
  #
  # View the documentation for the provider you are using for more
  # information on available options.

  # Enable provisioning with a shell script. Additional provisioners such as
  # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
  # documentation for more information about their specific syntax and use.
  # config.vm.provision "shell", inline: <<-SHELL
  #   apt-get update
  #   apt-get install -y apache2
  # SHELL
end

fails with:

sudo systemctl status zerotier-one
● zerotier-one.service - ZeroTier One
   Loaded: loaded (/usr/lib/systemd/system/zerotier-one.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Tue 2019-08-13 13:20:55 UTC; 968ms ago
  Process: 9987 ExecStart=/usr/sbin/zerotier-one (code=killed, signal=SEGV)
 Main PID: 9987 (code=killed, signal=SEGV)

Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Service RestartSec=100ms expired, scheduling restart.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Scheduled restart job, restart counter is at 5.
Aug 13 13:20:55 localhost.localdomain systemd[1]: Stopped ZeroTier One.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Start request repeated too quickly.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Failed with result 'signal'.
Aug 13 13:20:55 localhost.localdomain systemd[1]: Failed to start ZeroTier One.

ctr commented 5 years ago

I also saw this on a CentOS 7.6.1810 x64 system. For me, it was caused by SELinux.

This was the error:
type=AVC msg=audit(...): avc: denied { mmap_zero } for pid=... comm="zerotier-one" scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=memprotect permissive=0

This is a workaround for now: setsebool -P mmap_low_allowed=true
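To see whether the same denial is what you are hitting, something like the following could help. It is only a sketch: `zt_avc_denials` is a hypothetical helper, and the default log path is an assumption (the usual auditd location on CentOS and Fedora).

```shell
# Hypothetical helper: print SELinux AVC denials logged for zerotier-one.
# $1 is the audit log to search; the default path is an assumption.
zt_avc_denials() {
    log="${1:-/var/log/audit/audit.log}"
    # Match "avc: denied" with any amount of spacing, then keep only
    # the entries whose command name is zerotier-one.
    grep 'avc: *denied' "$log" | grep 'comm="zerotier-one"'
}

# Usage (reading the audit log normally requires root):
#   zt_avc_denials /var/log/audit/audit.log
```

A line containing `{ mmap_zero }` points at the issue described here. `getsebool mmap_low_allowed` shows the current value of the boolean before you change it.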

Would it be possible for the zerotier client to emit a meaningful error if the mmap call fails to assist users with diagnosing this issue should it arise?

adamierymenko commented 5 years ago

Ahh yes, the horror of Linux builds. Nothing much changed. I'll check and see if we can build natively in stretch instead of using the static binaries and perhaps rebuild and update the repos.

adamierymenko commented 5 years ago

We didn't add any mmap calls in our code, so will have to check upstream libraries.

adamierymenko commented 5 years ago

#1006 is a dupe.

Cranking off a new build today.

adamierymenko commented 5 years ago

New build is in progress... takes a long time. :)

adamierymenko commented 5 years ago

Closing since new builds are up and SHOULD fix this. (They do in our testing.)

We moved away from one-size-fits-all static binaries on all distributions with compilers available that are new enough to build this, which means GCC/G++ 5.x or newer in most cases. For CentOS we used the latest SCL dev toolchain. CentOS 6 and CentOS 7 for i686, s390x, and armhf (32-bit ARM) continue to use static binaries since these don't have SCL devtools packages, but AARCH64 and X86_64 do and these are the ones you're going to find in data centers where container use is likely.

tardfree commented 5 years ago

I saw the SELinux denial on both CentOS 7.6 and Fedora 30, and I did not set the SELinux boolean on either. I can confirm that after updating to the new package zerotier-one-1.4.2-2.el7.x86_64 on both systems, the service starts correctly and zerotier-cli now runs as expected. Thanks for quickly addressing this.

fastcat commented 5 years ago

Confirmed the new builds fix the problem for me at least. Thank you :)

unquietwiki commented 5 years ago

@adamierymenko Thanks for addressing the bugs of late quickly. Trying to update https://github.com/zerotier/ZeroTierOne/blob/master/ext/installfiles/linux/zerotier-containerized/Dockerfile , but it doesn't appear to generate a working binary with 1.4.2. Any suggested changes?

tomachinz commented 1 year ago

I see this also, unless I run as root or with sudo, on

~$ uname -a
Linux putin 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
~$ sudo zerotier-cli 
ZeroTier One version 1.10.3 build 0 (platform 1 arch 2)
~$ zerotier-cli
Segmentation fault