fastcat commented 5 years ago

Describe the bug

zerotier-one 1.4.2 dies with SIGSEGV at startup in Debian Stretch builds. The 1.4.0 Debian Stretch build is not affected. The 1.4.2 Debian Buster build is not affected.

To Reproduce

Steps to reproduce the behavior:

Upgrade a Debian Stretch system from 1.4.0 to 1.4.2
systemctl status zerotier-one ⇒ died with SEGV
Try to start it by hand: sudo zerotier-one ⇒ immediately dies with Segmentation fault
Try to be helpful and get a backtrace with gdb ⇒ statically linked with no symbols; GDB just says "During startup program terminated with signal SIGSEGV, Segmentation fault."

Expected behavior

It should run :)

Desktop (please complete the following information):

OS: Debian
OS/Distribution Version: Stretch (9)
ZeroTier Version: 1.4.2
Hardware: Crostini (Linux VM in ChromeOS, x86_64 in this case)

glimberg commented 5 years ago

Unfortunately, it seems to run just fine in a regular stretch VM. Just spun up a Hyper-V VM and no segfault.

Unfortunately, we don't have a ChromeOS machine, and today is the first I've ever heard of "Crostini".

mianos commented 5 years ago

I have this same issue just now on Ubuntu 18.04 inside an LXD container. It seems to run OK as root on the host. It had been running fine for a few years until today.

Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

zerotier-one 1.4.2 amd64

systemctl status zerotier-one
zerotier-one.service - ZeroTier One
   Loaded: loaded (/lib/systemd/system/zerotier-one.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Tue 2019-08-13 04:52:06 UTC; 10min ago
  Process: 776 ExecStart=/usr/sbin/zerotier-one (code=killed, signal=SEGV)
 Main PID: 776 (code=killed, signal=SEGV)

glimberg commented 5 years ago

🤷‍♂️

~~Just for kicks, rm -rf /var/lib/zerotier-one/peers.d/*.peer and then systemctl start zerotier-one~~

No idea how this is happening for you guys right now

fastcat commented 5 years ago

@glimberg Does your build system perhaps save aside debug symbols for the release builds? If I could load that into GDB I'd happily get a stack trace for you.

Are the different packages for different debian/ish releases actually different, or are they all the same statically linked binary inside? I'm wondering if there could, perhaps, be an issue with having statically linked a newer libc running on an older host, or vice-versa.

Any particular reason why the binaries are statically linked?

As far as cleaning up /var/lib/zerotier-one -- I tried (with a backup handy) removing that entire directory and re-installing the package, and it still crashes at startup before writing any files into the dir at all (beyond the three symlinks to the zt binaries that I think the package install scripts put there).

mianos commented 5 years ago

It seemed to start up on a host system that is not a container. I just built the previous official 1.4.0.1 release from source and put it in the container. It all works again. It seems this is a regression to do with some permission changes.

mianos commented 5 years ago

ps: https://github.com/zerotier/ZeroTierOne/issues/1006 is probably the same.

mianos commented 5 years ago

I just built the 1.4.2 release from https://github.com/zerotier/ZeroTierOne/releases (with symbols), copied the executable to my container, and it works fine. Maybe a packaging issue?

cwichura commented 5 years ago

I am also getting the segvs on startup. CentOS 7 (fully patched to current) on both bare metal and VM guests.

Removing all the cached peer files as suggested in an earlier post did not fix this.

5aaee9 commented 5 years ago

CentOS 7 gets a segmentation fault since I upgraded to the latest version. It happens with both zerotier-one and zerotier-cli.

Running in GCP

wouterh-dev commented 5 years ago

ZeroTier 1.4.2 on CentOS 7 on GCP is segfaulting on startup for us.

daviehh commented 5 years ago

It runs great natively on a Mac and a Raspberry Pi 4, but it looks like it can fail in a VM. To make this more reproducible, here is the Vagrantfile for VirtualBox (guest is Fedora 30, host is macOS 10.14.6):

# -*- mode: ruby -*-
# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://vagrantcloud.com/search.
  config.vm.box = "fedora/30-cloud-base"
  config.vm.box_version = "30.20190425.0"

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # NOTE: This will enable public access to the opened port
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine and only allow access
  # via 127.0.0.1 to disable public access
  config.vm.network "forwarded_port", guest: 51280, host: 8080, host_ip: "127.0.0.1"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  # config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which generally matched to bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  #
  config.vm.provider "virtualbox" do |vb|
  #   # Display the VirtualBox GUI when booting the machine
  #   vb.gui = true
  #
  #   # Customize the amount of memory on the VM:
    vb.memory = "2048"
  end
  #
  # View the documentation for the provider you are using for more
  # information on available options.

  # Enable provisioning with a shell script. Additional provisioners such as
  # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
  # documentation for more information about their specific syntax and use.
  # config.vm.provision "shell", inline: <<-SHELL
  #   apt-get update
  #   apt-get install -y apache2
  # SHELL
end

fails with:

sudo systemctl status zerotier-one
● zerotier-one.service - ZeroTier One
   Loaded: loaded (/usr/lib/systemd/system/zerotier-one.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Tue 2019-08-13 13:20:55 UTC; 968ms ago
  Process: 9987 ExecStart=/usr/sbin/zerotier-one (code=killed, signal=SEGV)
 Main PID: 9987 (code=killed, signal=SEGV)

Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Service RestartSec=100ms expired, scheduling restart.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Scheduled restart job, restart counter is at 5.
Aug 13 13:20:55 localhost.localdomain systemd[1]: Stopped ZeroTier One.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Start request repeated too quickly.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Failed with result 'signal'.
Aug 13 13:20:55 localhost.localdomain systemd[1]: Failed to start ZeroTier One.

ctr commented 5 years ago

I also saw this on a CentOS 7.6.1810 x64 system. For me, it was caused by SELinux.

This was the error:

type=AVC msg=audit(...): avc: denied { mmap_zero } for pid=... comm="zerotier-one" scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=memprotect permissive=0

This is a workaround for now: setsebool -P mmap_low_allowed=true
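For context, the mmap_zero permission in the denial above and the mmap_low_allowed boolean both concern mappings at very low addresses. Here is a minimal C sketch for checking where that low-address boundary sits on a given machine. It assumes a Linux system that exposes the threshold at /proc/sys/vm/mmap_min_addr. SELinux may apply its own compile-time threshold, so treat the value as indicative only. read_mmap_min_addr and is_low_mapping are hypothetical helper names, not part of ZeroTier.

```c
#include <stdio.h>

/* Hypothetical helper: reads the minimum mappable address from a sysctl
 * file such as /proc/sys/vm/mmap_min_addr.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
long read_mmap_min_addr(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}

/* Hypothetical helper: returns 1 if a mapping at addr falls below the given
 * threshold (and so would count as a "low" mapping), 0 otherwise.
 * A threshold of 0 or less means no low-address restriction is known. */
int is_low_mapping(unsigned long addr, long min_addr)
{
    return min_addr > 0 && addr < (unsigned long)min_addr;
}
```

Calling read_mmap_min_addr("/proc/sys/vm/mmap_min_addr") and passing the result to is_low_mapping along with the address a program asks for shows whether that request would land in the restricted range.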

Would it be possible for the zerotier client to emit a meaningful error if the mmap call fails to assist users with diagnosing this issue should it arise?
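To illustrate the kind of check being requested, here is a minimal C sketch of an mmap call that reports a readable error instead of continuing with an invalid pointer. It is not ZeroTier's code. map_or_report is a hypothetical helper, and the anonymous private mapping is an assumption chosen only to keep the example self-contained.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: maps len bytes of anonymous memory near hint.
 * On success, stores the mapping in *out and returns 0.
 * On failure, prints a diagnostic to stderr and returns the errno value. */
int map_or_report(void *hint, size_t len, void **out)
{
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* mmap signals failure with MAP_FAILED, not NULL. */
    if (p == MAP_FAILED) {
        int err = errno;
        fprintf(stderr, "mmap(%p, %zu) failed: %s (errno %d)\n",
                hint, len, strerror(err), err);
        if (err == EPERM || err == EACCES)
            fprintf(stderr, "hint: a security policy (for example SELinux) "
                            "may be denying this mapping; check the audit log\n");
        return err;
    }

    *out = p;
    return 0;
}
```

A caller would check the return value and exit with the printed message rather than dereferencing an invalid pointer, which turns a bare segfault into something a user can act on.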