Closed: fastcat closed this issue 5 years ago
Unfortunately, it seems to run just fine in a regular stretch VM. Just spun up a Hyper-V VM and no segfault.
Unfortunately we don't have a ChromeOS machine.. And today is the first I've ever heard of "Crostini"
I have this same issue just now on ubuntu 18.04 inside an LXD container. It seems to run OK as root on the host. It has been running fine for a few years until today.
```
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic
```
zerotier-one 1.4.2 amd64
```
$ systemctl status zerotier-one
zerotier-one.service - ZeroTier One
   Loaded: loaded (/lib/systemd/system/zerotier-one.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Tue 2019-08-13 04:52:06 UTC; 10min ago
  Process: 776 ExecStart=/usr/sbin/zerotier-one (code=killed, signal=SEGV)
 Main PID: 776 (code=killed, signal=SEGV)
```
🤷♂️
Just for kicks:

```
rm -rf /var/lib/zerotier-one/peers.d/*.peer
systemctl start zerotier-one
```
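A slightly safer version of that cleanup, sketched as a shell function. This is a hedged sketch, not an official procedure: `clear_peer_cache` is a made-up helper, and `/var/lib/zerotier-one` is the default ZeroTier home on Linux.

```shell
#!/bin/sh
# Hedged sketch: back up cached peer files before deleting them, so the
# cleanup suggested above is reversible. clear_peer_cache is a hypothetical
# helper; /var/lib/zerotier-one is the default Linux home directory.
clear_peer_cache() {
  home="${1:-/var/lib/zerotier-one}"
  backup="$(mktemp -d)"
  cp -a "$home/peers.d/." "$backup/" 2>/dev/null || true
  rm -f "$home"/peers.d/*.peer
  echo "cleared $home/peers.d; backup in $backup"
}

# Usage (needs root for the real service):
# systemctl stop zerotier-one
# clear_peer_cache
# systemctl start zerotier-one
```

Keeping the backup around means the cached peers can be restored if clearing them turns out not to be the fix.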
No idea how this is happening for you guys right now
@glimberg Does your build system perhaps save aside debug symbols for the release builds? If I could load that into GDB I'd happily get a stack trace for you.
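In the meantime, a batch-mode gdb run can at least capture the crash point even without separate symbol files. A sketch, assuming gdb is installed; `get_backtrace` is a made-up helper, not part of ZeroTier:

```shell
#!/bin/sh
# Hedged sketch: run a binary under gdb in batch mode and print a backtrace
# at crash (or at normal exit). Degrades gracefully when gdb is missing.
get_backtrace() {
  bin="$1"; shift
  if ! command -v gdb >/dev/null 2>&1; then
    echo "gdb not installed; cannot collect a backtrace"
    return 0
  fi
  gdb --batch -ex run -ex bt --args "$bin" "$@"
}

# Usage (zerotier-one normally needs root):
# get_backtrace /usr/sbin/zerotier-one
```

Without symbols the backtrace will be raw addresses, but even those can be matched against a symbol file later if one is published.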
Are the different packages for the various Debian-ish releases actually different, or are they all the same statically linked binary inside? I'm wondering if there could, perhaps, be an issue with having statically linked a newer libc and running on an older host, or vice versa.
Any particular reason why the binaries are statically linked?
As far as cleaning up /var/lib/zerotier-one goes: I tried (with a backup handy) removing that entire directory and re-installing the package, and it still crashes at startup before writing any files into the dir at all (beyond the three symlinks to the zt binaries that I think the package install scripts put there).
It seemed to start up fine on a host system that is not a container. I just built the previous official 1.4.0.1 release from source and put it in the container, and it all works again. This looks like a regression related to some permission changes.
ps: https://github.com/zerotier/ZeroTierOne/issues/1006 is probably the same.
I just built the 1.4.2 release from here https://github.com/zerotier/ZeroTierOne/releases (with symbols), copied the executable image to my container, and it works fine. Maybe a packaging issue?
I am also getting the segvs on startup. CentOS 7 (fully patched to current) on both bare metal and VM guests.
Removing all the cached peer files as suggested in an earlier post did not fix this.
I get a segmentation fault on CentOS 7 after upgrading to the latest version. It happens with both zerotier-one and zerotier-cli.
Running in GCP
Zerotier 1.4.2 on CentOS 7 on GCP is segfaulting on startup for us.
It runs great on macOS and Raspberry Pi 4 natively, but it can fail in a VM. To make it more reproducible, here is the Vagrantfile for VirtualBox (guest: Fedora 30, host: macOS 10.14.6):
```ruby
# -*- mode: ruby -*-
# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://vagrantcloud.com/search.
  config.vm.box = "fedora/30-cloud-base"
  config.vm.box_version = "30.20190425.0"

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # NOTE: This will enable public access to the opened port
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine and only allow access
  # via 127.0.0.1 to disable public access
  config.vm.network "forwarded_port", guest: 51280, host: 8080, host_ip: "127.0.0.1"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  # config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which generally matched to bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  config.vm.provider "virtualbox" do |vb|
    # # Display the VirtualBox GUI when booting the machine
    # vb.gui = true

    # # Customize the amount of memory on the VM:
    vb.memory = "2048"
  end

  # View the documentation for the provider you are using for more
  # information on available options.

  # Enable provisioning with a shell script. Additional provisioners such as
  # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
  # documentation for more information about their specific syntax and use.
  # config.vm.provision "shell", inline: <<-SHELL
  #   apt-get update
  #   apt-get install -y apache2
  # SHELL
end
```
fails with:
```
$ sudo systemctl status zerotier-one
● zerotier-one.service - ZeroTier One
   Loaded: loaded (/usr/lib/systemd/system/zerotier-one.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Tue 2019-08-13 13:20:55 UTC; 968ms ago
  Process: 9987 ExecStart=/usr/sbin/zerotier-one (code=killed, signal=SEGV)
 Main PID: 9987 (code=killed, signal=SEGV)

Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Service RestartSec=100ms expired, scheduling restart.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Scheduled restart job, restart counter is at 5.
Aug 13 13:20:55 localhost.localdomain systemd[1]: Stopped ZeroTier One.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Start request repeated too quickly.
Aug 13 13:20:55 localhost.localdomain systemd[1]: zerotier-one.service: Failed with result 'signal'.
Aug 13 13:20:55 localhost.localdomain systemd[1]: Failed to start ZeroTier One.
```
I also saw this on a CentOS 7.6.1810 x64 system. For me, it was caused by SELinux.

This was the error:

```
type=AVC msg=audit(...): avc: denied { mmap_zero } for pid=... comm="zerotier-one" scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=memprotect permissive=0
```

This is a workaround for now:

```
setsebool -P mmap_low_allowed=true
```
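To check whether a system is affected by this kind of low-address mmap denial, a quick diagnostic can report the two relevant settings. A sketch only: `check_low_mmap` is a made-up helper, and the `mmap_low_allowed` boolean exists only on SELinux systems.

```shell
#!/bin/sh
# Hedged diagnostic sketch for the denial above: report the two settings that
# can block low-address mmap. check_low_mmap is a hypothetical helper.
check_low_mmap() {
  # Kernel-wide floor for mmap addresses (anything below it is refused).
  echo "vm.mmap_min_addr = $(cat /proc/sys/vm/mmap_min_addr 2>/dev/null || echo unknown)"
  # SELinux boolean gating the mmap_zero permission, where SELinux is present.
  if command -v getsebool >/dev/null 2>&1; then
    getsebool mmap_low_allowed 2>/dev/null || echo "mmap_low_allowed boolean not found"
  else
    echo "SELinux tools not installed; skipping boolean check"
  fi
}
check_low_mmap
```

If the boolean reports `off` and the audit log shows an `mmap_zero` denial, the `setsebool` workaround above should apply.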
Would it be possible for the zerotier client to emit a meaningful error if the mmap call fails to assist users with diagnosing this issue should it arise?
Ahh yes, the horror of Linux builds. Nothing much changed. I'll check and see if we can build natively in stretch instead of using the static binaries and perhaps rebuild and update the repos.
We didn't add any mmap calls in our code, so we'll have to check the upstream libraries.
Cranking off a new build today.
New build is in progress... takes a long time. :)
Closing since new builds are up and SHOULD fix this. (They do in our testing.)
We moved away from one-size-fits-all static binaries on all distributions with compilers available that are new enough to build this, which means GCC/G++ 5.x or newer in most cases. For CentOS we used the latest SCL dev toolchain. CentOS 6 and CentOS 7 for i686, s390x, and armhf (32-bit ARM) continue to use static binaries since these don't have SCL devtools packages, but AARCH64 and X86_64 do and these are the ones you're going to find in data centers where container use is likely.
I saw the SELinux denial on both CentOS 7.6 and Fedora 30. I did not whitelist the sebool either. I can confirm that after updating to the new package zerotier-one-1.4.2-2.el7.x86_64 on both systems, the service starts correctly and zerotier-cli now runs as expected. Thanks for quickly addressing this.
Confirmed the new builds fix the problem for me at least. Thank you :)
@adamierymenko Thanks for addressing the bugs of late quickly. Trying to update https://github.com/zerotier/ZeroTierOne/blob/master/ext/installfiles/linux/zerotier-containerized/Dockerfile , but it doesn't appear to generate a working binary with 1.4.2. Any suggested changes?
I see this also, unless I run as root or with sudo, on:

```
~$ uname -a
Linux putin 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
~$ sudo zerotier-cli
ZeroTier One version 1.10.3 build 0 (platform 1 arch 2)
~$ zerotier-cli
Segmentation fault
```
Describe the bug
zerotier-one 1.4.2 dies with SIGSEGV at startup in Debian Stretch builds. The 1.4.0 Debian Stretch build is not affected. The 1.4.2 Debian Buster build is not affected.

To Reproduce
Steps to reproduce the behavior:

systemctl status zerotier-one ⇒ died with SEGV
sudo zerotier-one ⇒ immediately dies with Segmentation fault
gdb ⇒ statically linked with no symbols, GDB just says: During startup program terminated with signal SIGSEGV, Segmentation fault.

Expected behavior
It should run :)

Desktop (please complete the following information):