zerotier / ZeroTierOne

A Smart Ethernet Switch for Earth
https://zerotier.com

ZeroTier hangs in controller mode #553

Closed bobbyvinon closed 7 years ago

bobbyvinon commented 7 years ago

I built the latest ZeroTier from GitHub (Ubuntu 16.04, clang). It is an unmodified build with the default config. The only config I changed is local.conf:

{ "settings": { "primaryPort": 9991, "allowManagementFrom": [ "0.0.0.0/0" ] } }

ZeroTier worked for some time. I attached two clients to this controller and everything worked fine. After the controller server rebooted, it stopped working: it starts and receives incoming connections but does not answer them, it just hangs. I removed /var/lib/zerotier and reinstalled the package. The situation is the same: it works for a while and stops working after a few reboots.

bobbyvinon commented 7 years ago

Here is the backtrace:

#0  0x00007ff6fbb9530d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6fbbc6d54 in usleep (useconds=) at ../sysdeps/posix/usleep.c:32
#2  0x00000000004bc0d4 in ZeroTier::Thread::sleep (ms=300000) at osdep/Thread.hpp:191
#3  ZeroTier::PortMapperImpl::threadMain (this=) at osdep/PortMapper.cpp:292
#4  0x00000000004bb576 in ZeroTier::___zt_threadMain (instance=0x7ff6fb284c80)
    at osdep/Thread.hpp:114
#5  0x00007ff6fbe9a6ba in start_thread (arg=0x7ff6fb288700) at pthread_create.c:333
#6  0x00007ff6fbbd03dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ff6fcb68740 (LWP 4304)):
#0  0x00007ff6fbb9530d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ff6fbbc6d54 in usleep (useconds=) at ../sysdeps/posix/usleep.c:32
#2  0x0000000000428a9a in ZeroTier::Thread::sleep (ms=250) at controller/../osdep/Thread.hpp:191
#3  ZeroTier::JSONDB::get (this=, n=...) at controller/JSONDB.cpp:88
#4  0x0000000000427dc8 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:167
#5  0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#6  0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#7  0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
........
#581 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#582 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#583 0x0000000000428ac4 in ZeroTier::JSONDB::get (this=, n=...) at controller/JSONDB.cpp:89
#584 0x0000000000427dc8 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:167
#585 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#586 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#587 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#588 0x0000000000428ac4 in ZeroTier::JSONDB::get (this=, n=...) at controller/JSONDB.cpp:89
#589 0x0000000000427dc8 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:167
#590 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#591 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#592 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#593 0x0000000000428ac4 in ZeroTier::JSONDB::get (this=, n=...) at controller/JSONDB.cpp:89
#594 0x0000000000427dc8 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:167
#595 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#596 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#597 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#598 0x0000000000428ac4 in ZeroTier::JSONDB::get (this=, n=...) at controller/JSONDB.cpp:89
#599 0x0000000000427dc8 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:167
#600 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#601 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#602 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#603 0x0000000000428ac4 in ZeroTier::JSONDB::get (this=, n=...) at controller/JSONDB.cpp:89
#604 0x0000000000427dc8 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:167
#605 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#606 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#607 0x0000000000427cd6 in ZeroTier::JSONDB::_reload (this=, p=..., b=...) at controller/JSONDB.cpp:169
#608 0x00000000004273b2 in ZeroTier::JSONDB::JSONDB (this=0x13f0120, basePath=...) at controller/JSONDB.cpp:53
#609 0x00000000004056a8 in ZeroTier::EmbeddedNetworkController::EmbeddedNetworkController (this=0x13efec0,
    node=<optimized out>, dbPath=0x13e9730 "/var/lib/zerotier-one/controller.d")
    at controller/EmbeddedNetworkController.cpp:434
#610 0x00000000004c7a85 in ZeroTier::(anonymous namespace)::OneServiceImpl::run (this=)
    at service/OneService.cpp:752
#611 0x00000000004dc90f in _OneServiceRunner::threadMain (this=) at one.cpp:1223
#612 0x00000000004d6ea1 in main (argc=, argv=) at one.cpp:1498

zielmicha commented 7 years ago

I had the same issue. Didn't have time to debug, reverting to 1.2.2 fixed it.

bobbyvinon commented 7 years ago

I've attached the config files that the JSON code failed to parse: a.zip

bobbyvinon commented 7 years ago

Quick and dirty fix: in JSONDB::get, comment out these lines at the beginning of the function:

    while (!_ready) {
        Thread::sleep(250);
        _ready = _reload(_basePath,std::string());
    }
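
For readers trying to see why that loop can hang: the backtrace suggests that get() calls _reload() while the database is not ready, and _reload() in turn calls get() for the entries it finds, so the wait loop is re-entered before the flag is ever set. The following is a minimal toy model of that shape, not ZeroTier's actual JSONDB code. The class ToyDb, its reload() method, the file names, the re-entry guard and the depth cap are all hypothetical and exist only so the demo terminates; the 250 ms sleep is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: ToyDb and everything in it is hypothetical and is NOT
// ZeroTier's JSONDB. The call shape (get() waits for "ready", reload() calls
// get()) is an assumption inferred from the backtrace in this thread.
class ToyDb {
public:
    ToyDb(bool guardReentry, int maxDepth)
        : guard_(guardReentry), maxDepth_(maxDepth) {}

    // Same shape as the quoted loop: do not answer until a reload has succeeded.
    bool get(const std::string &name) {
        (void)name; // the toy model does not store real records
        while (!ready_ && !gaveUp_) {
            if (guard_ && loading_) break;  // guarded variant: already inside reload()
            if (depth_ >= maxDepth_) {      // demo-only cap so the recursion ends
                gaveUp_ = true;
                break;
            }
            ready_ = reload();
        }
        return !gaveUp_;
    }

    int reloadCalls() const { return reloadCalls_; }
    bool ready() const { return ready_; }

private:
    // Stands in for a directory scan that loads every file through get().
    bool reload() {
        ++depth_;
        ++reloadCalls_;
        loading_ = true;
        for (const std::string &f : files_) get(f);
        loading_ = false;
        --depth_;
        return !gaveUp_;
    }

    std::vector<std::string> files_{"network.json", "member.json"};
    bool guard_;
    int maxDepth_;
    int depth_ = 0;
    int reloadCalls_ = 0;
    bool loading_ = false;
    bool ready_ = false;
    bool gaveUp_ = false;
};
```

Constructed as ToyDb(false, 8), a single get() nests reload() until the artificial cap of 8 is hit and never becomes ready; without a cap the nesting would simply continue, which matches the hundreds of repeated frames in the backtrace. Constructed as ToyDb(true, 8), reload() runs once and get() succeeds. Commenting out the loop, as suggested above, removes the re-entry in a blunter way, at the cost of no longer waiting for initialization.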

sbilly commented 7 years ago

I have the same issue.

lflare commented 7 years ago

I have also had this issue for a long time now. I will confirm later whether the quick fix by @bobbyvinon works.

laduke commented 7 years ago

git bisect says

first bad commit: [f4feccc6265cc480b84c85f897b225714072d4ec] Do not serve controller requests until init is done.

'bad' meaning one hangs at start and has to be kill -9'd

I was testing with this in controller.d/network/54d343738795f1c8.json

{
  "authTokens": [],
  "capabilities": [],
  "creationTime": 1509777679437,
  "enableBroadcast": true,
  "id": "54d343738795f1c8",
  "ipAssignmentPools": [],
  "lastModified": 1509777679436,
  "multicastLimit": 32,
  "name": "",
  "nwid": "54d343738795f1c8",
  "objtype": "network",
  "private": true,
  "revision": 1,
  "routes": [],
  "rules": [
    {
      "not": false,
      "or": false,
      "type": "ACTION_ACCEPT"
    }
  ],
  "tags": [],
  "v4AssignMode": {
    "zt": false
  },
  "v6AssignMode": {
    "6plane": false,
    "rfc4193": false,
    "zt": false
  }
}

This was generated by starting a controller, adding a network, and exiting. (I was testing something in node.)

cameronleger commented 7 years ago

I can confirm behaviour similar to what @laduke describes. I thought I was getting the configuration wrong because joined users would not receive IPv4 addresses. (Re)starting the service results in a hang that must be kill -9'd. However, even in the most basic case, where I start the service and add a network with all defaults, restarting the service causes the hang.

So, I don't think it's anything I've configured incorrectly. How is anyone using their own controller in this state? There should be more documentation around the routes configuration, because it's necessary for setting up addressing, but the payload had to be guessed by reading around non-ZeroTier sites (where there's very little information in any case).

cwegener commented 7 years ago

Same here. Doing a git revert f4feccc6 fixes the problem.

adamierymenko commented 7 years ago

Fixed since that code got taken out behind the barn and shot.