project-iris / iris

Decentralized cloud messaging
iris.karalabe.com

Bootstrap probe panic #46

Closed chendo closed 9 years ago

chendo commented 9 years ago

Iris panics when trying to probe.

eth0 details:

          inet addr:10.240.145.75  Bcast:10.240.145.75  Mask:255.255.255.255
          inet6 addr: fe80::4001:aff:fef0:914b/64 Scope:Link

$ ./iris -dev
Entering developer mode
Generating random RSA key... done.
Generating random cluster name... done.

2014/08/20 04:40:52 main: booting iris overlay...
2014/08/20 04:40:52 scribe: booting with id 1021648966721.
panic: invalid argument to Intn

goroutine 30 [running]:
runtime.panic(0x5ad400, 0xc208001250)
    /usr/local/go/src/pkg/runtime/panic.c:279 +0xf5
math/rand.(*Rand).Intn(0xc2080001e0, 0xffffffffffffffff, 0x0)
    /usr/local/go/src/pkg/math/rand/rand.go:95 +0x71
math/rand.Intn(0xffffffffffffffff, 0x10)
    /usr/local/go/src/pkg/math/rand/rand.go:195 +0x34
github.com/project-iris/iris/proto/bootstrap.(*Bootstrapper).probe(0xc208046480)
    /go/src/github.com/project-iris/iris/proto/bootstrap/bootstrap.go:242 +0x1b5
created by github.com/project-iris/iris/proto/bootstrap.(*Bootstrapper).Boot
    /go/src/github.com/project-iris/iris/proto/bootstrap/bootstrap.go:142 +0x78

goroutine 16 [semacquire]:
sync.runtime_Semacquire(0xc208000788)
    /usr/local/go/src/pkg/runtime/sema.goc:199 +0x30
sync.(*WaitGroup).Wait(0xc2080960c8)
    /usr/local/go/src/pkg/sync/waitgroup.go:129 +0x14b
github.com/project-iris/iris/proto/pastry.(*Overlay).Boot(0xc208096000, 0x0, 0x0, 0x0)
    /go/src/github.com/project-iris/iris/proto/pastry/overlay.go:168 +0x426
github.com/project-iris/iris/proto/scribe.(*Overlay).Boot(0xc2080a16d0, 0xc2080009e8, 0x0, 0x0)
    /go/src/github.com/project-iris/iris/proto/scribe/overlay.go:81 +0xe5
github.com/project-iris/iris/proto/iris.(*Overlay).Boot(0xc2080987e0, 0x1f, 0x0, 0x0)
    /go/src/github.com/project-iris/iris/proto/iris/overlay.go:64 +0x50
main.main()
    /go/src/github.com/project-iris/iris/main.go:177 +0x522

goroutine 19 [finalizer wait]:
runtime.park(0x418600, 0x7852e8, 0x783dc9)
    /usr/local/go/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x7852e8, 0x783dc9)
    /usr/local/go/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
    /usr/local/go/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
    /usr/local/go/src/pkg/runtime/proc.c:1445

goroutine 21 [chan receive]:
github.com/project-iris/iris/system.func·001()
    /go/src/github.com/project-iris/iris/system/system.go:56 +0x54
created by github.com/project-iris/iris/system.init·1
    /go/src/github.com/project-iris/iris/system/system.go:59 +0x3d

goroutine 22 [syscall]:
os/signal.loop()
    /usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
    /usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 23 [select]:
github.com/project-iris/iris/heart.(*Heart).beater(0xc20801a910)
    /go/src/github.com/project-iris/iris/heart/heart.go:131 +0x46b
created by github.com/project-iris/iris/heart.(*Heart).Start
    /go/src/github.com/project-iris/iris/heart/heart.go:63 +0x2f

goroutine 24 [select]:
github.com/project-iris/iris/proto/pastry.(*Overlay).acceptor(0xc208096000, 0xc2080a23c0, 0xc2080044e0)
    /go/src/github.com/project-iris/iris/proto/pastry/handshake.go:81 +0xd4a
created by github.com/project-iris/iris/proto/pastry.(*Overlay).Boot
    /go/src/github.com/project-iris/iris/proto/pastry/overlay.go:155 +0x339

goroutine 25 [select]:
github.com/project-iris/iris/proto/pastry.(*Overlay).manager(0xc208096000)
    /go/src/github.com/project-iris/iris/proto/pastry/maintenance.go:76 +0x1006
created by github.com/project-iris/iris/proto/pastry.(*Overlay).Boot
    /go/src/github.com/project-iris/iris/proto/pastry/overlay.go:160 +0x3a3

goroutine 26 [select]:
github.com/project-iris/iris/heart.(*Heart).beater(0xc20801a8c0)
    /go/src/github.com/project-iris/iris/heart/heart.go:131 +0x46b
created by github.com/project-iris/iris/heart.(*Heart).Start
    /go/src/github.com/project-iris/iris/heart/heart.go:63 +0x2f

goroutine 27 [IO wait]:
net.runtime_pollWait(0x7f42c7915580, 0x72, 0x0)
    /usr/local/go/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc208098220, 0x72, 0x0, 0x0)
    /usr/local/go/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc208098220, 0x0, 0x0)
    /usr/local/go/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc2080981c0, 0x6b2d60, 0x0, 0x7f42c79123c8, 0xb)
    /usr/local/go/src/pkg/net/fd_unix.go:409 +0x343
net.(*TCPListener).AcceptTCP(0xc20803c030, 0xecb861dd4, 0x0, 0x0)
    /usr/local/go/src/pkg/net/tcpsock_posix.go:234 +0x5d
github.com/project-iris/iris/proto/stream.(*Listener).accepter(0xc2080a4380, 0x3b9aca00)
    /go/src/github.com/project-iris/iris/proto/stream/stream.go:98 +0x241
created by github.com/project-iris/iris/proto/stream.(*Listener).Accept
    /go/src/github.com/project-iris/iris/proto/stream/stream.go:74 +0x39

goroutine 28 [select]:
github.com/project-iris/iris/proto/session.(*Listener).accepter(0xc208004600, 0x3b9aca00)
    /go/src/github.com/project-iris/iris/proto/session/handshake.go:128 +0x42b
created by github.com/project-iris/iris/proto/session.(*Listener).Accept
    /go/src/github.com/project-iris/iris/proto/session/handshake.go:109 +0x58

goroutine 29 [IO wait]:
net.runtime_pollWait(0x7f42c79154d0, 0x72, 0x0)
    /usr/local/go/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc208098290, 0x72, 0x0, 0x0)
    /usr/local/go/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc208098290, 0x0, 0x0)
    /usr/local/go/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).readFrom(0xc208098230, 0x7f42c4e638a4, 0x5dc, 0x5dc, 0x0, 0x0, 0x0, 0x7f42c79123c8, 0xb)
    /usr/local/go/src/pkg/net/fd_unix.go:259 +0x3db
net.(*UDPConn).ReadFromUDP(0xc20803c038, 0x7f42c4e638a4, 0x5dc, 0x5dc, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/pkg/net/udpsock_posix.go:67 +0x129
github.com/project-iris/iris/proto/bootstrap.(*Bootstrapper).accept(0xc208046480)
    /go/src/github.com/project-iris/iris/proto/bootstrap/bootstrap.go:198 +0x24a
created by github.com/project-iris/iris/proto/bootstrap.(*Bootstrapper).Boot
    /go/src/github.com/project-iris/iris/proto/bootstrap/bootstrap.go:141 +0x60

goroutine 31 [runnable]:
github.com/project-iris/iris/proto/bootstrap.(*Bootstrapper).scan(0xc208046480)
    /go/src/github.com/project-iris/iris/proto/bootstrap/bootstrap.go:279
created by github.com/project-iris/iris/proto/bootstrap.(*Bootstrapper).Boot
    /go/src/github.com/project-iris/iris/proto/bootstrap/bootstrap.go:143 +0x90

chendo commented 9 years ago

On further investigation, this happens because Google Cloud assigns us a /32 address (they don't support broadcast/multicast on their network), so rand.Intn ends up being passed -1.

The ability to override the subnet mask should fix this issue.

karalabe commented 9 years ago

Hmm, why would GCE assign a /32 mask? Is it some special network setup that you've configured? And if so, may I ask the rationale behind it? I could try to figure out a workaround, but it seems to me a very strange use case.

chendo commented 9 years ago

I believe it's meant to prevent you from using broadcast/multicast, rather than having it silently not work. I don't believe we're using a special setup, just a standard GCE VM. I found it strange to be honest, but I guess it's better to fail noisily than silently.

karalabe commented 9 years ago

I used GCE quite a lot a while back, but back then the default netmask was, I believe, /8 (that's the reason I found the /32 a bit strange).

However, you are completely right that Iris should at least display a proper error message and not just panic, so I'll definitely need to fix this. I'm not sure I would like to support configuring custom netmasks, as it would defeat the current zero-configuration "promise" (it's not that simple... what happens when multiple networks are present?).

Nonetheless, I will investigate this /32 subnet issue, because if it's something newly introduced into GCE, I'd definitely want to know about it and its rationale.

phifty commented 9 years ago

I had the same problem on a machine with multiple interfaces. One had a /32 subnet and caused the panic. For that situation, where the broadcast can be done via another interface, a warning would be good enough for me.

karalabe commented 9 years ago

Yes, I agree that the panic should be sorted out. I'm just finishing up the JVM binding, after which I'll sort this out asap :)... but in the meantime, if you feel up to it, you could write up a small pull request to ignore /32 subnets ;)

phifty commented 9 years ago

Well, I've created a pull request with a fix for that issue.

karalabe commented 9 years ago

The panic is gone now, but I'm not exactly sure whether I should allow the bootstrapper to be started at all on such addresses (<2 bit host part). They are used primarily by bridges (/32) or point-to-point connections (/31), and in neither case should the bootstrapper have anything to look for. For now it issues a warning and stops probing, but maybe I'll also add a check into an upper layer to never even start a bootstrapper in the first place.