nictuku / stardew-rocks

Share your Stardew Valley save games and see how your farm looks like, on the web!
Apache License 2.0
60 stars 16 forks source link

subscriber getting stuck #81

Closed nictuku closed 8 years ago

nictuku commented 8 years ago

Didn't process anything or 6 days. This happened before. A restart fixes it.

This is a stack dump:


2016/06/30 12:43:23 Total messages so far: 2431
SIGQUIT: quit
PC=0x45f8d1 m=0

goroutine 0 [idle]:
runtime.futex(0xa39348, 0x0, 0x0, 0x0, 0x0, 0xa38b50, 0x0, 0x0, 0x4114a4, 0xa393                                                                                                                                   48, ...)
        /usr/local/go/src/runtime/sys_linux_amd64.s:302 +0x21
runtime.futexsleep(0xa39348, 0x0, 0xffffffffffffffff)
        /usr/local/go/src/runtime/os1_linux.go:40 +0x53
runtime.notesleep(0xa39348)
        /usr/local/go/src/runtime/lock_futex.go:145 +0xa4
runtime.stopm()
        /usr/local/go/src/runtime/proc.go:1535 +0x10b
runtime.findrunnable(0xc82001a000, 0x0)
        /usr/local/go/src/runtime/proc.go:1973 +0x739
runtime.schedule()
        /usr/local/go/src/runtime/proc.go:2072 +0x24f
runtime.park_m(0xc820130180)
        /usr/local/go/src/runtime/proc.go:2137 +0x18b
runtime.mcall(0x7ffd8f515190)
        /usr/local/go/src/runtime/asm_amd64.s:233 +0x5b

goroutine 1 [select (no cases), 13985 minutes]:
main.main()
        /home/stardew/src/github.com/nictuku/stardew-rocks/subscriber/subscribe.                                                                                                                                   go:171 +0x4e8

goroutine 17 [syscall, 13985 minutes, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1

goroutine 5 [select]:
gopkg.in/mgo%2ev2.(*mongoCluster).syncServersLoop(0xc8200c4000)
        /home/stardew/src/gopkg.in/mgo.v2/cluster.go:394 +0x617
created by gopkg.in/mgo%2ev2.newCluster
        /home/stardew/src/gopkg.in/mgo.v2/cluster.go:78 +0x1d5

goroutine 11 [sleep]:
time.Sleep(0x37e11d600)
        /usr/local/go/src/runtime/time.go:59 +0xf9
gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc8200c60e0, 0x1)
        /home/stardew/src/gopkg.in/mgo.v2/server.go:301 +0x1a6
created by gopkg.in/mgo%2ev2.newServer
        /home/stardew/src/gopkg.in/mgo.v2/server.go:89 +0x165

goroutine 18 [IO wait, 9275 minutes]:
net.runtime_pollWait(0x7f67c2c0b518, 0x72, 0xc8200ec180)
        /usr/local/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc82005a220, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc82005a220, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc82005a1c0, 0xc8200ec180, 0x24, 0x24, 0x0, 0x7f67c2c06050, 0                                                                                                                                   xc82000a1b0)
        /usr/local/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc8200f4000, 0xc8200ec180, 0x24, 0x24, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:172 +0xe4
gopkg.in/mgo%2ev2.fill(0x7f67c2c0b608, 0xc8200f4000, 0xc8200ec180, 0x24, 0x24, 0                                                                                                                                   x0, 0x0)
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:535 +0x6c
gopkg.in/mgo%2ev2.(*mongoSocket).readLoop(0xc8200f8000)
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:551 +0x133
created by gopkg.in/mgo%2ev2.newSocket
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:194 +0x407

goroutine 1327 [IO wait, 11546 minutes]:
net.runtime_pollWait(0x7f67c2c0b458, 0x72, 0xc8213dc420)
        /usr/local/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc8203513a0, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc8203513a0, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc820351340, 0xc8213dc420, 0x24, 0x24, 0x0, 0x7f67c2c06050, 0                                                                                                                                   xc82000a1b0)
        /usr/local/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc82002e080, 0xc8213dc420, 0x24, 0x24, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:172 +0xe4
gopkg.in/mgo%2ev2.fill(0x7f67c2c0b608, 0xc82002e080, 0xc8213dc420, 0x24, 0x24, 0                                                                                                                                   x0, 0x0)
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:535 +0x6c
gopkg.in/mgo%2ev2.(*mongoSocket).readLoop(0xc8200f87e0)
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:551 +0x133
created by gopkg.in/mgo%2ev2.newSocket
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:194 +0x407

goroutine 14142329 [IO wait]:
net.runtime_pollWait(0x7f67c2c0b2d8, 0x72, 0xc8221e8630)
        /usr/local/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc820350140, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc820350140, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc8203500e0, 0xc8221e8630, 0x24, 0x24, 0x0, 0x7f67c2c06050, 0                                                                                                                                   xc82000a1b0)
        /usr/local/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc8200f4058, 0xc8221e8630, 0x24, 0x24, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:172 +0xe4
gopkg.in/mgo%2ev2.fill(0x7f67c2c0b608, 0xc8200f4058, 0xc8221e8630, 0x24, 0x24, 0                                                                                                                                   x0, 0x0)
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:535 +0x6c
gopkg.in/mgo%2ev2.(*mongoSocket).readLoop(0xc820c7c380)
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:551 +0x133
created by gopkg.in/mgo%2ev2.newSocket
        /home/stardew/src/gopkg.in/mgo.v2/socket.go:194 +0x407

rax    0xca
rbx    0x0
rcx    0x45f8d3
rdx    0x0
rdi    0xa39348
rsi    0x0
rbp    0x1
rsp    0x7ffd8f514ff0
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0xa
r13    0x8fa1de
r14    0xf
r15    0x8
rip    0x45f8d1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
2016/07/06 23:14:15  [*] Waiting for messages. To exit press CTRL+C
nictuku commented 8 years ago

My best guess is that the mgo connection is becoming stale so we need to refresh it if it becomes bad.

nictuku commented 8 years ago

Working on this now

nictuku commented 8 years ago

I didn't restart the database recently, so the bug didn't hit us recently.

But I went ahead and made a few partial fixes: 851b629c23f68f8533979a4e84da06eab5ea6e71 and d40f212a06319e6c40cec93719d05d0cd3802bba.

Those commits make the subscriber (and parts of the web server) to grab a new session when needed. Well, that's in theory. The mgo documentation isn't super clear about how and when will sessions be reconnected after the db is restarted.