ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Daemon 0.4.3-rc2 crash #3045

Closed · NeoTeo closed this 7 years ago

NeoTeo commented 7 years ago

Version/Platform/Processor information (from ipfs version --all):
go-ipfs version: 0.4.3-rc2-
Repo version: 4
System version: amd64/darwin
Golang version: go1.6.2

Type (bug, feature, meta, test failure, question): bug

Area (api, commands, daemon, fuse, etc): daemon

Priority (from P0: functioning, to P4: operations on fire): P3: frequent crashing

Description: I can reproduce the following crash by starting the daemon and then calling ipfs cat QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB

The daemon crashes out with the output included below. Subsequent runs of the daemon crash out immediately.

Crash log:

panic: sync: inconsistent mutex state

goroutine 58 [running]:
panic(0x8be940, 0xc822333c60)
    /home/whyrusleeping/go/src/runtime/panic.go:481 +0x3e6
sync.(*Mutex).Lock(0xc82012b96c)
    /home/whyrusleeping/go/src/sync/mutex.go:75 +0x16d
gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics.(*StandardEWMA).Rate(0xc82012b950, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics/ewma.go:94 +0x3a
gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics.(*StandardMeter).updateSnapshot(0xc8201c5a80)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics/meter.go:216 +0xf2
gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics.(*StandardMeter).Mark(0xc8201c5a80, 0x22)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics/meter.go:159 +0x144
gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/metrics.(*BandwidthCounter).LogRecvMessage(0xc8201918c0, 0x22)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/metrics/bw_stats.go:39 +0x44
gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/metrics.(Reporter).LogRecvMessage-fm(0x22)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/metrics/conn/conn.go:16 +0x3c
gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/metrics/conn.(*MeteredConn).Read(0xc821c9de20, 0xc821e58000, 0x22, 0x1000, 0x0, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/metrics/conn/conn.go:30 +0xa5
gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/net/conn.(*singleConn).Read(0xc821cced80, 0xc821e58000, 0x22, 0x1000, 0x4, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/net/conn/conn.go:123 +0x6b
io.ReadAtLeast(0x14e6818, 0xc821cced80, 0xc821e58000, 0x22, 0x1000, 0x22, 0x0, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:297 +0xe6
io.ReadFull(0x14e6818, 0xc821cced80, 0xc821e58000, 0x22, 0x1000, 0x14eadb8, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:315 +0x62
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*reader).Read(0xc821e4c230, 0xc821e58000, 0x22, 0x1000, 0x0, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/msgio.go:186 +0x1af
io.ReadAtLeast(0x14eadb8, 0xc821e4c230, 0xc821e58000, 0x22, 0x1000, 0x22, 0x0, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:297 +0xe6
io.ReadFull(0x14eadb8, 0xc821e4c230, 0xc821e58000, 0x22, 0x1000, 0x165ab8, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:315 +0x62
gx/ipfs/QmVjz1uf6U3sVQ5DbWWj7ktTtDd4GgsptYc7FBp33nWE53/go-libp2p-secio.(*etmReader).Read(0xc82029f200, 0xc821e58000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmVjz1uf6U3sVQ5DbWWj7ktTtDd4GgsptYc7FBp33nWE53/go-libp2p-secio/rw.go:182 +0x2e0
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*readWriter).Read(0xc821e54180, 0xc821e58000, 0x1000, 0x1000, 0x40, 0x0, 0x0)
    <autogenerated>:16 +0x80
gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/net/conn.(*secureConn).Read(0xc821cbdd00, 0xc821e58000, 0x1000, 0x1000, 0x40, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmVCe3SNMjkcPgnpFhZs719dheq6xE7gJwjzV7aWcUM4Ms/go-libp2p/p2p/net/conn/secure_conn.go:113 +0x90
bufio.(*Reader).fill(0xc8200d14a0)
    /home/whyrusleeping/go/src/bufio/bufio.go:97 +0x1e9
bufio.(*Reader).Read(0xc8200d14a0, 0xc821e7a000, 0x2, 0x3e20, 0xf60c07643132818a, 0x0, 0x0)
    /home/whyrusleeping/go/src/bufio/bufio.go:207 +0x260
io.(*LimitedReader).Read(0xc8223b4720, 0xc821e7a000, 0x2, 0x3e20, 0x5733a1, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:426 +0xbd
bytes.(*Buffer).ReadFrom(0xc821cc88c0, 0x16805b0, 0xc8223b4720, 0x0, 0x0, 0x0)
    /home/whyrusleeping/go/src/bytes/buffer.go:176 +0x23f
io.copyBuffer(0x14a12a8, 0xc821cc88c0, 0x16805b0, 0xc8223b4720, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:374 +0x180
io.Copy(0x14a12a8, 0xc821cc88c0, 0x16805b0, 0xc8223b4720, 0xc821e58000, 0x0, 0x0)
    /home/whyrusleeping/go/src/io/io.go:350 +0x64
gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Stream).readData(0xc821e00c30, 0xc821cd8ca0, 0xc, 0xc, 0xc821e50000, 0x16805b0, 0xc8223b4720, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux/stream.go:410 +0x5de
gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).handleStreamMessage(0xc821e6e000, 0xc821cd8ca0, 0xc, 0xc, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux/session.go:488 +0xb5e
gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).(gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.handleStreamMessage)-fm(0xc821cd8ca0, 0xc, 0xc, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux/session.go:427 +0x4c
gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).recvLoop(0xc821e6e000, 0x0, 0x0)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux/session.go:439 +0x5a8
gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).recv(0xc821e6e000)
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux/session.go:399 +0x21
created by gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.newSession
    /builds/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux/session.go:104 +0x4b4
whyrusleeping commented 7 years ago

@NeoTeo this happens every time you restart the daemon?

Did you build from source? Or use a downloaded binary from dists?

NeoTeo commented 7 years ago

Hey @whyrusleeping :) It happens every time I restart. The binary is downloaded from dists.

whyrusleeping commented 7 years ago

@NeoTeo what version of OSX are you running? I can't reproduce this issue...

Did you start with a new repo, or is this an older repo that has a bunch of stuff in it still?

NeoTeo commented 7 years ago

It first happened on El Capitan 10.11.6 (15G31), so I tried it on a different machine running Sierra 10.12 beta (16A270f). On the Sierra machine I also deleted the .ipfs directory and did an ipfs init. Just tried the El Cap machine (which has an existing v4 repo) again and now it doesn't crash, but alternates between hanging and returning Error: merkledag: not found. I now worry it's just my machine having a fit, but it worked fine on 0.4.2 :/

whyrusleeping commented 7 years ago

@NeoTeo alright, i'll look into it more. Do you have a repeatable way of reproducing the issue? Or is it just more or less random?

NeoTeo commented 7 years ago

On Sierra the crash is repeatable; on El Cap the alternating behaviour is consistent (and never actually cats the hash). I have just tried fetching http://localhost:8080/ipfs/QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB from Safari on the El Cap machine and it comes up immediately, so the node is able to get the hash OK.

NeoTeo commented 7 years ago

Let me know if you need me to try something.

whyrusleeping commented 7 years ago

@NeoTeo This is really bizarre, it feels like a bug in go to me.

whyrusleeping commented 7 years ago

@NeoTeo If it's not too much trouble, on the system where the crash is reproducible, could you try installing the latest release candidate of go (1.7) and building ipfs from source? I'm thinking this might be an issue caused by changes in OSX syscalls.

whyrusleeping commented 7 years ago

https://github.com/golang/go/issues/16272#issuecomment-231363289

NeoTeo commented 7 years ago

ok gonna try that.

NeoTeo commented 7 years ago

Good hunch! Not crashing on Sierra now. Gonna try on El Cap to see if it solves the hanging issue there as well.

whyrusleeping commented 7 years ago

If that fixes it, then i'll just have to make sure the release binaries for 0.4.3 get built with go1.7 (which i was hoping to do anyways)

NeoTeo commented 7 years ago

Alas, El Cap still gives the error: merkledag: not found.

NeoTeo commented 7 years ago

The Sierra binary wasn't rebuilt. Upgrading go was enough.

NeoTeo commented 7 years ago

Aaand, it crashed after successfully cat'ing the file. I'll try a build from source.

NeoTeo commented 7 years ago

Trouble with "make install". I overrode check_go_version to accept the "1.7rc5" version number, but gx seems to not like it either, as its strconv.ParseInt doesn't like the rc part...

whyrusleeping commented 7 years ago

@NeoTeo just do:

go get -u github.com/whyrusleeping/gx
go get -u github.com/whyrusleeping/gx-go
gx install --global
go install ./cmd/ipfs

I need to fix the go version parsing code there, it's been biting me too.
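
The failure is easy to demonstrate: strconv.ParseInt rejects the pre-release suffix in "1.7rc5" outright. A minimal Go sketch of the problem and of a more tolerant parse (illustrative only — parseMinor is a made-up helper here, not the actual gx code):

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // parseMinor pulls the minor version out of a string like "go1.7rc5".
    // A naive strconv.ParseInt on "7rc5" fails with "invalid syntax";
    // trimming the pre-release suffix first makes it parse cleanly.
    func parseMinor(version string) (int64, error) {
        parts := strings.Split(strings.TrimPrefix(version, "go"), ".")
        if len(parts) < 2 {
            return 0, fmt.Errorf("unexpected version string: %q", version)
        }
        minor := parts[1]
        // Keep only the leading digits: "7rc5" -> "7".
        if i := strings.IndexFunc(minor, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
            minor = minor[:i]
        }
        return strconv.ParseInt(minor, 10, 64)
    }

    func main() {
        _, err := strconv.ParseInt("7rc5", 10, 64)
        fmt.Println("naive parse:", err) // parsing "7rc5": invalid syntax

        minor, _ := parseMinor("go1.7rc5")
        fmt.Println("tolerant parse:", minor) // 7
    }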

NeoTeo commented 7 years ago

Hm...odd. The first time it gave me this:

teo@Vger-6 ~> ipfs cat QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB
fish: 'ipfs cat QmPZ9gcCEpqKTo6aq61g2n…' terminated by signal SIGSEGV (Address boundary error)

Ran it again straight after and it worked just fine. The node's been rock solid since first run. And any subsequent cat'ing works as expected \o/

Is it worth trying this on the El Cap version too?

NeoTeo commented 7 years ago

Just built 0.4.3-rc2 from source on El Cap with go 1.7rc5 but unfortunately it still hangs on cat. Different problem? At least the Sierra problem seems fixed :)

NeoTeo commented 7 years ago

Hm, spoke a bit too soon. An ipfs add of a tiny text file causes the following crash:

https://gist.github.com/NeoTeo/fbaf0357ef5cf7ae1576f928d438d3e0

whyrusleeping commented 7 years ago

Anytime you see 'unexpected fault address', it's 99% of the time a go compiler bug.

You've rebuilt go-ipfs with go1.7-rc5 (there's an rc6 now with a bunch of OSX Sierra changes) and have restarted the daemon?

NeoTeo commented 7 years ago

Yep. I'm updating Sierra right now to b5, so I'll try go1.7-rc6 just after.

NeoTeo commented 7 years ago

Looking good so far. Using rc6 the daemon has survived both an add and a cat.

NeoTeo commented 7 years ago

Same setup on El Capitan still gives Error: merkledag: not found /me gently sobs

whyrusleeping commented 7 years ago

@NeoTeo on El Capitan can you try with a freshly inited repo? That's very odd that it's still failing.

NeoTeo commented 7 years ago

I have run an ipfs repo verify (there's a lot there I'd be sad to lose). Can I safely squirrel away the .ipfs directory, init and then hope to somehow restore the repo after?

whyrusleeping commented 7 years ago

@NeoTeo yeah, you can just move the directory somewhere safe and move it back once we figure out the problem

NeoTeo commented 7 years ago

:/ Seems it was the repo somehow. Weird it'd pass the repo verify though.

whyrusleeping commented 7 years ago

@NeoTeo that's no good! Is it just catting the one object that fails? Or do other things break?

Can you add a new file and cat it?

NeoTeo commented 7 years ago

@whyrusleeping with the fresh init it adds and cats as it should. Moving the old repo back it hangs (or returns with the merkledag error) when cat'ing anything.

whyrusleeping commented 7 years ago

@NeoTeo on the old repo, when it hangs, can you give me the output of pressing Ctrl + \ ?

NeoTeo commented 7 years ago

@whyrusleeping

^\SIGQUIT: quit
PC=0x6cc43 m=2

goroutine 0 [idle]:
runtime.mach_semaphore_timedwait(0x3c00002303, 0x0, 0x700000080cf4, 0x3c, 0x0, 0xc4200009c0, 0x700000080d28, 0x62cc3, 0xdf8475800, 0x0, ...)
    /usr/local/go/src/runtime/sys_darwin_amd64.s:428 +0x13
runtime.semasleep1(0xdf8475800, 0x0)
    /usr/local/go/src/runtime/os_darwin.go:424 +0xda
runtime.semasleep.func1()
    /usr/local/go/src/runtime/os_darwin.go:451 +0x33
runtime.systemstack(0x700000080d50)
    /usr/local/go/src/runtime/asm_amd64.s:314 +0xab
runtime.semasleep(0xdf8475800, 0x0)
    /usr/local/go/src/runtime/os_darwin.go:452 +0x44
runtime.notetsleep_internal(0xdbf598, 0xdf8475800, 0xc4200009c0, 0x146943a38ecb61c7, 0xc4200009c0)
    /usr/local/go/src/runtime/lock_sema.go:198 +0x92
runtime.notetsleep(0xdbf598, 0xdf8475800, 0x146943946c6fea01)
    /usr/local/go/src/runtime/lock_sema.go:246 +0x75
runtime.sysmon()
    /usr/local/go/src/runtime/proc.go:3618 +0x14c
runtime.mstart1()
    /usr/local/go/src/runtime/proc.go:1126 +0x11e
runtime.mstart()
    /usr/local/go/src/runtime/proc.go:1096 +0x64

goroutine 1 [select]:
net/http.(*persistConn).roundTrip(0xc420073900, 0xc4202931c0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1835 +0x93b
net/http.(*Transport).RoundTrip(0xc4200160f0, 0xc4201542d0, 0xc4200160f0, 0x0, 0xc400000000)
    /usr/local/go/src/net/http/transport.go:380 +0x4ee
net/http.send(0xc4201542d0, 0xcdb120, 0xc4200160f0, 0x0, 0x0, 0x0, 0x8, 0xc4201659e0, 0xc420128150)
    /usr/local/go/src/net/http/client.go:256 +0x15f
net/http.(*Client).send(0xdbe4c0, 0xc4201542d0, 0x0, 0x0, 0x0, 0xc420128150, 0x0, 0x1)
    /usr/local/go/src/net/http/client.go:146 +0x102
net/http.(*Client).doFollowingRedirects(0xdbe4c0, 0xc4201542d0, 0x93fb20, 0x4, 0xfb601, 0xc420273f80)
    /usr/local/go/src/net/http/client.go:528 +0x5e5
net/http.(*Client).Do(0xdbe4c0, 0xc4201542d0, 0xc42026f740, 0x8c9e6d, 0x13)
    /usr/local/go/src/net/http/client.go:187 +0x11d
github.com/ipfs/go-ipfs/commands/http.(*client).Send(0xc420293080, 0xcebc60, 0xc42011e180, 0x1, 0xc420233f40, 0xdbf3c0, 0x0)
    /Users/teo/source/go/src/github.com/ipfs/go-ipfs/commands/http/client.go:100 +0x639
main.callCommand(0xce5be0, 0xc420019000, 0xcebc60, 0xc42011e180, 0xdbf3c0, 0xd9c520, 0x0, 0x0, 0xc420165e60, 0x8206)
    /Users/teo/source/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:329 +0x666
main.(*cmdInvocation).Run(0xc420018f40, 0xce5be0, 0xc420019000, 0xcdac60, 0xc4200d8e00, 0xce5be0, 0xc420019000)
    /Users/teo/source/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:189 +0x116
main.main()
    /Users/teo/source/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:154 +0x349

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 18 [chan receive]:
gx/ipfs/QmV3NSS3A1kX5s28r7yLczhDsXzkgo65cqRgKFXYunWZmD/metrics.init.1.func2()
    /Users/teo/source/go/src/gx/ipfs/QmV3NSS3A1kX5s28r7yLczhDsXzkgo65cqRgKFXYunWZmD/metrics/metrics.go:321 +0x88
created by gx/ipfs/QmV3NSS3A1kX5s28r7yLczhDsXzkgo65cqRgKFXYunWZmD/metrics.init.1
    /Users/teo/source/go/src/gx/ipfs/QmV3NSS3A1kX5s28r7yLczhDsXzkgo65cqRgKFXYunWZmD/metrics/metrics.go:328 +0x66

goroutine 19 [syscall]:
os/signal.signal_recv(0x0)
    /usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 20 [select]:
gx/ipfs/QmNQynaz7qfriSUJkiEZUrm2Wen1u3Kj9goZzWtrPyu7XR/go-log.(*MirrorWriter).logRoutine(0xc42012c340)
    /Users/teo/source/go/src/gx/ipfs/QmNQynaz7qfriSUJkiEZUrm2Wen1u3Kj9goZzWtrPyu7XR/go-log/writer.go:71 +0x32e
created by gx/ipfs/QmNQynaz7qfriSUJkiEZUrm2Wen1u3Kj9goZzWtrPyu7XR/go-log.NewMirrorWriter
    /Users/teo/source/go/src/gx/ipfs/QmNQynaz7qfriSUJkiEZUrm2Wen1u3Kj9goZzWtrPyu7XR/go-log/writer.go:38 +0xf1

goroutine 10 [select, locked to thread]:
runtime.gopark(0x93ffe0, 0x0, 0x8bce81, 0x6, 0x18, 0x2)
    /usr/local/go/src/runtime/proc.go:259 +0x13a
runtime.selectgoImpl(0xc42002b730, 0x0, 0x18)
    /usr/local/go/src/runtime/select.go:423 +0x11d9
runtime.selectgo(0xc42002b730)
    /usr/local/go/src/runtime/select.go:238 +0x1c
runtime.ensureSigM.func1()
    /usr/local/go/src/runtime/signal1_unix.go:304 +0x2d1
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 23 [chan receive]:
main.(*IntrHandler).Handle.func1(0xc4200d8e00, 0xc4200138a0)
    /Users/teo/source/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:542 +0x96
created by main.(*IntrHandler).Handle
    /Users/teo/source/go/src/github.com/ipfs/go-ipfs/cmd/ipfs/main.go:547 +0x97

goroutine 27 [IO wait]:
net.runtime_pollWait(0x1369488, 0x72, 0x6)
    /usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc42027b720, 0x72, 0xc4203629d0, 0xc4200121d0)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc42027b720, 0xcdd9a0, 0xc4200121d0)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).Read(0xc42027b6c0, 0xc42017f000, 0x1000, 0x1000, 0x0, 0xcdd9a0, 0xc4200121d0)
    /usr/local/go/src/net/fd_unix.go:243 +0x1a1
net.(*conn).Read(0xc420128158, 0xc42017f000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:173 +0x70
net/http.(*persistConn).Read(0xc420073900, 0xc42017f000, 0x1000, 0x1000, 0x11d1a3, 0xc42002d378, 0xc42002d388)
    /usr/local/go/src/net/http/transport.go:1256 +0x154
bufio.(*Reader).fill(0xc420131c80)
    /usr/local/go/src/bufio/bufio.go:97 +0x10c
bufio.(*Reader).Peek(0xc420131c80, 0x1, 0xc420340060, 0xc42002d460, 0xc42002d3c8, 0x1efd8, 0x50)
    /usr/local/go/src/bufio/bufio.go:129 +0x62
net/http.(*persistConn).readLoop(0xc420073900)
    /usr/local/go/src/net/http/transport.go:1413 +0x1a1
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1057 +0x4e9

goroutine 28 [select]:
net/http.(*persistConn).writeLoop(0xc420073900)
    /usr/local/go/src/net/http/transport.go:1641 +0x3bd
created by net/http.(*Transport).dialConn
    /usr/local/go/src/net/http/transport.go:1058 +0x50e

rax    0xe
rbx    0x0
rcx    0x700000080cc8
rdx    0x0
rdi    0x2303
rsi    0x3c
rbp    0x700000080d00
rsp    0x700000080cc8
r8     0xffffffffffffffff
r9     0x34
r10    0x75731cbc
r11    0x202
r12    0x1443f2c804aaa
r13    0x144c2f4554db0
r14    0x1469439556e1bc00
r15    0x711b10
rip    0x6cc43
rflags 0x202
cs     0x7
fs     0x0
gs     0x0

whyrusleeping commented 7 years ago

@NeoTeo could I get the output from the daemon while that command is hanging? (press Ctrl + \ on the daemon while the client command is hung)

NeoTeo commented 7 years ago

@whyrusleeping That's what that was. I did an ipfs add acme.dump, then hit Ctrl + \

NeoTeo commented 7 years ago

No, wait. Maybe I misunderstood. On the daemon. Coming up...

NeoTeo commented 7 years ago

@whyrusleeping Wow, that was a big one:

https://gist.github.com/NeoTeo/752acc221e42889c3c3a8a9fc58b1585

whyrusleeping commented 7 years ago

@NeoTeo that is really strange... can you try deleting the following from line 45 of core/commands/cat.go and recompiling?

        if err := corerepo.ConditionalGC(req.Context(), node, length); err != nil {
            res.SetError(err, cmds.ErrNormal)
            return
        }

For some reason your stack is telling me it's hanging while running a garbage collection... but not actually getting to the GC part, just trying to take the lock?
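
For reference, the check being removed behaves roughly like this (a simplified sketch, not the exact go-ipfs code — repo, maybeGC, and runGC are illustrative names): if the projected repo size crosses Datastore.StorageMax, the request takes the GC lock and runs a collection before continuing, so a repo that stays over the limit makes every such request wait on the collector.

    package main

    import "sync"

    var gcLock sync.Mutex // stands in for the repo's GC lock

    type repo struct {
        sizeBytes  uint64 // current estimated repo size
        storageMax uint64 // Datastore.StorageMax from the config
    }

    // maybeGC runs a collection only when adding `incoming` bytes would
    // push the repo past StorageMax. A repo that is permanently over the
    // limit trips this check on every request, and each caller then
    // blocks on gcLock until the collection finishes -- which is what a
    // hanging `ipfs cat` would look like.
    func maybeGC(r *repo, incoming uint64) error {
        if r.sizeBytes+incoming < r.storageMax {
            return nil // under the limit: fast path, no locking
        }
        gcLock.Lock() // waits here if a collection is already in progress
        defer gcLock.Unlock()
        return runGC(r)
    }

    func runGC(r *repo) error {
        // ... sweep unpinned blocks ...
        return nil
    }

    func main() {
        r := &repo{sizeBytes: 12 << 30, storageMax: 10 << 30} // 12 GB repo, 10 GB cap
        _ = maybeGC(r, 1<<20) // already over the cap, so a GC runs
    }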

NeoTeo commented 7 years ago

Yep, that works. But presumably you put that code there for a reason... some other part locking the GC?

NeoTeo commented 7 years ago

...and, since a freshly inited repo doesn't hang, it's somehow related to dealing with a repo that has been migrated from the pre-0.4 days and yet passes the repo verify.

whyrusleeping commented 7 years ago

@NeoTeo how large is this repo? (the one that you're having trouble with)

I think this might be related to another issue where checking whether we need a GC is too expensive.

NeoTeo commented 7 years ago

@whyrusleeping ~12 GB

whyrusleeping commented 7 years ago

@NeoTeo in your .ipfs config, can you change Datastore.StorageMax to something like 1000GB? And then try catting something with the normal binary (not the build with the GC code removed).

NeoTeo commented 7 years ago

Yep, that's it! So a warning or a friendly failure message to tell you you've exceeded your max would do it. Nice investigative work though @whyrusleeping 👍 :)

Kubuxu commented 7 years ago

@whyrusleeping it might be the same issue. I had the default GC setting in my config; we might want to be more explicit about running GC because the size limit was hit.

Also, the soft GC shouldn't lock if there is a real GC running.
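
A minimal sketch of that idea (illustrative only, not a go-ipfs patch — conditionalGC and gcRunning are made-up names): guard the conditional GC with an atomic in-progress flag, so a request that finds a collection already running skips it instead of queuing on the lock.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var gcRunning int32 // 1 while a collection is in flight

    // conditionalGC triggers a collection only if none is currently
    // running; callers that lose the race return immediately instead of
    // piling up behind the GC lock.
    func conditionalGC(overLimit bool) {
        if !overLimit {
            return
        }
        if !atomic.CompareAndSwapInt32(&gcRunning, 0, 1) {
            return // a real GC is already running: skip, don't block
        }
        defer atomic.StoreInt32(&gcRunning, 0)
        // ... run the actual collection here ...
        fmt.Println("collecting")
    }

    func main() {
        conditionalGC(true)
    }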

whyrusleeping commented 7 years ago

We should not have that 'feature' enabled by default. If you want to set a size limit, you should have to do it manually.

whyrusleeping commented 7 years ago

@NeoTeo gonna go ahead and close this as i think we solved all the problems :)

Please open a new issue if anything else comes up