ipfs / kubo

An IPFS implementation in Go

Providers memory leak #2750

Closed (edt-xx closed this issue 8 years ago)

edt-xx commented 8 years ago

go-ipfs compiled from git on the 17th (package says 0.4.2.r2.gabeea4f-1) has been running for 4.5 days with 98 connections (great), but it is using 3110m as reported by top, which seems a bit extreme.

PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
16098 ed 20 0 3122.9m 846.7m 0.7 5.3 88:13.21 S ipfs

Ran the commands from https://github.com/ipfs/go-ipfs/blob/master/debug-guide.md and packaged the files (I called the heap dump ipfs.heap so as not to overwrite the ipfs.stack file).
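
For reference, this is roughly what those commands collect; a minimal Go sketch, assuming the daemon's API is on the default localhost:5001 and using the standard net/http/pprof endpoints the debug guide points at (the exact output file names here are only illustrative):

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

// fetch downloads one debug endpoint from the local daemon and saves it to a file.
func fetch(url, out string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	f, err := os.Create(out)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// full goroutine stack dump (the stack file mentioned above)
	if err := fetch("http://localhost:5001/debug/pprof/goroutine?debug=2", "ipfs.stacks"); err != nil {
		log.Fatal(err)
	}
	// heap profile (saved as ipfs.heap so it does not overwrite the stack dump)
	if err := fetch("http://localhost:5001/debug/pprof/heap", "ipfs.heap"); err != nil {
		log.Fatal(err)
	}
}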

Hope this helps Ed

ipfs.3g.tar.gz

edt-xx commented 8 years ago

About 23h later:

PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
16098 ed 20 0 3420.7m 1.097g 2.6 7.0 118:41.41 S ipfs

ipfs_3g_23.tar.gz

Hopefully the serial debug profiles help. Also included:

curl localhost:5001/debug/metrics/prometheus > ipfs.prom

Ed

Kubuxu commented 8 years ago

It looks like there is a goroutine leak. There are over 600 active goroutines, and some of them have been in IO wait for almost 100 hours:

goroutine 5140122 [IO wait, 5692 minutes]:
net.runtime_pollWait(0x7f89df7315f8, 0x72, 0xc8252b424c)
    /usr/lib/go/src/runtime/netpoll.go:160 +0x60
net.(*pollDesc).Wait(0xc823812060, 0x72, 0x0, 0x0)
    /usr/lib/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc823812060, 0x0, 0x0)
    /usr/lib/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).Read(0xc823812000, 0xc8252b424c, 0x4, 0x4, 0x0, 0x7f89df77b050, 0xc820010220)
    /usr/lib/go/src/net/fd_unix.go:250 +0x23a
net.(*conn).Read(0xc82012b608, 0xc8252b424c, 0x4, 0x4, 0x0, 0x0, 0x0)
    /usr/lib/go/src/net/net.go:172 +0xe4
gx/ipfs/QmYp8PC6b9M3UY4awdHkdRUim68KpGSmRmz27bANHteen6/go-multiaddr-net.(*maConn).Read(0xc82efbc810, 0xc8252b424c, 0x4, 0x4, 0x41da31, 0x0, 0x0)
    <autogenerated>:3 +0x82
gx/ipfs/QmboVACz6WES3byh3q8jdYPWL9TMz7CT89V34WPtruuRoS/go-libp2p-transport.(*connWrap).Read(0xc82383cf00, 0xc8252b424c, 0x4, 0x4, 0x15f9590, 0x0, 0x0)
    <autogenerated>:24 +0x82
gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/metrics/conn.(*MeteredConn).Read(0xc82383d040, 0xc8252b424c, 0x4, 0x4, 0x0, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/metrics/conn/conn.go:28 +0x75
gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/conn.(*singleConn).Read(0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0xc8252b43c0, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/conn/conn.go:123 +0x6b
io.ReadAtLeast(0x7f89df78ffb8, 0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0x4, 0x0, 0x0, 0x0)
    /usr/lib/go/src/io/io.go:297 +0xe6
io.ReadFull(0x7f89df78ffb8, 0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0x0, 0x0, 0x0)
    /usr/lib/go/src/io/io.go:315 +0x62
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.ReadLen(0x7f89df78ffb8, 0xc82d5f2180, 0xc8252b424c, 0x4, 0x4, 0xac5a61, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/num.go:27 +0xb7
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*reader).nextMsgLen(0xc8237cc0a0, 0x7f89df790008, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/msgio.go:163 +0x66
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*reader).ReadMsg(0xc8237cc0a0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio/msgio.go:195 +0xdb
gx/ipfs/QmRQhVisS8dmPbjBUthVkenn81pBxrx1GxE281csJhm2vL/go-msgio.(*readWriter).ReadMsg(0xc8251367a0, 0x0, 0x0, 0x0, 0x0, 0x0)
    <autogenerated>:17 +0x70
gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio.readMsgCtx.func1(0xc825136c60, 0x7f89df7903f0, 0xc8251367a0, 0xc82e3aa420, 0x7f89df78f9d8, 0xc82e3aa2a0)
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio/rw.go:280 +0x48
created by gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio.readMsgCtx
    /tmp/makepkg/go-ipfs-git/src/src/gx/ipfs/QmbuSYB51KtX8izgab1fSyBgsyMH2hTuCfBtXm77PeXWca/go-libp2p-secio/rw.go:285 +0xe6
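
The created-by frame at the bottom is the telling part. Below is a minimal sketch of that general pattern (not the actual secio code): a helper goroutine is spawned to perform a blocking Read so the caller can also watch a context, and if the context is abandoned while the connection stays open, the helper stays parked in IO wait indefinitely.

package main

import (
	"context"
	"fmt"
	"net"
	"runtime"
	"time"
)

// readMsgCtx is a hypothetical, simplified stand-in for a context-aware read.
func readMsgCtx(ctx context.Context, c net.Conn, buf []byte) (int, error) {
	type result struct {
		n   int
		err error
	}
	ch := make(chan result, 1)
	go func() {
		n, err := c.Read(buf) // blocks until data arrives or c is closed
		ch <- result{n, err}
	}()
	select {
	case r := <-ch:
		return r.n, r.err
	case <-ctx.Done():
		// Returning without closing c abandons the reader goroutine above;
		// that is what a goroutine sitting in "IO wait" for 5692 minutes looks like.
		return 0, ctx.Err()
	}
}

func main() {
	before := runtime.NumGoroutine()
	conn, _ := net.Pipe() // nothing ever writes to this end, so Read blocks forever
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	_, err := readMsgCtx(ctx, conn, make([]byte, 4))
	time.Sleep(50 * time.Millisecond)
	fmt.Println(err, "goroutines:", before, "->", runtime.NumGoroutine())
}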

There are also lots of goroutines running in yamux.

Here is the full goroutine stack dump from this report: https://gist.github.com/Kubuxu/da3c88d618fc534198249ad833ad79d9

Kubuxu commented 8 years ago

It wasn't goroutines.

Frequency list of goroutine positions at dump time:

    159 net.runtime_pollWait(
    106 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).send(
    106 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).keepalive(
    106 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Session).AcceptStream(
    106 github.com/ipfs/go-ipfs/exchange/bitswap.(*msgQueue).runQueue(
      8 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/context.CloseAfterContext.func1(
      8 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).taskWorker(
      6 sync.runtime_Semacquire(
      4 sync.runtime_Syncsemacquire(
      3 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-context/io.(*ctxReader).Read(
      2 net/http.(*persistConn).writeLoop(
      2 main.merge.func1(
      2 gx/ipfs/QmZ8MMKFwA95viWULoSYFZpA4kdFa8idmFSrP12YJwjjaL/yamux.(*Stream).Read(
      2 gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/swarm.(*Swarm).addConnListener.func2(
      2 gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/net/conn.(*listener).Accept(
      2 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/periodic.callOnTicker.func1(
      2 github.com/ipfs/go-ipfs/core/corehttp.Serve(
      1 runtime/pprof.writeGoroutineStacks(
      1 runtime.gopark(
      1 runtime.goexit(
      1 os/signal.signal_recv(
      1 main.(*IntrHandler).Handle.func1(
      1 main.daemonFunc.func1(
      1 main.daemonFunc(
      1 gx/ipfs/QmVL44QeoQDTYK8RVdpkyja7uYcK3WDNoBNHVLonf9YDtm/go-libp2p/p2p/discovery.(*mdnsService).pollForEntries(
      1 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess.(*process).CloseAfterChildren(
      1 gx/ipfs/QmQopLATEYMNg7dVqZRNDfeE2S1yKy8zrRh5xnYiuqeZBn/goprocess/periodic.Every.func1(
      1 gx/ipfs/QmeYJHEk8UjVVZ4XCRTZe6dFQrb8pGWD81LYCgeLp8CvMB/go-metrics.(*meterArbiter).tick(
      1 gx/ipfs/QmduCCgTaLnxwwf9RFQy2PMUytrKcEH9msohtVxSBZUdgu/go-peerstream.(*Swarm).connGarbageCollect(
      1 gx/ipfs/QmdhsRK1EK2fvAz2i2SH5DEfkL6seDuyMYEsxKa9Braim3/client_golang/prometheus.computeApproximateRequestSize(
      1 gx/ipfs/QmbyvM8zRFDkbFdYyt1MnevUMJ62SiSGbfDFZ3Z8nkrzr4/go-libp2p-peer/queue.(*ChanQueue).process.func1(
      1 gx/ipfs/Qmazh5oNUVsDZTs2g59rq8aYQqwpss8tcUWQzor5sCCEuH/go-log.(*MirrorWriter).logRoutine(
      1 gx/ipfs/QmaDNZ4QMdBdku1YZWBysufYyoQt1negQGNav6PLYarbY8/go-log.(*MirrorWriter).logRoutine(
      1 github.com/ipfs/go-ipfs/routing/dht.(*ProviderManager).run(
      1 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Provide(
      1 github.com/ipfs/go-ipfs/routing/dht.(*IpfsDHT).Bootstrap.func1(
      1 github.com/ipfs/go-ipfs/routing/dht.(*dhtQueryRunner).spawnWorkers(
      1 github.com/ipfs/go-ipfs/routing/dht.(*dhtQueryRunner).Run(
      1 github.com/ipfs/go-ipfs/namesys/republisher.(*Republisher).Run(
      1 github.com/ipfs/go-ipfs/mfs.(*Republisher).Run(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).drain(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).mpoolDrain(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).jWriter(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb.(*DB).compactionError(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/query.ResultsWithChan.func1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/namespace.(*datastore).Query.func1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/keytransform.(*ktds).Query.func1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/ipfs/go-datastore/flatfs.(*Datastore).Query.func1.1(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/codahale/metrics.init.1.func2(
      1 github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/briantigerchow/pubsub.(*PubSub).start(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*WantManager).Run(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.New.func2(
      1 github.com/ipfs/go-ipfs/exchange/bitswap/decision.(*Engine).nextEnvelope(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).rebroadcastWorker(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideWorker(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).providerQueryManager(
      1 github.com/ipfs/go-ipfs/exchange/bitswap.(*Bitswap).provideCollector(
      1 github.com/ipfs/go-ipfs/blocks/blockstore.(*blockstore).AllKeysChan.func2(

Generated with (take the first frame after each goroutine header, strip the argument lists, then count, sort, and reverse):

grep -e '^goroutine' ipfs.stacks -A1 | grep -v '^goroutine' | grep -v -- '--' | sort| sed 's/[^\(]*$//' | uniq -c | sort -bg | tac

Looks like the yamux issue isn't really fixed. Now it doesn't hang the daemon but keeps accumulating goroutines.

Kubuxu commented 8 years ago

Can you try using this ipfs build: https://ipfs.io/ipfs/QmUj8XdoQHfEe2MwBBnCP7HsXGFAWZHWiuLYA2ynBvZyoj/ipfs. It includes the newest patched version of yamux in libp2p, which might not be leaking goroutines.

edt-xx commented 8 years ago

Okay trying the above binary.

Kubuxu commented 8 years ago

I was wrong. The yamux isn't leaking. The memory leak has to be somewhere else IMO.

Looks like there are 3 yamux + 1 IO goroutine per connection: in the dump, the ~106 sessions each account for a send, keepalive, and AcceptStream goroutine, plus roughly one reader parked in runtime_pollWait, which covers most of the ~600 goroutines.

Kubuxu commented 8 years ago

There is about 400MB of data on the heap. That is still too much, but it is not the 3G htop was reporting. Are you sure that you looked at RES (resident memory) and not virtual memory?
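
The gap between the Go heap and what top shows can also be checked from inside a Go process; this is a minimal generic sketch using runtime.MemStats (nothing specific to the ipfs daemon), just to show which numbers are comparable:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapAlloc is the live Go heap, roughly what a heap profile accounts for.
	// Sys is everything the runtime has obtained from the OS, which is closer
	// to the VIRT/RES figures top reports than HeapAlloc is, and it is not
	// handed back promptly when the heap shrinks.
	fmt.Printf("heap alloc: %d MiB\n", m.HeapAlloc/(1<<20))
	fmt.Printf("heap sys:   %d MiB\n", m.HeapSys/(1<<20))
	fmt.Printf("total sys:  %d MiB\n", m.Sys/(1<<20))
}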

Kubuxu commented 8 years ago

A lot of the data is provider DHT data; full graph here

edt-xx commented 8 years ago

From the top line posted in the second snapshot the RESident memory was 1.097g.

whyrusleeping commented 8 years ago

Yeah, this looks like a "leak" in the providers manager. This is something I'm starting to work on fixing.
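
In rough terms, the pattern being described looks like the sketch below. This is a deliberately simplified, hypothetical data structure, not the actual go-ipfs ProviderManager: provider records are appended per key as announcements arrive and nothing ever expires them, so memory grows with DHT traffic until a GC or expiry pass is added.

package main

import (
	"fmt"
	"time"
)

// providerRecord and providerManager are illustrative names, not the real types.
type providerRecord struct {
	peer    string    // peer that announced it has the block
	addedAt time.Time // would be needed for any expiry policy
}

type providerManager struct {
	providers map[string][]providerRecord // key -> every provider ever seen
}

func (pm *providerManager) addProvider(key, peer string) {
	// Records are only ever appended, never pruned: the "leak".
	pm.providers[key] = append(pm.providers[key], providerRecord{peer, time.Now()})
}

func main() {
	pm := &providerManager{providers: make(map[string][]providerRecord)}
	// Simulate a busy DHT node receiving a steady stream of provide messages.
	for i := 0; i < 1000000; i++ {
		key := fmt.Sprintf("key-%d", i%100000)
		peer := fmt.Sprintf("peer-%d", i%500)
		pm.addProvider(key, peer)
	}
	fmt.Println("keys held in memory:", len(pm.providers))
}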

whyrusleeping commented 8 years ago

This should be resolved now that #2860 has merged.