ipfs / boxo

A set of reference libraries for building IPFS applications and implementations in Go.
https://github.com/ipfs/boxo#readme
Other
186 stars 88 forks source link

bitswap/client: memory leak on BlockPresenceManager #574

Open Dreamacro opened 6 months ago

Dreamacro commented 6 months ago

Recently, I observed that when using bitswap for fetching data, there is a large amount of memory consumption and no release after downloading.

I use pprof and find most heap is occupied here. https://github.com/ipfs/boxo/blob/89bceff34bf108a46795d26938e6aa8c2f876fc4/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go#L40

I try to print stat after bpm.presence[c] = make(map[peer.ID]bool) and I found bpm.presence would delete on BlockPresenceManager.RemoveKeys, so I added some logs.

I found that there is a race condition problem here.

  1. BlockPresenceManager.RemoveKeys has been called and remove the CidA
  2. BlockPresenceManager.ReceiveFrom has been called and created bpm.presence[CidA]
  3. bpm.presence[CidA] is no longer used or removed, and the map is getting bigger and bigger.
welcome[bot] commented 6 months ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

aschmahmann commented 6 months ago

@Dreamacro can you attach or link to a pprof dump for analysis? Also, which versions of boxo and go-libp2p are you using?

Dreamacro commented 6 months ago

@aschmahmann

I first provide a few fragments of pprof/heap. I can confirm a memory leak here because the leakage disappears when I replace map[cid.Cid]map[peer.ID]bool with a cache with time. https://github.com/Dreamacro/boxo/commit/df0a587788c0e8ea70d22d051b11dae8580ad629

heap fragments: (Although boxo 0.15 is shown below, it has been upgraded to boxo 0.17)

258: 1387008 [2230: 11988480] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#   0x1c257eb   github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#   0x1c257ef   github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f     github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#   0x1c549fb   github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b              github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#   0x1c58add   github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd                           github.com/ipfs/boxo@v0.15.0/bitswap/client/client.go:336
#   0x1c58ff9   github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279                          github.com/ipfs/boxo@v0.15.0/bitswap/client/client.go:380
#   0x1c6df44   github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84                             github.com/ipfs/boxo@v0.15.0/bitswap/bitswap.go:181
#   0x1c39d30   github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530                          github.com/ipfs/boxo@v0.15.0/bitswap/network/ipfs_impl.go:431
#   0x1b0a0de   github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e                 github.com/libp2p/go-libp2p@v0.32.1/p2p/host/basic/basic_host.go:598
#   0x1b09107   github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7                      github.com/libp2p/go-libp2p@v0.32.1/p2p/host/basic/basic_host.go:440
#   0x1b32686   github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6                            github.com/libp2p/go-libp2p@v0.32.1/p2p/net/swarm/swarm_conn.go:137

109: 1185920 [3482: 37884160] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#   0x1c257eb   github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#   0x1c257ef   github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f     github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#   0x1c549fb   github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b              github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#   0x1c58add   github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd                           github.com/ipfs/boxo@v0.15.0/bitswap/client/client.go:336
#   0x1c58ff9   github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279                          github.com/ipfs/boxo@v0.15.0/bitswap/client/client.go:380
#   0x1c6df44   github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84                             github.com/ipfs/boxo@v0.15.0/bitswap/bitswap.go:181
#   0x1c39d30   github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530                          github.com/ipfs/boxo@v0.15.0/bitswap/network/ipfs_impl.go:431
#   0x1b0a0de   github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e                 github.com/libp2p/go-libp2p@v0.32.1/p2p/host/basic/basic_host.go:598
#   0x1b09107   github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7                      github.com/libp2p/go-libp2p@v0.32.1/p2p/host/basic/basic_host.go:440
#   0x1b32686   github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6                            github.com/libp2p/go-libp2p@v0.32.1/p2p/net/swarm/swarm_conn.go:137

180: 483840 [1333: 3583104] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#   0x1c257eb   github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#   0x1c257ef   github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f     github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#   0x1c549fb   github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b              github.com/ipfs/boxo@v0.15.0/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#   0x1c58add   github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd                           github.com/ipfs/boxo@v0.15.0/bitswap/client/client.go:336
#   0x1c58ff9   github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279                          github.com/ipfs/boxo@v0.15.0/bitswap/client/client.go:380
#   0x1c6df44   github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84                             github.com/ipfs/boxo@v0.15.0/bitswap/bitswap.go:181
#   0x1c39d30   github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530                          github.com/ipfs/boxo@v0.15.0/bitswap/network/ipfs_impl.go:431
#   0x1b0a0de   github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e                 github.com/libp2p/go-libp2p@v0.32.1/p2p/host/basic/basic_host.go:598
#   0x1b09107   github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7                      github.com/libp2p/go-libp2p@v0.32.1/p2p/host/basic/basic_host.go:440
#   0x1b32686   github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6                            github.com/libp2p/go-libp2p@v0.32.1/p2p/net/swarm/swarm_conn.go:137
gammazero commented 1 month ago

@Dreamacro PR #636 is a simple code modification that should help with unbounded map growth observed in this issue. The change allows GC to free memory when the BlockPresenceManager map becomes empty, and removes peers from BlockPresenceManager that have sent repeated DONT_HAVEs. This is a smaller scoped change than using an LRU cache as you have done in the example fix you provided, and I would be interested in getting your feedback as to whether it is sufficient for your test case.

Dreamacro commented 1 month ago

@gammazero I can't quite confirm if this PR fixes the memory leak or not. The root cause of the problem is that after SessionManager.cancelWants is called, there may still be nodes to send HAVE or DONT_HAVE in the future, and it is handled by ReceiveFrom finally.

https://github.com/ipfs/boxo/blob/c94c23a30d459c704a6edd6a1f06c6a302b52e05/bitswap/client/internal/session/sessionwantsender.go#L458

I'm not sure if the changes here can fix the root cause of the memory leak, if HAVE/DONT_HAVE is no longer processed after the prune node, the problem should be fixed