ipfs / go-graphsync

Initial Implementation Of GraphSync Wire Protocol

fix: shutdown queue on publish error if not done #412

Closed jacobheun closed 1 year ago

jacobheun commented 1 year ago

Summary

This is a patch on top of 0.13.2. While debugging network disconnect issues in Boost during retrieval, we discovered a goroutine leak in graphsync. The issue is that after a hard network disconnect, the MessageQueue may still be restarted if there are messages the server is still attempting to send.

This change does not fully fix the issue, but across multiple runs of forcibly disconnecting 500 retrievals we saw a 10x reduction in open goroutines. More robust handling of hard network disconnects may be warranted in the 0.14.x line, but that is likely a nontrivial effort. This gets us most of the way.
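For readers unfamiliar with the failure mode, here is a minimal sketch of the idea, not the actual go-graphsync code. The `queue` type, its fields, the `shutdownOnError` flag, and `errDisconnected` are all hypothetical names invented for illustration. It assumes a queue that lazily starts a send loop whenever a message is added; without a shutdown on send failure, each later `Add` starts another loop against a dead connection.

```go
package main

import (
	"errors"
	"sync"
)

// errDisconnected stands in for the error produced by a hard disconnect.
var errDisconnected = errors.New("peer disconnected")

// queue is a toy message queue (hypothetical, not graphsync's MessageQueue).
type queue struct {
	mu              sync.Mutex
	wg              sync.WaitGroup
	pending         []string
	running         bool // a send loop goroutine is active
	done            bool // queue has been shut down for good
	starts          int  // how many times a send loop was started
	shutdownOnError bool // the behavior this sketch illustrates
	send            func(msg string) error
}

func newQueue(send func(string) error, shutdownOnError bool) *queue {
	return &queue{send: send, shutdownOnError: shutdownOnError}
}

// Add enqueues msg and starts the send loop if none is running.
// It reports false once the queue has been shut down.
func (q *queue) Add(msg string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.done {
		return false
	}
	q.pending = append(q.pending, msg)
	if !q.running {
		q.running = true
		q.starts++
		q.wg.Add(1)
		go q.run()
	}
	return true
}

// run drains pending messages until the queue is empty or a send fails.
func (q *queue) run() {
	defer q.wg.Done()
	for {
		q.mu.Lock()
		if q.done || len(q.pending) == 0 {
			q.running = false
			q.mu.Unlock()
			return
		}
		msg := q.pending[0]
		q.pending = q.pending[1:]
		q.mu.Unlock()

		if err := q.send(msg); err != nil {
			q.mu.Lock()
			if q.shutdownOnError && !q.done {
				// Shut down on send error so later Adds cannot restart the loop.
				q.done = true
				q.pending = nil
			}
			q.running = false
			q.mu.Unlock()
			return
		}
	}
}

// Wait blocks until any active send loop has exited.
func (q *queue) Wait() { q.wg.Wait() }

// Starts returns how many send loops have been started so far.
func (q *queue) Starts() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.starts
}
```

With a sender that always returns `errDisconnected`, a queue created with `shutdownOnError` set to false starts a fresh loop on every `Add`, while one created with it set to true rejects further messages after the first failure. In a real system each restarted loop may block on the dead connection, which is how goroutines accumulate.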

Goroutine dumps from boost

Below are goroutine diffs taken between startup and after executing 500 forceful disconnects.

Before image

After image
