ipfs / go-bitswap

The golang implementation of the bitswap protocol
MIT License
216 stars 112 forks source link

[WIP] `newStreamToPeer` hangs #476

Closed schomatis closed 3 years ago

schomatis commented 3 years ago

As explained in https://github.com/ipfs/go-ipfs/issues/7972#issuecomment-824327775 newStreamToPeer doesn't have a timeout in its context and stalls forever.

The most direct fix (https://github.com/ipfs/go-bitswap/pull/477) is to add a timeout for a hardcoded (or make it an option somewhere) amount of time in

https://github.com/ipfs/go-bitswap/blob/963dc8fdd2260b3b330bd461a310f5db87d640cf/network/ipfs_impl.go#L310-L315

I'm opening this issue to discuss potentially better fixes (which can happen either as a complement or instead of the above one).

Without a deeper BS knowledge I can find another newStreamToPeer call which actually does have a send timeout along with other sane options that I would expect to have when connecting/sending.

https://github.com/ipfs/go-bitswap/blob/963dc8fdd2260b3b330bd461a310f5db87d640cf/network/ipfs_impl.go#L100-L112

They happen in the context of the streamMessageSender and the interface MessageSender (which seems close to the more broadly used BitSwapNetwork). This logic seems to be used only for the want list managment (MessageQueue), which is handled by a different part of the code than the WANT/HAVE message processing. I don't know the exact rationale for this distinction since both deal with a lot of overlap.

Looking into this some more...

welcome[bot] commented 3 years ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

Stebalien commented 3 years ago

I don't know the exact rationale for this distinction since both deal with a lot of overlap.

Yeah, I'm a bit confused as well. I think the main difference is: SendMessage is used for sending blocks, one per stream. SendMsg is used for sending wantlist messages, where we want to use a single stream. But I'm not entirely sure why...

A hard-coded large enough timeout should be fine (for now) so I merged the patch. My main concern is that 5s may be too small due to backpressure in some cases... but it should be fine.

Stebalien commented 3 years ago

This also brings up the more general question: we should likely be setting more timeouts in bitswap when sending data to deal with slow peers.

schomatis commented 3 years ago

we should likely be setting more timeouts in bitswap when sending data to deal with slow peers.

Yes. At the very least I would expect (as a consumer of this library) to have some catch-all maximum timeout (even if in the order of minutes) to never reach a deadlock situation.

schomatis commented 3 years ago

SendMsg is used for sending wantlist messages, where we want to use a single stream. But I'm not entirely sure why...

I'm assuming because the wanlist broadcasting has a more regular flow to each peer than on-deman blocks, but it's just a guess.

Stebalien commented 3 years ago

Yes. At the very least I would expect (as a consumer of this library) to have some catch-all maximum timeout (even if in the order of minutes) to never reach a deadlock situation.

Yeah, we probably should have one.

I'm assuming because the wanlist broadcasting has a more regular flow to each peer than on-deman blocks, but it's just a guess.

That and the fact that wantlist messages tend to be small.

schomatis commented 3 years ago

I don't see a way forward that doesn't imply a big refactor, and with https://github.com/ipfs/go-bitswap/pull/477 in place I'm going to consider this fixed (at least for the short term).