renproject / aw

A flexible P2P networking library for upgradable distributed systems.
MIT License
38 stars 18 forks source link

Sync context always timing out #61

Closed jazg closed 3 years ago

jazg commented 3 years ago

There seem to be at least two issues in the syncer:

  1. We're overriding the global context on the following lines, when instead we should probably be defining an inner context:

https://github.com/renproject/aw/blob/ac3835b6789ad7260e74dce6be88cf9bdedbb2db/peer/sync.go#L110-L117

  1. The context always times out, even when the DidReceiveMessage function is called and the pending content flag is signalled. Is there a reason we can't have something as simple as the following?
type pendingContent struct {
    // content is nil while synchronisation is happening. After synchronisation
    // has completed, content will be set.
    content chan []byte
}

func (pending *pendingContent) wait() <-chan []byte {
    return pending.content
}

func (pending *pendingContent) signal(content []byte) {
    pending.content <- content
}
rahulghangas commented 3 years ago

@jazg do you have an example demonstrating the problem? I understand this method needs some changes, but it would be great if I can see the exact problem first

rahulghangas commented 3 years ago

Nvm, reproduced the exact error, will push fix

rahulghangas commented 3 years ago

Ok, this was a tricky one. Two issues, a) We never signaled the cond b) The map we were using was keeping copies of the pending struct rather than a pointer

-- Edit To clarify, even after adding pending.cond.Broadcast() to the signal method, I noticed that even though pending.content was being set, it was still nil in the wait method. Which makes sense, since pending is a struct and pending.content (byte slice) is a struct too, which means we were only assigning to a field of a copy and the goroutines were waiting on another copy where they never got to access the information. However, all goroutines waiting on cond would still wake up on calling pending.cond.Broadcast() because it is a pointer, and both structs point to the same sync.Cond variable

jazg commented 3 years ago

@rahulghangas Fix looks pretty good to me (although I still need to test it out)! I think the issue with the context still exists – I can push a quick fix for this though (edit: pushed https://github.com/renproject/aw/commit/856a76f5b44b83ec08516227b2e0af9477f56b6f).