Open rahulghangas opened 3 years ago
@rahulghangas Not sure I fully understand the issue/implication of the dual timeouts. Do you mind listing out a concrete example?
The call to gossip has the following signature
func (g *Gossiper) Gossip(ctx context.Context, contentID []byte, subnet *id.Hash)
where the context provided usually has a timeout of, say 5-10 seconds. On the other hand, if you look at didReceiveSync
func (g *Gossiper) didReceiveSync(from id.Signatory, msg wire.Msg) {
...
...
...
ctx, cancel := context.WithTimeout(context.Background(), g.opts.Timeout)
defer cancel()
g.Gossip(ctx, msg.Data, &subnet)
after receiving the sync message, we regossip the message, but using the internal timeout. The internal timeout is supposed to be used to define the time limit of sending a single message, not a whole round of gossipping.
@rahulghangas Got it. Could we send to the recipients in parallel inside the Gossip
function to resolve this? I will check with @loongy to gauge the intended usage of the timeout.
That could potentially solve the issue, and the call to gossip wouldn't require a context as input since we'll use the internal timeout to define a context and use it on all messages (in parallel)
While gosspping, two different timeouts are used
GossipOptions
structThe first one defies the total time for a round of gossipping, while the second one defines the timeout for sending each message. When a sync message is received, the gossipper tries to propagate the new message by re-gossipping it. However, the gossipper now has no notion of the timeout supplied by the user (and that particular context has probably gone out scope!). Currently we use the second timeout to a call to
Gossip
, but the gossipping might fail if the first attempt to message a peer fails (since that uses the same timeout)