Closed iand closed 1 year ago
Some things I noticed:
inboundEvents
/outboundEvents
channels should be unbuffered.outboundEvents
channel but return channels that the caller then subscribes to on a per-request basis. E.g., FindNode
would return a channel. Right now, there's this indirection through the event loop of the IpfsDht. The coordinator would handle the I/O to the caller (the design also states the coordinator does the I/O).
NodeIter.Each
pattern breaks the state machine pattern. Alternatively, an iter could implement a Next()
and Reset()
methods or just a method that returns the list of nodes directly. That way, the query could just normally iterate over them, and you wouldn't need to capture local variables in a closure.queries
and queriesIndex
on QueryPool?the
inboundEvents
/outboundEvents
channels should be unbuffered.
A small outbound buffer can smooth over temporary differences in read/send speed and it still blocks if the reader falls behind. See below as well.
Inbound could be unbuffered.
I think it would be nicer if the coordinator had not a single
outboundEvents
channel but return channels that the caller then subscribes to on a per-request basis. E.g.,FindNode
would return a channel. Right now, there's this indirection through the event loop of the IpfsDht. The coordinator would handle the I/O to the caller (the design also states the coordinator does the I/O). The IpfsDht wouldn't need to keep track of ongoing queries and their "waiters."
This gives the coordinator a lot more state to manage. It would still need a general purpose outbound event channel to notify events that are not for specific requests (e.g. routing table updates. rust libp2p also emits events to notify of unroutable peers).
I can see how it would be easier to work with but what should the coordinator do if it cannot send to one of the channels? Should it block? It can't skip sending the message and waiting in a new goroutine isn't viable if we want to keep resource utilization deterministic.
Of course it faces the same problem with a single outbound queue, however the usage pattern there makes it less likely that the caller will stop reading from the channel. If they do then nothing makes progress in their application.
We discussed this, and you had good reasons to keep it as it is, but I felt again that it would be great if the caller wouldn't need to generate a query ID. This can be handled by either the coordinator/pool?
Is there a good case for random identifiers over meaningful ones?
Stopping queries doesn't cancel the requests. It just marks them as finished. The resources should be cleaned up.
The requests will run to completion and no more will be issued. What cancelation mechanism would be used?
the
NodeIter.Each
pattern breaks the state machine pattern. Alternatively, an iter could implement aNext()
andReset()
methods or just a method that returns the list of nodes directly. That way, the query could just normally iterate over them, and you wouldn't need to capture local variables in a closure.
An iter is just a list of nodes that can be enumerated in a specific order. Its not a state machine. Originally it was but it added much more complexity with almost no new value. Each state was duplicated almost one to one in the query. Adding Next would require it to hold iteration state. Each is a straightforward way to enumerate the list.
what's the heartbeat for?
To allow the state machine to make progress for time-sensitive states such as timeouts. If all queries had stalled requests then zero progress could be made since there would be no incoming event to trigger a state change.
why
queries
andqueriesIndex
on QueryPool?
Queries must be processed in order so this maintains an ordering while allowing random lookup for responses.
@dennis-tra and I discussed this PR on a call.
We're going to open an issue for cancelling outbound requests cleanly.
Also discussed tighter integration with the action/scheduler/queue system already in place. We should be able to unify the two models but I tried and failed a couple of times during development of the original example. We have approach that should work and Dennis is going to take a look.
Created issue for canceling outbound requests. Gave it a try to unify both, but also ran into issues.
I had another thought about the IpfsDht consuming events in a single loop and I'm still not convinced this is the right way to do it. Don't we effectively move from a single event loop to two event loops?
My alternative proposal was to let StartQuery
return a channel scoped to the query instead (I'm also not super convinced that this is great).
You had several arguments in favor of a single loop reading the outboundEvents channel. The strongest one I think is that it's less likely for implementers to not drain the channel. In an ephemeral query-scoped events channel implementers could forget to drain it.
Some things that are floating around in my mind in favour of query-scoped channel:
This change promotes the state machine logic from the
statemachine
example to the main codeline.The main additions are:
The intention is to add further state machines to the coordinator to implement other features such as routing table refresh.
See the state machine rationale and design document.