nevalang / neva

🌊 Dataflow programming language with static types and implicit parallelism. Compiles to native code and Go
https://nevalang.org

Race-free Parallel Connections #695

Open emil14 opened 1 month ago

emil14 commented 1 month ago

The current runtime implementation spawns a goroutine for each pipe/fan-in connection. Even though (direct) fan-in connections are race-free, there is still a race condition in parallel connections.

The simplest picture of the problem, where s1, s2, r1 and r2 are all ports:

s1 -> r1
s2 -> r2

Because s1 -> r1 and s2 -> r2 are served by (concurrent) goroutines, there is a race condition. It means the following situation is possible:

  1. 2 goroutines were spawned to serve these two pipelines
  2. s1 sends message m1 (intended for r1) and blocks
  3. s2 sends message m2 (intended for r2) and also blocks
  4. the Go scheduler activates the second s2 -> r2 goroutine, which receives the message from s2 to send it to r2 (and blocks)
  5. s2 unblocks
  6. the Go scheduler activates the first s1 -> r1 goroutine, which receives the message from s1 to send it to r1 (and blocks)
  7. s1 unblocks

At step 4 we can see the race condition. There's an order set by the senders: 1) s1 sends, 2) s2 sends, and that order needs to be preserved on the delivery side: 1) send to r1, 2) send to r2.

It's very important to point out that even if this order is preserved, there's no guarantee that r1 will actually receive before r2. There are multiple reasons for that, ranging from r1 simply being slower than r2 to the Go scheduler activating r1 and r2 in a different order. That's ok, we don't have control over these things.

What we want to guarantee is, again, that we send to receivers in the same order in which we received from senders.
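To make the interleaving concrete, here is a minimal Go sketch of the goroutine-per-connection design (the names and setup are hypothetical, not the actual runtime code):

```go
package main

import "fmt"

func main() {
	// Old design: a channel per port, plus a delivery goroutine per connection.
	s1, r1 := make(chan string), make(chan string)
	s2, r2 := make(chan string), make(chan string)

	go func() { r1 <- <-s1 }() // serves s1 -> r1
	go func() { r2 <- <-s2 }() // serves s2 -> r2

	// The sender establishes an order: m1 is handed off strictly before m2,
	// because each unbuffered send completes only when a delivery goroutine
	// has received the message.
	go func() {
		s1 <- "m1"
		s2 <- "m2"
	}()

	// But nothing orders the two delivery goroutines relative to each other:
	// the scheduler may complete `r2 <- "m2"` before `r1 <- "m1"`, so delivery
	// order can invert the sender order. That's the race.
	fmt.Println(<-r1, <-r2)
}
```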

This raises a number of questions:

  1. Is this always necessary? Could it be limited to ports of the same node only? E.g. is there a case for a -> n1:x; b -> n2:x where it's critical that n1 receives before n2?
  2. If it's necessary (there are cases where >=2 nodes need ordering), could it be handled through locks or HOCs?
  3. For the cases where it's necessary - which options do we have that meet these requirements? What are their upsides and downsides?

Very close to #689

emil14 commented 1 month ago

Channel is Inport, not In/OutPort (+ Back to Graph Reduction)

We don't have fan-out at runtime anymore, which means it's possible to move from the old design (a channel per port) to a new one - a channel per inport. For pipeline connections that would mean the channel is the connection itself; for fan-in connections that would mean several runtime functions write to the same inport channel.

This would solve the issue we are talking about at all levels (array-inports, several inports, even several nodes). If we don't have goroutines that do delivery (receive from sender, send to receiver), we simply can't have this race condition. There's no "delivery" - runtime functions (senders) just write directly to their receivers (other runtime functions).
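A rough sketch of how that could look in Go (hypothetical names, not the actual runtime code):

```go
package main

import "fmt"

func main() {
	// New design: a channel per inport only. A pipeline connection IS the
	// receiver's inport channel; the sender just holds a reference to it.
	r1 := make(chan string)
	r2 := make(chan string)

	// A runtime function (sender) writes directly to its receivers' inports.
	// The delivery is the send itself, so sends happen in program order and
	// there is no intermediate goroutine that could reorder them.
	go func() {
		r1 <- "m1"
		r2 <- "m2"
	}()

	fmt.Println(<-r1, <-r2)
}
```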

Graph Reduction

See #212 #403 for details

If there are no delivery goroutines, all message passing is done by the runtime functions themselves. We already did almost this with a single queue for all outports. This time we are going even further - outports are not represented by channels at all. Only inports have channels, and an outport writes directly to an inport. That's a pipeline connection. Fan-in already works naturally - several runtime functions write to the same channel. It's free serialization; we don't even need atomic counters for that. (We still might need counters for serialized receiving on array-inports though - that situation is exactly the same as the existing fan-in: several senders, one receiver.)
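For illustration, fan-in under this design might look like this in Go (again a hypothetical sketch):

```go
package main

import "fmt"

func main() {
	// Fan-in: one inport channel shared by several senders. The channel
	// itself serializes the sends - each send is atomic with respect to
	// the others, so no counters or delivery goroutines are needed.
	in := make(chan string, 2)

	go func() { in <- "from s1" }() // sender 1 writes directly to the inport
	go func() { in <- "from s2" }() // sender 2 writes directly to the inport

	// The receiver drains messages one at a time, in whatever order the
	// sends happened to complete.
	fmt.Println(<-in)
	fmt.Println(<-in)
}
```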

However, that means we need graph reduction. It's not (again) an optimization pass, but rather a requirement for the program to work: since nothing moves messages between channels anymore, every chain of intermediate connections has to be collapsed at compile time so that each sender ends up writing directly to its final receiver's inport channel.

Deadlock

We can get into the same situation we had with the single-queue solution: if there's a cycle, a component writing to its outport may actually be writing to its own inport. In theory this could be solved by having a buffer of at least 1 message, but we need to think about that more.
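A minimal illustration of the problem and the buffered workaround (hypothetical sketch):

```go
package main

import "fmt"

func main() {
	// A component in a cycle: its outport is wired back to its own inport.
	// Unbuffered, the self-send below would block forever - a deadlock:
	//   in := make(chan int)
	in := make(chan int, 1) // a buffer of at least 1 lets the self-send complete

	in <- 1           // "write to outport" is really a write to our own inport
	fmt.Println(<-in) // the component later reads its own message back
}
```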

emil14 commented 1 month ago

All Node's Inports Use Single Queue

...