google / xls

XLS: Accelerated HW Synthesis
http://google.github.io/xls/
Apache License 2.0
1.15k stars 166 forks source link

[enhancement] Support peek() op in XLS #1383

Open hongted opened 1 month ago

hongted commented 1 month ago

What's hard to do? (limit 100 words)

Often requested is a peek() op that given a streaming channel, will return the data at the head of the channel (if any) and a valid bit (true if there were data).

While both peek() and recv_non_blocking() bring us out of KPN land, peek() poses a challenge to simulation vs hardware as in simulation we do not backtrack executed instruction when blocking.

For example:

(token, data, valid) = peek(token, ch0);
(token, data0) = recv(token, ch0);

In hardware, the behavior depends on how peek and recv are scheduled. Let's say that they are scheduled in the same cycle. In this case, once the recv occurs, valid will be true and data will be equal to data0.

In an interpreter, one possible sequence of events is that the channel is empty, and peek executes --> data = 0 and valid = 0. The interpreter then blocks on recv. Once there is data on the channel, recv executes and returns data0.

In order to continue with supporting peek() operation, we should clarify and agree on the behavior of the interpreters, and how the operator is impacted by scheduling.

Current best alternative workaround (limit 100 words)

This is as opposed to the recv_non_blocking() op which will pop the data at the head of the channel if it exists.

Your view of the "best case XLS enhancement" (limit 100 words)

One idea that came out of the discussion was to implement peek and receive as a single op/expression -- so we don't need any roll-back functionality in case a receive blocks.

Proposal was something like like the following

  1. New expression: peek_and_recv_conditional(\, \, \<list of (bool, value) for peeked values>) : \ { body-expression }
  2. body-expression will have available the peek'ed values of each channel and if the channel has data
  3. return of the expression will be a tuple: (\, \).
  4. All of this expression should be scheduled together.
  5. No side-effecting ops in the body.
  6. This expression will peek the value of the channels, and determine which receive should happen. If the simulator blocks on a receive, then the simulator will roll-back to the beginning of the expression.
hongted commented 1 month ago

Note that peek() is generally requested to implement arbiters without the need for additional state within the block (or when the outside world uses a pop() as a signal that the arbiter chose a particular candidate).

ericastor commented 1 week ago

For example:

(token, data, valid) = peek(token, ch0);
(token, data0) = recv(token, ch0);

In hardware, the behavior depends on how peek and recv are scheduled. Let's say that they are scheduled in the same cycle. In this case, once the recv occurs, valid will be true and data will be equal to data0.

In an interpreter, one possible sequence of events is that the channel is empty, and peek executes --> data = 0 and valid = 0. The interpreter then blocks on recv. Once there is data on the channel, recv executes and returns data0.

I'm not quite sure this is fully accurate as written, and I don't believe it's necessarily a problem.

One way to think about this is that in both cases, it depends on scheduling. If the peek is before or at the same time as the recv, then we get one result (valid && (data == data0)), but if it's after, we get (!valid && data = 0).

Currently, the DSLX interpreter, proc IR interpreter, and proc IR JIT don't ever simulate operations as happening simultaneously, so given the token edge, we can only get the ordering peek before recv, whereas in HW we could get either peek before recv or peek & recv in the same stage... so the HW realization allows strictly more execution orders than the interpreter.

This is to be expected, since the interpreter serializes things, and is why logic that involves non-KPN operations can't be fully covered by simulations that don't take scheduling into account. FWIW, we also don't stress-test the results of changing execution order, so the interpreter doesn't even maximize the possible coverage for this scenario within these bounds.

So - I'm not sure we shouldn't just go ahead and implement peek()? It won't add any new inaccuracies to the interpreter, as it would only continue the pattern we already have... and I'm pretty sure it's impossible to avoid at least some of those inaccuracies before scheduling.

On the other hand, the proposed peek_and_recv_conditional block expression does have the advantage of being easier to use correctly, at the cost of being more complex to use at all (think poll(2) vs. fgetc + fungetc).

meheff commented 1 week ago

I agree with @ericastor that matching interpreter and hardware behavior generally for latency sensitive operations is really hard to do if not impossible and shouldn't stop us from implementing a generally useful operation like peek. I can see more complex operations being useful as well like the proposed peek_and_recv_condional and maybe something like channel select (see https://gobyexample.com/select). However we should probably still support the basic underlying peek primitive.