ucb-bar / chisel2-deprecated

chisel.eecs.berkeley.edu
388 stars 90 forks source link

Added blocks for input, output, and input-output timing closure #691

Open hutch31 opened 8 years ago

hutch31 commented 8 years ago

Created two Chisel blocks for timing closure. These blocks are designed for use in decoupled IO designs, to make closing timing on module boundaries easier.

ghost commented 8 years ago

Can one of the admins verify this patch?

ucbjrl commented 8 years ago

ok to test

sdtwigg commented 8 years ago

These seem functionally identical to a single entry queue where: For DCInput, flow is true, pipe is false For DCOutput, flow is false, pipe is true

hutch31 commented 8 years ago

They are functionally identical to a single-entry queue, however their timing characteristics are very different. The Queue module does not close timing on any of its outputs -- enq_ready and deq_valid are both combinatorial. DC input provides a registered output for its consumer ready, and DCOutput provides registered outputs for its producer valid and producer bits.

If you wanted to close timing on a Queue at the top level, you would use DCInput and DCOutput to "wrap" the queue:

class WrapQueue { val io = { ... } val dci = Module(DCInput(type)) val queue = Module(Queue(type, entries, ...)) val dco = Module(DCOutput(type))

io.enq <> dci.io.c dci.io.p <> queue.io.enq queue.io.deq <> dco.io.c dco.io.p <> io.deq }

On Wed, Apr 13, 2016 at 11:09 AM, Stephen Twigg notifications@github.com wrote:

These seem functionally identical to a single entry queue where: For DCInput, flow is true, pipe is false For DCOutput, flow is false, pipe is true

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/ucb-bar/chisel/pull/691#issuecomment-209572919

sdtwigg commented 8 years ago

I see the issue: the underlying desire is that the ready or valid ports be driven directly by a register.

Although, this is still true in the single-entry queue case (since all the combinatorial logic collapses into constants and a reference to a single register). In a single-entry queue case, ptr_match is always true. Therefore, when pipe is false, io.enq.ready := maybe_full (the Q port of a register) Or, when flow is false, io.deq.valid := !maybe_full (the ~Q port of a register).

Perhaps Queue could be modified to better exhibit this in the single-entry case and not rely on constant propagation. DCInput and DCOutput could then be defined as specific instantiations of Queue. (Although it would be nice to enforce some sort of check that the respective ready and valid paths do not involve combinational logic.) This avoids having to independently verify DCInput and DCOutput.

A concern I have with the current DCInput and DCOutput modules is that they introduce a new convention for interfaces with p and q into the ChiselUtils. Also, the Bits constraint on T and creation of the hold register with UInt in DCInput is concerning.

aswaterman commented 8 years ago

Keeping in mind what @sdtwigg said, it's hard to see the opportunity for QoR improvement over 1-entry pipe-/flow queues... the inverter on maybe_full doesn't count.

hutch31 commented 8 years ago

I'm still coming up to speed on Chisel, I can easily change the naming for the input and outputs. The producer/consumer naming convention comes from the original verilog designs (https://github.com/hutch31/sdlib). Other names we came up (such as enqueue and dequeue) were specific to what they were being attached to -- "enqueue" doesn't make sense as the input to an arbiter.

As far as I know there is no way to have a registered output on c.ready and not have a holding register, as the case where c.ready == 1 and p.ready == 0 needs to be handled. As the output is flopped, the block cannot take away c.ready when the producer side is unable to accept data, and therefore needs a holding register for the 1 cycle it takes to react.

hutch31 commented 8 years ago

I generated a single-entry queue and looked at the implementation, with pipe=true it is equivalent in timing performance, so I replaced DCOutput by extending the Queue class.

I also looked at Queue(pipe=false, depth=2) implementation, but it goes through at least 3 levels of logic to generate enq.ready and uses 2X the flops of SDInput, so I think this is still an improvement for module input interfaces.