tgingold-cern / cheby

GNU General Public License v3.0
7 stars 4 forks source link

Consecutive single cycle write access #48

Closed lorenzschmid closed 3 days ago

lorenzschmid commented 1 week ago

A register map generated by cheby translates each request from the bus interface (APB, Wishbone, ...) to a request on an internal bus (ibus). Such a request on the internal bus takes only one cycle ("pulse") independently of when it is executed. While the request on the bus interface might be still active (until it is confirmed/responded to), the internal request vanishes after that one cycle. A write request on the internal bus consists of wr_req, wr_adr, wr_dat, and wr_sel. Similarly, a read request consists of rd_req, rd_adr, and rd_dat. Following the request, the internal bus returns the response. Again, the response is only active during one cycle. It consists of wr_ack, and wr_err for a write request and of rd_ack, rd_dat, and rd_err for a read request.

Without pipelining, a request is executed immediately, i.e., the data will be written to the register upon the next rising clock edge, while the read data is returned immediately such that the response can be sampled upon the next rising clock edge as well. Using the pipeline attribute, the request, the response or both can be sampled before being acted upon/being returned.

Looking at the bus, one would expect the following when setting pipeline: none: ibus_theo

Unfortunately, this is not true and it seems as if the write acknowledge signal, wr_ack, is first sampled and only returned one cycle later (even with pipeline: none): ibus_now

Why is this a problem?

It seems more consistent to me that the write and read "response" behave the same, i.e. they are "instantaneous" while purely combinatorial paths can be avoided by using the appropriate pipeline settings. Right now, setting pipeline to wr-in or wr-out further increases the latency. But more importantly, the current setup makes it impossible to have a bus interface with consecutive single cycle write requests. Since the multiplexer returning the wr_ack signal is combinatorially based on wr_adr, wr_adr has to remain constant for at least two cycles: One cycle for the correct routing of the request (wr_req/wr_adr/wr_dat/wr_sel) and a second cycle for the always sampled wr_ack response. If not, the wr_ack coming in the second cycle is not forwarded (and hence, lost) as the address of the next request might have changed the multiplexer.

The following lines from proto/cheby/hdl/genreg.py are at the heart of the problem: Instead of creating a combinatorial path between the request, n.h_wreq, and the response, n.h_wack, the signal is sampled.

    def gen_processes(self, ibus):
        ...
        if n.access in ['rw', 'wo']:
            ...
            if n.h_has_regs:
                ...
                # In case of regs, the strobe is delayed so that it appears at the same time as
                # the value.
                ffproc.sync_stmts.append(HDLAssign(n.h_wack, n.h_wreq))
                ffproc.rst_stmts.append(HDLAssign(n.h_wack, self.strobe_init()))

In contrast, the read acknowledge generation inside the same file is combinatorial:

    def gen_read(self, s, off, ibus, rdproc):
        ...
        # Acknowledge
        if n.h_rack_port is not None:
            rack = n.h_rack_port
            rack = self.strobe_index(off, rack)
        elif ibus.rd_req_del:
            rack = ibus.rd_req_del
        else:
            rack = ibus.rd_req
        s.append(HDLAssign(ibus.rd_ack, rack))

I see two possible solutions:

Changing the behavior of wr_ack might lead to unwanted side effects. Hence this issue, to further discuss possible solutions on who to realize consecutive single cycle write requests.

Overall, I am wondering if I understood the current implementation correctly and if the delay of the wr_ack signal is intended and if yes, for what reasons (the comment mentions something about the strobe)?

lorenzschmid commented 1 week ago

@tgingold-cern would be great to have your insights on this, thank you!

tgingold-cern commented 1 week ago

I think you can adapt wr_ack. I don't see why the behaviour is not the one you'd like to have.

The wr_ack delay is needed when there is a register because the strobe is the same as the ack. And the user of the strobe wants to get the new value (which is the output of the dff). So have one signal for ack and strobe (if this is the case) is a mistake.

lorenzschmid commented 1 week ago

Thanks for the quick reply. In that case I will prepare a merge request to adapt the wr_ack behavior inside genreg.py.

lorenzschmid commented 1 week ago

In the meantime, I implemented the changes in #49. Please let me know what you think about it.