nickg / nvc

VHDL compiler and simulator
https://www.nickg.me.uk/nvc/
GNU General Public License v3.0
589 stars 75 forks source link

View array ports are not driven and are not dumped to the .fst file #856

Closed m-kru closed 3 months ago

m-kru commented 3 months ago

It looks like ports being part of a view array are not driven and have constant U value. Reproducer: https://github.com/m-kru/nvc-view-issue.

The source of the problem might be the use of all in the process:

  addr_matrix_driver : process (all) is
  begin
    for c in completer_range loop
      for r in requester_range loop
        addr_matrix(c)(r) <= '0';
        report to_debug(requesters(r));
        if (requesters(r).addr and unsigned(to_std_logic_vector(MASKS(c)))) = ADDRS(c) then
          addr_matrix(c)(r) <= '1';
        end if;
      end loop;
    end loop;
  end process;

In the report message, I can see that this process is evaluated only at time 0ms+0. However, the signals connected to the requesters port change at 2ns+1.

Ports being part of a view array are also not dumped to the .fst file.

nickg commented 3 months ago

It looks like ports being part of a view array are not driven and have constant U value.

I fixed a couple of issues I found while looking at this but I believe it's correct that these signals all have value 'U'.

The problem is the assignments to fields in requesters in the router process:

            requesters(r).ready  <= completers(c).ready;
            requesters(r).rdata  <= completers(c).rdata;
            requesters(r).slverr <= completers(c).slverr;
            requesters(r).ruser  <= completers(c).ruser;
            requesters(r).buser  <= completers(c).buser;

Although you are only assigning to the output elements here, the LRM specifies that a signal assignment statement in a process creates drivers for every sub-element of the longest static prefix of the target. The longest static prefix of requesters(r).ready is requesters (because r is non-static) which also, perhaps unexpectedly, includes all the input elements. These then have multiple sources, one of which contributes 'U' giving a resolved value of 'U'.

Here's a smaller cases that demonstrates the problem:

library ieee;
use ieee.std_logic_1164.all;

entity test is
end entity;

architecture beh of test is
    type t_rec is record
        f, g : std_logic;
    end record;

    type t_rec_array is array (natural range <>) of t_rec;

    view t_rec_in of t_rec is
        f : in;
        g : out;
    end view;

    signal s : t_rec;
begin
    s.f <= '0';

    b: block is
        port ( p : view (t_rec_in) of t_rec_array );
        port map ( p(0) => s );
    begin
        process (all) is
            variable sel : integer := 0;
        begin
            report to_string(p(sel).f);
            p(sel).g <= not p(sel).f;
        end process;
    end block;
end architecture;

I tested this on Riviera-PRO and it also reports U here so I'm pretty sure my interpretation is correct. I suspect this case wasn't considered when views were added to the 2019 LRM.

You can probably work around this by moving all that logic into a procedure which takes a signal array view as an argument since the procedure call will only create drivers for those sub-elements passed as mode out or inout (although I don't think 4.2.2.3 was updated for views either, it just works out that way).

Maybe @JimLewis or @Paebbels have some insight.

Ports being part of a view array are also not dumped to the .fst file.

I don't think there's anything special about it having an array mode view, dumping arrays-of-records has never been supported.

Paebbels commented 3 months ago

The view is like an overlay, just defining directions of individual signals in a group (a.k.a view). A view shouldn't have any influence on how signals are dumped. It just depends on the record. A view has also no influence on the initial values of signals or ports - it purely depends on the used datatypes (elementary or composite types).

I remember a special rule for records, but I would need to read the LRM for a more specific answer. I remember that all elements for a record are assigned entirely by a driver in a process. Here s.f is a concurrent statement, so a single-line process. Because s.g is not specified, it's assigned with U. So like s <= (f => '0', g => 'U');.

Now it gets tricky in the LRM (IIRC), but you could verify with other tools and bit_vector ... the signal outside is initialized to U and the output from the port is driven by 0. Actually you should get 2 drivers if I'm not mistaken (not, s was assigned - not initialized). And then resolution function kicks in and resolves U and 0.

Does this reflect your observations?

nickg commented 3 months ago

I remember a special rule for records, but I would need to read the LRM for a more specific answer. I remember that all elements for a record are assigned entirely by a driver in a process. Here s.f is a concurrent statement, so a single-line process. Because s.g is not specified, it's assigned with U. So like s <= (f => '0', g => 'U');.

I don't think that's correct: a process should only define drivers for the longest static prefix of the target of its signal assignment statements. I'm not aware of any special handling for records here.

For s.f <= '0', s.f is a static selected name by the rules in 8.1 and so that outer process only defines a single driver for element f of s. However the process inside the block has the assignment p(sel).g <= not p(sel).f; which although it only assigns to element g of one array element, actually creates drivers for every sub-element of p because p(sel).g is not a static name and the longest static prefix is p.

BTW there was a mistake in the original code example above. I edited it.

nickg commented 3 months ago

The user's original code is here: https://github.com/m-kru/nvc-view-issue/blob/a10f220655281b178748b983eb9ff3756ba0603d/crossbar.vhd#L162

I don't think it can work correctly with the way that views are currently specified.

m-kru commented 3 months ago

I am not an author of LRM. However, if you can't assign values to the out elements of a view port, then what is the point of the view. It creates more problems than it solves. With any view having at least one in and one out you end up with 2 record drivers.

nickg commented 3 months ago

With any view having at least one in and one out you end up with 2 record drivers.

Only when the longest static prefix of the target of the signal assignment contains both in and out elements. E.g. requesters(0).ready <= ... only creates a driver for ready.

Have you tried moving lines 125-181 to a separate procedure?

m-kru commented 3 months ago

If I use wrapper procedure it works. However, I have to add --relaxed as I get following error:

** Error: actual associated with signal parameter REQ must be denoted by a static signal name
     > /home/mkru/workspace/vhdl/vhdl-amba5/apb/crossbar.vhd:183
     |
 183 |             assign(requesters(r), completers(c));
     |                    ^^^^^^^^^^^^^ not a static signal name
     |
     = Note: the --relaxed option downgrades this to a warning
     = Help: IEEE Std 1076-2008 section 4.2.2.3 "Signal parameters"
     = Help: IEEE Std 1076-2019 section 8.1 "Names"

I can't operate directly on port with following procedure signature: procedure assign (signal req : view completer_view; signal com : view requester_view);. Relatively lot of obstacles for such a conceptually simple thing.

JimLewis commented 3 months ago

@m-kru Can you reformulate the for loops as for generate? For generate loop indicies are static enough to address this issue. Not sure that will address all of your issues, but it will fix some of them.

nickg commented 3 months ago

I can't operate directly on port with following procedure signature: procedure assign (signal req : view completer_view; signal com : view requester_view);.

You need to pass the array and the index separately to the procedure and then do the indexing inside the procedure. So the signature should be procedure assign (signal req : view (completer_view) of interface_array_t; r : natural; signal com : view requester_view);.

There's a ticket about this on the IEEE issue tracker which contains the rationale for why this rule exists: https://gitlab.com/IEEE-P1076/VHDL-Issues/-/issues/275

m-kru commented 3 months ago

@JimLewis for-generate does not help here. As this is a crossbar, the connection changes dynamically. Only one of the loops can be described using for-generate. However, reformulating one loop to be for-generate does not solve any issue here.

m-kru commented 3 months ago

@nickg let's suppose I have following description:

  completer_logic : for c in completer_range generate
    router : process (clk_i) is
      function hot_bit_count (slv : std_logic_vector) return natural is
        variable cnt : natural := 0;
      begin
        for i in slv'range loop
          if slv(i) = '1' then cnt := cnt + 1; end if;
        end loop;
        return cnt;
      end function;

      function hot_bit_idx (slv : std_logic_vector) return natural is
      begin
        for i in slv'range loop
          if slv(i) = '1' then return i; end if;
        end loop;
        report PREFIX & "hot bit not found in vector """ & to_string(slv) & """" severity failure;
      end function;

      -- Requester index
      variable r : requester_range;

      procedure assign (signal req : view (completer_view) of interface_array_t; r : natural; signal com : view requester_view) is
      begin
        com.addr   <= req(r).addr;
        com.prot   <= req(r).prot;
        com.nse    <= req(r).nse;
        com.selx   <= req(r).selx;
        com.enable <= req(r).enable;
        com.write  <= req(r).write;
        com.wdata  <= req(r).wdata;
        com.strb   <= req(r).strb;
        com.auser  <= req(r).auser;
        com.wuser  <= req(r).wuser;

        req(r).ready  <= com.ready;
        req(r).rdata  <= com.rdata;
        req(r).slverr <= com.slverr;
        req(r).ruser  <= com.ruser;
        req(r).buser  <= com.buser;
      end procedure;
    begin

How do you handle the fact that there are subprogram specifications in the process declarative part, and the process is in the for-generate loop? Does each process get its own instance of function and procedure? Does this decrease the performance in any way?

nickg commented 3 months ago

How do you handle the fact that there are subprogram specifications in the process declarative part, and the process is in the for-generate loop?

It's implemented in a similar way to closures in other languages: each subprogram is passed a pointer to its enclosing region. It's quite a common pattern and there shouldn't be any performance impact.

m-kru commented 3 months ago

@nickg I think we can close this issue as it looks like now it is resolved what is going on.

I just have one more question, you have written:

I don't think there's anything special about it having an array mode view, dumping arrays-of-records has never been supported.

Is it because it is not easy to implement, or maybe there is some other reason?

nickg commented 3 months ago

Arrays-of-records should now be dumped if you pass --dump-arrays. And as discussed the U values are expected behaviour.