pulp-platform / ara

The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 1.0, working as a coprocessor to CORE-V's CVA6 core
Other
379 stars 133 forks source link

Vector mask instructions VMSBF, VMSIF and VMSOF encountered errors during simulation #370

Closed 0rd1narY1 closed 4 days ago

0rd1narY1 commented 4 weeks ago

Hi! I tried simulating VMSBF, VMSIF and VMSOF instructions with self-added testcases consistent with the examples in the RVV1.0 spec. Then I met errors in the RTL simulation. However, I passed the simulations using Spike.

The example in the RVV1.0 spec

Screenshot from 2024-10-31 22-37-02 where v3 contents are masked by v0, and then the result is written in v2.

Testcase

void TEST_CASE3() {
  VSET(8, e8, m1);
  VLOAD_8(v3, 0x94, 0, 0, 0, 0, 0, 0, 0);
  VLOAD_8(v0, 0xC3, 0, 0, 0, 0, 0, 0, 0);
  VCLEAR(v2);
  __asm__ volatile("vmsbf.m v2, v3, v0.t");
  VCMP_U8(3, v2, 0x43, 0, 0, 0, 0, 0, 0, 0);
}

The testcase is the same as what is in the manual. The expected result is 01000011.

Simulations

Spike simulation passed successfully.

spike --isa=rv64gcv_zfh --varch="vlen:4096,elen:64" rv64uv-p-vmsbf 2> rv64uv-p-vmsbf.cout
Checking the results of the test case 1:
PASSED.
Checking the results of the test case 2:
PASSED.
Checking the results of the test case 3:
PASSED.
PASSED: rv64uv/vmsbf.c!

But RTL simulation got wrong result.

Loading ELF file /media/ord1nary/Yeah/Project/ara/apps/bin/rv64uv-ara-vmsbf
# Loading section 0000000080000000 of length 000000000000146a
# Loading section 0000000080001470 of length 00000000000004b8
# Loading section 0000000080001930 of length 0000000000000068
# [TRACER] Output filename is: trace_hart_0.log
# Checking the results of the test case 1:
# PASSED.
# Checking the results of the test case 2:
# PASSED.
# Checking the results of the test case 3:
# Index 0 FAILED. Got 3, expected 43.
# ERROR: /media/ord1nary/Yeah/Project/ara/apps/riscv-tests/isa/rv64uv/vmsbf.c failed 1 tests!
# ** Warning: Core Test *** FAILED *** (tohost = 1)

It shows that RTL simulation got a result of 0x3 instead of 0x43.

Possible RTL Design Bug

I think there may be a bug in MaskUnit Design. Here is the code in masku.sv, line 501-515:

[VMSBF:VMSIF] : begin
            if (&masku_operand_a_valid_i) begin
                for (int i = 0; i < NrLanes * DataWidth; i++) begin
                    if (alu_operand_b_seq[i] == 1'b0) begin
                        alu_result_vm[i] = (vinsn_issue.op == VMSOF) ? 1'b0 : not_found_one_d;
                    end else begin
                        not_found_one_d = 1'b0;
                        alu_result_vm[i] = (vinsn_issue.op == VMSBF) ? not_found_one_d : 1'b1;
                        break;
                    end
                end
                alu_result_vm_m = (!vinsn_issue.vm) ? alu_result_vm & bit_enable_mask : alu_result_vm;
            end else begin
                alu_result_vm = '0;
            end

We first get the result 'alu_result_vm' using the operand 'alu_operand_b_seq' which isn't masked and then mask the result to get 'alu_result_vm_m'. But actually we should mask the operand first and then get the result, according to the manual. In this wrong case, we first get result 0 0 0 0 0 0 1 1, and then mask it to get the final result 0 0 0 0 0 0 1 1(0x3)(where mask vector is 1 1 0 0 0 0 1 1).

mp-17 commented 4 days ago

Thanks a lot! I am refactoring the MASKU to solve the many bugs it has triggered. If you want, can you try the base branch of this PR and let me know if the bug is still there?

mp-17 commented 4 days ago

Okay, I have tried. The test should actually be:

  VSET(8, e8, m1);
  VLOAD_8(v3, 0x94, 0, 0, 0, 0, 0, 0, 0);
  VLOAD_8(v0, 0xC3, 0, 0, 0, 0, 0, 0, 0);
  VCLEAR(v2);
  asm volatile("vmsbf.m v2, v3, v0.t");
  VSET(1, e8, m1); // Different here
  VCMP_U8(17, v2, 0x43); // Different here

Since you are comparing only the first 8 bits for this operation (the last ones are prescribed as tail-agnostic by the specs).

This test passes with the new mask unit. I have added the test to the test suite, thank you! ;)

Feel free to re-open the issue if something is not clear.