openpower-cores / a2i

Other
243 stars 40 forks source link

CoreMark and Simulation #24

Closed xinyu8888 closed 3 years ago

xinyu8888 commented 3 years ago

@openpowerwtf Hi buddy, has your team already used CoreMark to test the functionality of the A2 processor core? If the answer is yes, may I know what the test result is? We are planning to test A2 core on FPGA by using CoreMark and UART, do you have any suggestion for this plan? We want to output CoreMark test result from our FPGA board to PC through UART cable and axi_uart IP, also do the same for simulation. While we are doing the simulation, we encountered a few issues: we wrote our own boot code based on the simple a2_boot code provided by you to test the uart IP both on board and on simulation. In simulation, after modelsim run for a while, we got a fatal error shown in the attachments. In order to solve this issue, we changed line 1025 in a2l2_axi.vhdl from shl_ordered <= rotl(req_p1_any_shl, stq_head_q) to shl_ordered <= rotl(req_p1_any_shl, stq_head_q(2 to 3)) and line 1027 from shl_youngest <= rotr(shl_ordered_youngest, stq_head_q) to shl_youngest <= rotr(shl_ordered_youngest, stq_head_q(2 to 3)) and it fixed the issue. Do you think this change might cause some other issues for the design? Is it normal to encounter this fatal error in simulation? In your opinion, under what kind of circumstances this simulation fatal error would be caused and is there way to prevent it from happening? Thx! 1 2

xinyu8888 commented 3 years ago

@openpowerwtf Before we want to do the simulation, we have to regenerate the simnetlist file from the IP block in Vivado (e.g axi_smc, blk_mem_gen, axi_uart) if we changed the address space or BRAM content. This is causing some inconvenience for us, because every time we want to create a new simnetlist file from a certain IP, we have to create a new project and load the corresponding .xci file, then right click the Generate Output Product since we cannot generate the simnetlist file directly from the IP blocks in the original bd diagram. Is there a better and simpler way to generate the corresponding IP files for simulation for this project? Thx!

openpowerwtf commented 3 years ago

Your fix is wrong. The function is just implementing a rotate-left of a vector by a count and it doesn't expect the rotate count to be more than the vector size. I would say there is a bug in a2l2_axi.vhdl:

The lhs_ordered/lhs_youngest and shl_ordered/shl_youngest are using the wrong queue's pointers (lhs should use stq_head_q and shl should use ldq_head_q). I think that would explain your sim error, since the stq is bigger than the ldq.

I didn't get that error in my sim (unfortunately!) and didn't do enough testing where load-store collisions would functionally matter. a2l2_axi.vhdl and a2x_pkg.vhdl definitely need more testing - good job!

openpowerwtf commented 3 years ago

Before we want to do the simulation, we have to regenerate the simnetlist file from the IP block in Vivado (e.g axi_smc, blk_mem_gen, axi_uart) if we changed the address space or BRAM content.

No, not that I know of. Changing address space would be changing the logic in the IP. And if you are using BRAM init files, I think it's the same case.

You could avoid the addressing problem by:

  1. Map a big address space (OK to have 'holes' and just partially implemented BRAM range(s)).
  2. Use the ERATs! You can change your EA to RA translation(s) to hit your implemented range(s).

Some ways you could avoid the BRAM init problem by doing the preload without the init file (use the BRAM as RAM):

  1. Use tcl to load the BRAMs directly (internally) before starting sim
  2. Use tcl to drive JTAG-AXI, or AXI bus itself, to do bus writes to BRAM
  3. Write some simple logic block that you can easily control with tcl (address/data bus in and some control lines), and have it do bus writes to BRAM
Grubby-CPU commented 3 years ago

Your fix is wrong. The function is just implementing a rotate-left of a vector by a count and it doesn't expect the rotate count to be more than the vector size. I would say there is a bug in a2l2_axi.vhdl:

The lhs_ordered/lhs_youngest and shl_ordered/shl_youngest are using the wrong queue's pointers (lhs should use stq_head_q and shl should use ldq_head_q). I think that would explain your sim error, since the stq is bigger than the ldq.

I didn't get that error in my sim (unfortunately!) and didn't do enough testing where load-store collisions would functionally matter. a2l2_axi.vhdl and a2x_pkg.vhdl definitely need more testing - good job!

Hi, the line 1025 in a2l2_axi.vhdl and the related signals are listed below.

signal shl_ordered : std_logic_vector(0 to ld_queue_size-1);     
signal req_p1_any_shl: std_logic_vector(0 to ld_queue_size-1);
signal stq_head_q: std_logic_vector(0 to clog2(st_queue_size)-1);   

ld_queue_size                        : integer := 4;    
st_queue_size                        : integer := 16;

shl_ordered <= rotl(req_p1_any_shl, stq_head_q);

In my opinion, the size of shl_ordered, req_p1_any_shl and stq_head_q are actually the same. In other words, there should not cause errors in the rotl function.

openpowerwtf commented 3 years ago

Right. But the count is encoded, and the vector being shifted is a bit vector. Not looking at logic right now but I think the function expects that.

On Wed, Sep 9, 2020, 11:54 PM Grubby-CPU notifications@github.com wrote:

Your fix is wrong. The function is just implementing a rotate-left of a vector by a count and it doesn't expect the rotate count to be more than the vector size. I would say there is a bug in a2l2_axi.vhdl:

The lhs_ordered/lhs_youngest and shl_ordered/shl_youngest are using the wrong queue's pointers (lhs should use stq_head_q and shl should use ldq_head_q). I think that would explain your sim error, since the stq is bigger than the ldq.

I didn't get that error in my sim (unfortunately!) and didn't do enough testing where load-store collisions would functionally matter. a2l2_axi.vhdl and a2x_pkg.vhdl definitely need more testing - good job!

Hi, the line 1025 in a2l2_axi.vhdl and the related signals are listed below.

signal shl_ordered : std_logic_vector(0 to ld_queue_size-1); signal req_p1_any_shl: std_logic_vector(0 to ld_queue_size-1); signal stq_head_q: std_logic_vector(0 to clog2(st_queue_size)-1);

ld_queue_size : integer := 4; st_queue_size : integer := 16;

shl_ordered <= rotl(req_p1_any_shl, stq_head_q);

In my opinion, the size of shl_ordered, req_p1_any_shl and stq_head_q are actually the same. In other words, there should not cause errors in the rotl function.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openpower-cores/a2i/issues/24#issuecomment-689980244, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMSSHJWA2RVFAEJEB6KJT5LSFBLW3ANCNFSM4Q5ZJDPA .

xinyu8888 commented 3 years ago

@openpowerwtf Hi buddy, I asked a question regarding the test result of CoreMark on a2i about 20days ago but I didn't get an answer. I'm just wondering have you already used CoreMark to test the functionality of a2i? If the answer is yes, may I know the result? Thanks.

xinyu8888 commented 3 years ago

@openpowerwtf We still have a few more questions regarding L1 cache:

  1. The default size of D-cache is 16kb, is it possible to change it to 32kb?
  2. If the answer of Q1 is yes, what kind of parameters do we need to modify in order to adapt to this change? We got some errors when we tried to change the value of dc_size signal from 14 to 15 in acq_soft.vhdl file.
  3. If we can modify the size of D-cache, do we have to adjust the size of I-cache accordingly as well? If yes, what kind of parameters do we need to change in this case? Thx!
openpowerwtf commented 3 years ago

I wouldn't be surprised if changing parameter values led to syntax or functional bugs. dc_size may have been changed at some point to its current value, or was put in at the start and never changed. If parameter changes weren't continuously checked during regression tests (which would require building new models for every parameter change), any logic changes along the way could break something that once worked OK. I don't think there's a solution besides debugging syntax errors, then doing some functional testing to make sure nothing breaks and the parameter has the expected effects.

There's no architectural requirement to keep the I/D caches the same size. And for most implementations the relevant logic would be unique. But I didn't see an ic_size parameter, so the source needs to change similarly to the DC (once that is working 😀).

zwm152 commented 3 years ago

Hi, I have read the manual for dc_size 16KB. If you have the manual for dc_size 32KB, please send it to me for my comparsion among them.

I wouldn't be surprised if changing parameter values led to syntax or functional bugs. dc_size may have been changed at some point to its current value, or was put in at the start and never changed. If parameter changes weren't continuously checked during regression tests (which would require building new models for every parameter change), any logic changes along the way could break something that once worked OK. I don't think there's a solution besides debugging syntax errors, then doing some functional testing to make sure nothing breaks and the parameter has the expected effects.

There's no architectural requirement to keep the I/D caches the same size. And for most implementations the relevant logic would be unique. But I didn't see an ic_size parameter, so the source needs to change similarly to the DC (once that is working 😀).

openpowerwtf commented 3 years ago

@zwm152 No, I don't have any more info (written or otherwise) on that parameter. I've never tried changing it so I don't know what errors it's causing.

openpowerwtf commented 3 years ago

I have created a pull request that fixes the bug listed above and begins merging A2I/A2O versions of A2L2_AXI. They are 'close' now except for some minor differences in A2I vs A2O signals on the A2L2 bus, and a debug feature to allow limiting outstanding AXI loads and stores.

There is a new repo for code samples, build scripts, etc.

@xinyu8888 I did run coremark a while ago to try my memory build scripts. I think I saw about 11K iterations/sec for SMT4 scaled to 4GHz.