simulation environment

darwin964 commented 3 years ago

Hi, I tried irun, VCS MX, and ModelSim to compile the A2 code, but there are many compile errors, such as (1) macro definitions and (2) library calls. Could you provide a simulation environment and a simple testbench demo? Which simulation tools, and which versions, did you use? Thanks.

openpowerwtf commented 3 years ago

The IBM compilers and cycle simulator are not 'open' at this time. I used them and my Python driver for core+AXI sim, and Xilinx sim/synth tools for FPGA.

For either case, the 'testbench' is Power code (asm/gcc plus scripts to convert to mem loads, for either sim or JTAG-AXI load to BRAM).

darwin964 commented 3 years ago

@openpowerwtf Thanks. So I can only do firmware emulation on FPGA right now, since the IBM simulation tools are not "open"? And when will the simulation tools be opened?

openpowerwtf commented 3 years ago

I believe Xilinx xvhdl/xelab/xsim can be used without an FPGA target. It should be trivial to create a core wrapper and use tcl/sv to simulate rtl.

What problems are you seeing in compile? If they are caused by the libraries, they should be fixable for all compilers.

I will add your request to the list for an open IBM toolchain 👍

darwin964 commented 3 years ago

@openpowerwtf Thanks for your reply.

When I use VCS MX, it reports "don't support string type array", and it also can't find some packages in the library (this may be caused by compile order).

The good news is that we solved these errors by using Questasim, changing only "is 1" to "is TRUE".

I think xrun/irun/VCS MX may be better if I use UVM to build my environment, but the results show that these tools don't support VHDL well, or maybe there is something wrong in the way I call the libraries.
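If the missing-package errors really do come from compile order, one way to check is to derive an order from the sources themselves. The following is a minimal sketch, not part of the a2i repository: it scans VHDL text for `package NAME is` declarations and `use LIB.PKG` clauses and sorts files so that package providers come first. The function name `compile_order` and the file names in the usage note are hypothetical, the regexes ignore entity instantiation dependencies, and `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# "package foo is" (not "package body foo is", where the word after the name is not "is")
PKG_DECL = re.compile(r"^\s*package\s+(\w+)\s+is\b", re.IGNORECASE | re.MULTILINE)
# "use work.foo.all;" -> captures "foo"
PKG_USE = re.compile(r"^\s*use\s+\w+\.(\w+)", re.IGNORECASE | re.MULTILINE)


def compile_order(sources):
    """Order files so that each package is compiled before its users.

    sources: dict mapping a file name to its VHDL text (hypothetical input shape).
    Returns a list of file names. Packages not declared in `sources`
    (for example ieee.std_logic_1164) are ignored.
    """
    # Map each declared package name to the file that declares it.
    provider = {}
    for name, text in sources.items():
        for pkg in PKG_DECL.findall(text):
            provider[pkg.lower()] = name

    # Build file -> set of files it depends on.
    graph = {}
    for name, text in sources.items():
        deps = set()
        for pkg in PKG_USE.findall(text):
            owner = provider.get(pkg.lower())
            if owner is not None and owner != name:
                deps.add(owner)
        graph[name] = deps

    # static_order() yields dependencies before their dependents.
    return list(TopologicalSorter(graph).static_order())
```

Feeding the resulting list to the simulator's VHDL compile step, one file at a time, would show whether the "can't find package" errors disappear; a cycle in the graph raises `graphlib.CycleError`.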

xinyu8888 commented 3 years ago

@openpowerwtf Hi buddy, I saw your previous message indicating that you used asm/gcc plus scripts to convert to mem loads for JTAG-AXI load to BRAM. I noticed that in the bd diagram, there are two blk_mem_gen (#1 is 8M and #2 is 2M). I suppose they stand for L2 cache, right? We are planning to load our compiled instruction file to L2 cache on board to test the A2 core through JTAG-AXI. Could you please give me some guidance regarding how you loaded your mem loads to BRAM by using JTAG-AXI? Is there any relevant tcl script that you can provide for us? Thanks!

openpowerwtf commented 3 years ago

@xinyu8888 Sounds like a plan!

Yes, the two memories are just a substitute for having DDR memory for now. It originally made it easier to incrementally bring up the design (they were preloaded with .coe files in the first minimal design). Keeping two small ones made it easier to use one for low-mem (kernel) and the other for code/data extracted from ELF, but definitely not necessary. They could be changed to URAMs.

There are tcl files in rel/fpga. Procs raxi/waxi read and write from AXI. There are also some that access the VIO ports.

raxi 00000A80 16
     a80  00f00280 00000000 
     a88  00000000 00ff0310 
     a90  00000000 b0040010 
     a98  00000000 00000310 
     aa0  00000000 ff1f0310 
     aa8  00000000 00000310 
     ab0  00000000 66410200 
     ab8  00000000 00000000 

waxi 00000A94 F0050010 1  ;# t0 entry
     a94  f0050010 

ascii 10030000 10000
T0 [Inky].....kickstart my heart!
T0: Loop done. Ticks = 4259 (0.000043 seconds).
T0: Done. rc=0

I wrote some bash/python to convert objdump's to addr/data formats and eventually to tcl scripts to load FPGA memories - a bunch of commands like:

# 00000F00:00000F7F
waxi 00000F00 { 6400004C 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 A602697C BDFCFF4B A603697C A603087C 00006038 6400004C 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000} 32

Sourcing the image tcl files loads the memories. The procs are also useful for reading/writing status/config after the initial loads.
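The actual bash/python converter is not shown in this thread, so the following is only a minimal sketch of the last step it describes: turning a binary image into `waxi` lines of the form shown above (address, brace-enclosed word list, word count, preceded by a `# START:END` comment). The function name `waxi_commands` and its parameters are hypothetical. The words in the example above look byte-reversed relative to big-endian instruction encodings (for instance 6400004C versus 4C000064), so the sketch exposes that as a `byteswap` flag; that reading is an inference, not something confirmed here.

```python
def waxi_commands(image, base, words_per_cmd=32, byteswap=True):
    """Return tcl lines that write `image` (bytes) to AXI starting at `base`.

    byteswap=True emits each 32-bit word with its bytes reversed relative to
    the big-endian image (an assumption based on the example in the thread).
    """
    if base % 4 or len(image) % 4:
        raise ValueError("base and image length must be word aligned")

    lines = []
    chunk_bytes = 4 * words_per_cmd
    for off in range(0, len(image), chunk_bytes):
        chunk = image[off:off + chunk_bytes]
        words = []
        for i in range(0, len(chunk), 4):
            raw = chunk[i:i + 4]
            word = int.from_bytes(raw, "little" if byteswap else "big")
            words.append(f"{word:08X}")
        start = base + off
        end = start + len(chunk) - 1
        lines.append(f"# {start:08X}:{end:08X}")
        lines.append(f"waxi {start:08X} {{ {' '.join(words)} }} {len(words)}")
    return lines


if __name__ == "__main__":
    # One big-endian instruction word (0x4C000064) placed at 0x00000F00.
    for line in waxi_commands(bytes.fromhex("4C000064"), 0x00000F00):
        print(line)
```

Writing the returned lines to a file gives an image script that can be sourced in the same tcl session that defines `waxi`.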

xinyu8888 commented 3 years ago

@openpowerwtf Thanks again for the excellent comment! We tried the tcl scripts you mentioned above and they worked really well! We are able to execute some jump instructions on the board and capture the relevant signals on the ILA. One problem we are seeing is that a write to a high address such as EFFFFFF0 (waxi EFFFFFF0 F0050010 1) didn't succeed, while a write to a low address such as 00000FF0 did. Is this problem related to the tcl script? Thanks!

xinyu8888 commented 3 years ago

@openpowerwtf Also, I remember you mentioned in another comment that “if you haven't loaded bram, core fetch to 000000E0 repeatedly”. May I know why that is? We already wrote a small amount of data to the BRAM in the way I described in the last comment. Do you consider this "loaded bram"? We are seeing a repeated 000000E0 000000F0 00000010 pattern after we wrote to BRAM. Thanks!

openpowerwtf commented 3 years ago

@xinyu8888 big address fail: Do you have an AXI slave mapped to that address?
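A quick way to reason about this question is to check the address against the block design's address map. The sketch below is illustrative only: the region names and base addresses are hypothetical (the thread mentions an 8M and a 2M block memory but not where they are mapped), so the real values should be taken from the address editor of the actual design.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    base: int
    size: int


# HYPOTHETICAL address map: sizes follow the 8M/2M memories mentioned in the
# thread, but the names and base addresses are assumptions for illustration.
ADDRESS_MAP = [
    Region("bram_8m", 0x0000_0000, 8 * 1024 * 1024),
    Region("bram_2m", 0x1000_0000, 2 * 1024 * 1024),
]


def decode(addr: int, address_map=ADDRESS_MAP) -> Optional[Region]:
    """Return the region that claims `addr`, or None if no slave is mapped there."""
    for region in address_map:
        if region.base <= addr < region.base + region.size:
            return region
    return None


if __name__ == "__main__":
    for addr in (0x00000FF0, 0xEFFFFFF0):
        hit = decode(addr)
        print(f"{addr:08X} -> {hit.name if hit else 'no slave mapped'}")
```

With a map like this one, 00000FF0 lands in a memory while EFFFFFF0 decodes to nothing, which would match a write that never completes successfully.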

E0: Check the user guide - that is an interrupt vector (program interrupt). One of the reasons for that interrupt is illegal op. 00000000 is an illegal op, so if BRAM isn't loaded, the branch to 0 fetched after POR will then fetch an illegal op. Core will vector to E0, but there will be 00000000 there also, so it repeatedly fetches E0.

So the easiest first test is something like this:

0000 48000400 b 400
00E0 48000000 b .
0400 48000000 b .

The core will get stuck at @400 if successful, else most likely end up at @E0. Make sure your byte ordering is correct in BRAM.
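The three words in that test can be reproduced from the Power ISA I-form branch encoding (primary opcode 18, a word-aligned 26-bit signed displacement, AA=0, LK=0). The sketch below is a minimal illustration with hypothetical names (`encode_b`, `FIRST_TEST`); it prints the words in big-endian instruction order, and whether they need byte-reversing before being written to BRAM depends on the load path.

```python
def encode_b(pc, target):
    """Encode a relative Power `b target` located at address `pc` (AA=0, LK=0)."""
    offset = target - pc
    if offset % 4:
        raise ValueError("branch displacement must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("displacement does not fit in the 26-bit I-form field")
    # Opcode 18 occupies the top 6 bits: 18 << 26 == 0x48000000.
    return 0x48000000 | (offset & 0x03FFFFFC)


# The minimal first test: branch from the boot address to 0x400 and spin there;
# spin at 0xE0 as well so a program interrupt is easy to recognize.
FIRST_TEST = {
    0x0000: encode_b(0x0000, 0x0400),  # b 400
    0x00E0: encode_b(0x00E0, 0x00E0),  # b .
    0x0400: encode_b(0x0400, 0x0400),  # b .
}

if __name__ == "__main__":
    for addr, word in FIRST_TEST.items():
        print(f"{addr:04X} {word:08X}")
```

The printed address/word pairs match the listing above, so the same helper can be used to build slightly larger hand-written tests before moving to a compiler.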

xinyu8888 commented 3 years ago

@openpowerwtf Hi buddy! Thank you for your great help. We are able to execute a few instructions with a single A2 core on both FPGA and ModelSim simulation. We are now planning the next step: running two A2 cores with an operating system. To achieve this, we need a few things and some help from you, listed below:

1. Could you please provide us with the A2 core bootloader source code and any documents on how to boot the system, if you have them?
2. Can we have the Linux kernel source code that is compatible with the A2 core, along with the header files?
3. What steps should we take to run multiple A2 cores with an L2 cache and an operating system? Is there any related document? Are there specific things we should watch out for that might break the design?
4. Do you have the L2 cache source code with cache coherency? We are trying to achieve L2 cache coherency with a snooping protocol in our "two A2 cores, single L2 cache" design. Can we just use the A2L2 interface document as a reference and create two A2L2 interfaces (or a single A2L2 interface?) to connect two A2 cores to a single L2 cache? If we do so, we want to remove all the other blocks that come after the L2 cache (e.g. axi_smc, a2x_axi_reg, axi_bram_ctrl, blk_mem_gen) to simplify our design work. Do you think this is doable? Thanks!

openpowerwtf commented 3 years ago

@xinyu8888 That's cool! If you've run a few instructions, then you are hitting the address spaces and getting the correct responses.

1. I wrote a VERY minimal boot kernel - only enough to allow some preliminary testing on the board design. It uses ERAT-only translation instead of full MMU and most of the interrupt handlers aren't implemented, so not sure it's very useful. The boot process is described in 4.4 of the UM.
2. Not sure if/where one exists, though a kernel for Power 2.06/Book 3E will be compatible.
3. For a 'real' system, you will need to implement everything required for Power ISA, as expected by A2I, in addition to the normal coherency protocol, etc. The L2 will participate in sync, larx/stcx, core cache back-invalidates, etc. This is a significant undertaking and will require knowledge of the ISA and how to validate the logic.
4. We don't have a multicore L2 available yet, and it's unknown if there will be one released. Yes, the A2L2 interface document should be accurate. You will need one per core at some level (not a shared bus). You can add a directory and data cache, and if that is your memory, you don't need the AXI bus or slaves (well, someone needs to respond to the initial boot address, but that can be changed to a different address). As mentioned above, if you want to do fun stuff like locking and ordering, you will have to understand the implications, and do the implementation as required by Power/A2I. So if your environment can handle two cores and some version of L2, you can definitely use that as a base design for booting and running multicore.

You could also consider an intermediate step, depending on what you're trying to do. You could do an L2 with one interface, with the core running SMT2-4. I think you would still have to implement most/all the A2L2 TTypes, but could avoid some of the extra size/complexity of multiple interfaces. The hardware would appear as a 'multiprocessor' to most of the code. You could leave hooks for intra-L2 core-core for next step. But this model isn't as good if you are interested in multicore cache protocol.

Either way is simulatable, possibly implementable on a single FPGA, or implementable on multiple FPGAs with interface logic to talk FPGA-FPGA. Good luck!

xinyu8888 commented 3 years ago

@openpowerwtf Thanks a lot! This plan indeed involves a lot of extra work. We will try to implement it step by step. My colleague really wants to have the VERY minimal boot kernel file from you. They said even if it's minimal, it's still gonna be really helpful for them. Also, do you have any other L2 related documents other than the A2L2 interface one? For example, documents that explain the detailed architecture and layout of A2-core-compatible-L2-cache module by module would be great for us, plus the documents related to cache coherency for A2 core. I know we are asking a lot, but we really really need them :) Thanks!

darwin964 commented 3 years ago

@openpowerwtf Hi, could you please provide the "L2 user manual"?

sharkcz commented 3 years ago

Aren't those in the rel/doc directory?

darwin964 commented 3 years ago

> Aren't those in the rel/doc directory?

No, it only has these PDF files: A2L2.pdf, A2_BGO.pdf, BlueGene-IBMBQC.pdf

openpowerwtf commented 3 years ago

@darwin964 What do you mean by 'L2 user manual'? There is no 'general implementation guidelines' manual. Since Power ISA doesn't specify many implementation details, there isn't a standard core-L2 interface, and the core+L2 design for a specific system will usually share responsibilities for coherency/translation/atomicity/etc.

openpowerwtf commented 3 years ago

@xinyu8888 There is now an a2-boot repo with some simple asm code showing ERAT init, etc.

openpower-cores / a2i

simulation environment #8