zangman / de10-nano

Absolute beginner's guide to the de10-nano
Apache License 2.0

FPGA write HPS SDRAM #10

Open HCIMaker opened 2 years ago

HCIMaker commented 2 years ago

Hi zangman:

This is an awesome project and wiki! I appreciate your detailed instructions. I wonder whether there is a reverse version of the SDRAM section, where the FPGA writes data directly to the SDRAM on the HPS side? Thank you very much!

zangman commented 2 years ago

@HCIMaker Thanks for your kind words :)

I don't have a section on writing to the SDRAM, but I think it should be doable. You will need to follow a state machine approach as explained here.

I highly recommend going through the content on this page first and watching this video.

Hope that helps and good luck!

HCIMaker commented 2 years ago

Thank you zangman! I will try it out!

vrbadev commented 1 year ago

Hello,

I am currently trying that, and it seems the FPGA manages to write to the SDRAM. I slightly extended the example SystemVerilog component for QSYS. I can watch the memory change every second from the HPS using memtool, but for some reason the address keeps incrementing (by the data width, 0x20 = 32 B) even though it should stay at the start address 0x2000_0000.

The SystemVerilog file looks like this:

module sdram_if # (
  parameter ADDR_SIZE = 32,
  parameter DATA_SIZE = 256 )
  ( clk, reset,
    avm_m0_read, avm_m0_write, avm_m0_writedata, avm_m0_address, avm_m0_readdata, avm_m0_readdatavalid, avm_m0_byteenable, avm_m0_waitrequest, avm_m0_burstcount,
     address, byteenable, read, data_out, write, data_in, busy );

// clk and reset are always required.
input   logic         clk;
input   logic         reset;
// Avalon Master ports
output  logic                       avm_m0_read;
output  logic                       avm_m0_write;
output  logic [DATA_SIZE-1:0]   avm_m0_writedata;
output  logic [ADDR_SIZE-1:0]   avm_m0_address;
input   logic [DATA_SIZE-1:0]   avm_m0_readdata;
input   logic                       avm_m0_readdatavalid;
output  logic [(DATA_SIZE/8)-1:0]   avm_m0_byteenable;
input   logic                       avm_m0_waitrequest;
output  logic [10:0]                avm_m0_burstcount;
// External conduit
input   logic [ADDR_SIZE-1:0]       address;
input   logic [(DATA_SIZE/8)-1:0]   byteenable;
input   logic                           read;
output  logic [DATA_SIZE-1:0]       data_out;
input   logic                           write;
input   logic [DATA_SIZE-1:0]       data_in;
output  logic                           busy;

localparam INIT = 3'd0;
localparam READ_START = 3'd1;
localparam READ_END = 3'd2;
localparam WRITE_START = 3'd3;
localparam WRITE_END = 3'd4;

logic [2:0] cur_state;
logic [2:0] next_state;

logic [ADDR_SIZE-1:0] addr;
logic [DATA_SIZE-1:0] data;
logic [(DATA_SIZE/8)-1:0] enable;

// Handling change of the current state to the next requested state
always_ff @(posedge clk) begin
  if (reset) begin
    cur_state <= INIT;
  end else begin
    cur_state <= next_state;

    if (read) begin
      addr   <= address;
      enable <= byteenable;
    end else if (write) begin
      addr   <= address;
      enable <= byteenable;
      data   <= data_in;
    end
  end
end

// Handling FSM transitions (combinational: blocking assignments only)
always_comb begin
  next_state = cur_state;
  busy = '0;
  case (cur_state)
    INIT: begin
      if (read) next_state = READ_START;
      else if (write) next_state = WRITE_START;
    end

    READ_START: begin
      busy = '1;
      if (avm_m0_waitrequest) next_state = READ_START; // Wait here.
      else next_state = READ_END;
    end

    READ_END: begin
      busy = '1;
      if (!avm_m0_readdatavalid) next_state = READ_END; // Wait here.
      else next_state = INIT;
    end

    WRITE_START: begin
      busy = '1;
      if (avm_m0_waitrequest) next_state = WRITE_START; // Wait here.
      else next_state = WRITE_END;
    end

    WRITE_END: begin
      busy = '1;
      next_state = INIT;
    end

    default: begin
      next_state = INIT;
    end
  endcase
end

// Handling read and write start of each transaction
always_comb begin
  avm_m0_address = '0;
  avm_m0_read = '0;
  avm_m0_write = '0;
  avm_m0_byteenable = '0;
  avm_m0_burstcount = '0;
  avm_m0_writedata = '0;

  case (cur_state)
    READ_START: begin
      avm_m0_address = addr;
      avm_m0_read = '1;
      avm_m0_byteenable = enable;
      avm_m0_burstcount = '1;
    end

    WRITE_START: begin
      avm_m0_address = addr;
      avm_m0_write = '1;
      avm_m0_writedata = data;
      avm_m0_byteenable = enable;
      avm_m0_burstcount = '1;
    end

    default: begin
    end
  endcase
end

// Handling read and write end of each transaction
always_ff @(posedge clk) begin
  if (reset) begin
       data_out <= '0;
  end else begin
    case (cur_state)

      READ_END: begin
        if (avm_m0_readdatavalid) begin
          data_out <= avm_m0_readdata;
        end
      end

      default: begin
      end
    endcase
  end
end

endmodule

And my VHDL entity for testing looks like this:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY test_sdram IS
    PORT (
        clock : IN STD_LOGIC := '1';
        nrst : IN STD_LOGIC := '1';
        h2f_nrst : IN STD_LOGIC := '1';
        sdram_address : OUT STD_LOGIC_VECTOR(31 DOWNTO 0) := (OTHERS => '0');
        sdram_byteenable : OUT STD_LOGIC_VECTOR(31 DOWNTO 0) := (OTHERS => '0');
        sdram_read : OUT STD_LOGIC := '0';
        sdram_data_read : IN STD_LOGIC_VECTOR(255 DOWNTO 0);
        sdram_write : OUT STD_LOGIC := '0';
        sdram_data_write : OUT STD_LOGIC_VECTOR(255 DOWNTO 0) := (OTHERS => '0');
        sdram_busy : IN STD_LOGIC;
        led : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
    );
END ENTITY;

ARCHITECTURE arch OF test_sdram IS
    SIGNAL i_led : STD_LOGIC_VECTOR(7 DOWNTO 0) := (OTHERS => '0');
BEGIN
    count : PROCESS (nrst, clock) IS
        VARIABLE counter : INTEGER := 0;
        VARIABLE ticks : INTEGER := 0;

        CONSTANT START_ADDR : STD_LOGIC_VECTOR(31 DOWNTO 0) := STD_LOGIC_VECTOR(to_unsigned(16#2000_0000#, 32));
        CONSTANT BYTE_ENABLE : STD_LOGIC_VECTOR(31 DOWNTO 0) := (OTHERS => '1');
    BEGIN
        IF nrst = '0' OR h2f_nrst = '0' THEN
            sdram_data_write <= (OTHERS => '0');
            sdram_address <= (OTHERS => '0');
            sdram_byteenable <= (OTHERS => '0');
            sdram_read <= '0';
            sdram_write <= '0';
            i_led <= (OTHERS => '0');
            counter := 0;
            ticks := 0;
        ELSE
            IF rising_edge(clock) THEN
                IF counter = 50_000_000 THEN
                    counter := 0;
                    ticks := ticks + 1;
                    sdram_address <= START_ADDR;
                    sdram_data_write(31 DOWNTO 0) <= STD_LOGIC_VECTOR(to_unsigned(ticks, 32));
                    sdram_byteenable <= BYTE_ENABLE;
                    sdram_write <= '1';
                    i_led(1) <= NOT i_led(1);
                ELSE
                    counter := counter + 1;
                    sdram_read <= '0';
                    sdram_write <= '0';
                END IF;
            END IF;
        END IF;
    END PROCESS;

    led <= i_led;
END ARCHITECTURE;

In the attached memtool screenshot I highlighted the address at the moment I released the reset button: the counter restarts, but the address keeps incrementing. Also, the address rolls back to 0x2000_0000 after 1024 write transactions (every 65536 B), and I am not sure why either.

Can you please help me make sense of this? Thanks!
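One detail in the module above may be worth checking against this symptom. In SystemVerilog an unsized `'1` sets every bit of its target, so `avm_m0_burstcount = '1;` drives 11'h7FF (2047) rather than 1, while `avm_m0_read = '1;` on a 1-bit signal is harmless. In an Avalon-MM write burst the slave takes the address and burstcount from the first beat only and stores the following beats at consecutive addresses, so if each one-beat "transaction" is really the next beat of one long burst, the data would step forward by one 32-byte word per write regardless of the address the master drives. 2047 also exceeds the largest burst an 11-bit burstcount is meant to encode in the Avalon specification (2^(11-1) = 1024 beats), so the exact wrap point depends on how the generated interconnect treats the out-of-range value; the sketch below does not try to reproduce it. If this is the cause, driving `avm_m0_burstcount = 11'd1;` should make every write land at the presented address. The C code is a simplified, hypothetical model (not the real interconnect): `sv_fill_ones` computes what `'1` becomes for a given width, and `model_write` returns where a single-beat write lands under the burst rule.

```c
#include <stdint.h>

/* Value an unsized SystemVerilog '1 takes when assigned to a signal of
 * `width` bits: all bits set. For avm_m0_burstcount[10:0] this is 2047. */
static uint32_t sv_fill_ones(unsigned width) {
    return (width >= 32u) ? UINT32_MAX : ((1u << width) - 1u);
}

/* Hypothetical slave model: address and burst length are latched on the
 * first beat of a burst; every later beat of that burst goes to the next
 * word, whatever address the master happens to drive. */
typedef struct {
    uint32_t burst_base;  /* address latched with the first beat */
    uint32_t beats_left;  /* beats still expected in the current burst */
    uint32_t beat_index;  /* beats already accepted in the current burst */
} burst_model;

/* Returns the address that a single-beat write presented at `addr` lands on. */
static uint32_t model_write(burst_model *m, uint32_t addr,
                            uint32_t burstcount, uint32_t bytes_per_beat) {
    if (burstcount == 0)
        burstcount = 1;                /* 0 is not a legal burstcount */
    if (m->beats_left == 0) {          /* this beat starts a new burst */
        m->burst_base = addr;
        m->beats_left = burstcount;
        m->beat_index = 0;
    }
    uint32_t target = m->burst_base + m->beat_index * bytes_per_beat;
    m->beat_index++;
    m->beats_left--;
    return target;
}
```

With `burstcount = 1` every write lands at 0x2000_0000; with the 2047 produced by `'1`, successive writes land at 0x2000_0000, 0x2000_0020, 0x2000_0040 and so on.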

vrbadev commented 1 year ago

So I have already figured it out, without a custom QSYS component. I use the External Bus to Avalon Bridge together with an Address Span Extender (otherwise the External Bus' maximum address range 0x0000_0000 to 0x3fff_ffff (1 GB) can't match the f2h_sdram0_data range 0x0000_0000 to 0xffff_ffff). A nice thing is that the Address Span Extender also supports an address offset of 0x2000_0000, so the FPGA can address the data starting at 0.
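The translation the Address Span Extender performs here can be pictured as adding a fixed window base to the address the FPGA master issues. The sketch below is a simplified C model under stated assumptions: the window base is the 0x2000_0000 offset mentioned above, and the 512 MiB window size (`SPAN_WINDOW_SIZE`) is a hypothetical choice covering the upper half of the DE10-Nano's 1 GB SDRAM. The real IP's window size and base are configured in QSYS, and `span_translate` is an illustrative name, not part of any Intel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SPAN_WINDOW_BASE 0x20000000u   /* offset applied by the span extender */
#define SPAN_WINDOW_SIZE 0x20000000u   /* 512 MiB window (hypothetical) */

/* Translate an address issued by the FPGA master (relative to the window)
 * into the address seen on f2h_sdram0_data. Returns false if the address
 * falls outside the window. */
static bool span_translate(uint32_t fpga_addr, uint32_t *sdram_addr) {
    if (fpga_addr >= SPAN_WINDOW_SIZE)
        return false;
    *sdram_addr = SPAN_WINDOW_BASE + fpga_addr;
    return true;
}
```

So FPGA address 0 becomes SDRAM address 0x2000_0000, and the last byte of the window, 0x1FFF_FFFF, becomes 0x3FFF_FFFF.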

If anyone is curious how it is configured in QSYS (I disabled irrelevant components), see the attached screenshots: the working QSYS configuration, the Address Span Extender settings, and the External Bus to Avalon Bridge settings.

As the input clock for the memory-related components I use an HPS output clock running at 400 MHz. The External Bus is limited to a 128-bit data width, but that is not a problem for my application. I also use a software-generated reset signal from an HPS PIO output (the bit is set using memtool in a systemd service after boot), because it seems the FPGA must not write to the SDRAM before Linux boots up.
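The memtool step that releases the reset could also be done from a small C program on the HPS. This sketch assumes the reset PIO sits on the Cyclone V lightweight HPS-to-FPGA bridge (base 0xFF200000) at a hypothetical offset `RESET_PIO_OFFSET`, that it is an output-only Altera PIO with its data register at offset 0, and that writing 1 releases an active-low reset of the FPGA writer. It writes the whole register instead of doing a read-modify-write, because reading the data register of an output-only PIO is not guaranteed to return the output value. `pio_set` and `release_fpga_writer_reset` are illustrative names.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define LW_BRIDGE_BASE   0xFF200000u /* Cyclone V lightweight HPS-to-FPGA bridge */
#define LW_BRIDGE_SPAN   0x00200000u /* 2 MiB window */
#define RESET_PIO_OFFSET 0x00000000u /* hypothetical: take it from your QSYS address map */

/* Drive the PIO data register (offset 0) with a complete value.
 * Bit 0 is assumed to feed the active-low reset of the FPGA writer. */
static void pio_set(volatile uint32_t *pio_data, uint32_t value) {
    *pio_data = value;
}

/* Map the bridge through /dev/mem (root required) and release the reset.
 * On 32-bit ARM, building with -D_FILE_OFFSET_BITS=64 keeps the offset
 * below exact. Returns 0 on success, -1 on failure. */
static int release_fpga_writer_reset(void) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    void *base = mmap(NULL, LW_BRIDGE_SPAN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, (off_t)LW_BRIDGE_BASE);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;
    pio_set((volatile uint32_t *)((uint8_t *)base + RESET_PIO_OFFSET), 1u);
    munmap(base, LW_BRIDGE_SPAN);
    return 0;
}
```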

Andy2No commented 1 year ago

@vrbadev Cool. I understand very little of that code, but I aspire to being able to do similar things, one day.

What would the code for the HPS side look like, e.g. in C or C++? I take it you have to declare an array, or allocate some memory. Do you get to say where that is, in the address space, or do you allocate it then pass the base address to the FPGA?

vrbadev commented 1 year ago

The HPS doesn't have to allocate anything. In fact, the HPS must avoid conflicts when accessing the part of the SDRAM used by the fabric, so the kernel parameter mem=512M must be set in extlinux.conf; then the OS does not use the upper part of the RAM (0x2000_0000 and above) at all, as it otherwise would for its processes etc. (see the SDRAM tutorial). So the code on the HPS side is mainly an mmap of the memory space accessed by the FPGA; if you write to it from the HPS, the fabric must ensure there will be no conflicts.
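A minimal sketch of that HPS side, assuming the DE10-Nano's 1 GB of HPS DDR3 and mem=512M, so that 0x2000_0000 to 0x3FFF_FFFF is left to the fabric. `in_fpga_region` and `map_fpga_region` are hypothetical helper names. The mapping goes through /dev/mem (root required), and O_SYNC is the usual flag for requesting an uncached /dev/mem mapping, so the HPS reads what the fabric actually wrote rather than stale cache lines.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed layout: 1 GB SDRAM, Linux limited to the lower 512 MiB. */
#define FPGA_REGION_BASE 0x20000000u
#define FPGA_REGION_SIZE 0x20000000u

/* True if [phys, phys + len) lies entirely inside the region left to the FPGA. */
static int in_fpga_region(uint32_t phys, size_t len) {
    if (len == 0 || phys < FPGA_REGION_BASE)
        return 0;
    uint64_t end = (uint64_t)phys + len;
    return end <= (uint64_t)FPGA_REGION_BASE + FPGA_REGION_SIZE;
}

/* Map len bytes of that region starting at a page-aligned physical address.
 * Returns NULL if the range is invalid or the mapping fails. */
static volatile uint32_t *map_fpga_region(uint32_t phys, size_t len) {
    if (!in_fpga_region(phys, len) ||
        (phys % (uint32_t)sysconf(_SC_PAGESIZE)) != 0)
        return NULL;
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, (off_t)phys);
    close(fd);
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}
```

For example, `volatile uint32_t *buf = map_fpga_region(0x20000000u, 4096);` followed by reading `buf[0]` would show the first word the fabric wrote, much like memtool does.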

By conflicts I mean concurrent read/write operations on the same memory; it is unlikely but still possible. As far as I know you can configure up to six f2sdram interfaces in Cyclone V, and when all of them plus the HPS try to read/write the memory at the same moment (on the same rising clock edge), I am not sure what happens. It is probably undefined behaviour, as in the case of block RAM. In that case I would suggest adding an additional access-control entity, perhaps with a FIFO for memory access requests.
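An access-control entity like the one suggested above is usually built around an arbiter that grants at most one requester per clock. Below is a C model of a round-robin arbiter, one common choice; it is a hypothetical sketch of the idea, not the arbitration logic inside the Cyclone V HPS SDRAM controller. Starting the search just after the last winner means a requester that asks every cycle cannot starve the others.

```c
#include <stdint.h>

/* Round-robin arbiter model.
 * req:   bitmask of requesters, bit i set = requester i wants memory this cycle
 * n:     number of requesters (1..32)
 * last:  index granted last time (updated on a grant)
 * Returns the granted index, or -1 if nobody is requesting. */
static int rr_grant(uint32_t req, unsigned n, unsigned *last) {
    for (unsigned k = 1; k <= n; k++) {
        unsigned i = (*last + k) % n;   /* search starts after the last winner */
        if (req & (1u << i)) {
            *last = i;
            return (int)i;
        }
    }
    return -1;
}
```

With three requesters all asking every cycle, the grants rotate 0, 1, 2, 0, ...; a request FIFO per requester could sit in front of this to hold requests that lose arbitration.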

Andy2No commented 1 year ago

@vrbadev Thanks. Yes, I can see that would be a problem. I was picturing a circular buffer that's written by one side (FPGA or HPS) and read by the other, along with a pointer or index, also written by the writing side, to say how far it has got. If it's not safe for one side to read a location while the other side is writing to it, then it gets more complicated, even just for reading the pointer that's set by the other side.
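The circular buffer described above works when each index has exactly one writer: the producer only advances `head`, the consumer only advances `tail`, and each side merely reads the other's index. The sketch below shows that single-producer/single-consumer pattern with C11 atomics, with both sides as CPU threads; the buffer size and names are hypothetical. When the producer is the FPGA, C atomics do not apply to the fabric: the same ordering would have to come from writing the sample data first and the index word afterwards, relying on the port keeping those writes in order (an assumption worth verifying), with the index an aligned 32-bit word written in one bus transaction so the reader never sees a torn value.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 1024u  /* power of two, hypothetical size */

typedef struct {
    uint32_t data[RING_SLOTS];
    _Atomic uint32_t head;  /* written only by the producer: next slot to fill */
    _Atomic uint32_t tail;  /* written only by the consumer: next slot to read */
} ring_t;

/* Producer: store the sample, then publish the new head with release
 * ordering so the consumer never sees the index before the data. */
static bool ring_push(ring_t *r, uint32_t v) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;                              /* full */
    r->data[head % RING_SLOTS] = v;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: read the sample, then hand the slot back by advancing tail. */
static bool ring_pop(ring_t *r, uint32_t *v) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                              /* empty */
    *v = r->data[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The indices run freely and wrap around naturally because `RING_SLOTS` divides 2^32, so `head - tail` is always the number of filled slots.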

The sort of application I had in mind involves an ADC or two and one or more DACs. Since FPGAs can have cycle-perfect timing, which ARM cores aren't so good at, it seems best to do that part on the FPGA, but to have circular buffers to exchange data with the HPS, which might do some processing of it before passing it back.

I guess it doesn't matter much which side the RAM buffers belong to, provided both sides can have the access they need to them, but I was thinking of a continuous process, not, for example, the FPGA reading some data into a buffer, setting a flag, then waiting for the HPS to act on it.

Maybe there could be some dual-ported RAM blocks used on the FPGA, handling data in real time, and a separate mechanism to transfer data between those and the HPS RAM, in blocks, using handshaking.
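The block transfer with handshaking can be sketched as two buffers ("ping-pong" buffering): the producer fills one block while the consumer works on the other, and a per-block flag passes ownership. The producer only ever sets a flag and the consumer only ever clears it. The C model below runs both roles sequentially to show the ownership rules only; `BLOCK_WORDS`, `pingpong_t` and the function names are hypothetical, and on real hardware the flag must only be set after the block data has actually reached memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 256u  /* hypothetical block size */

typedef struct {
    uint32_t block[2][BLOCK_WORDS];
    uint32_t full[2];   /* 1 = block owned by the consumer, 0 = by the producer */
} pingpong_t;

/* Producer side (the FPGA in the real system): fill block `which` only if
 * the consumer has handed it back, then pass ownership. */
static bool pp_produce(pingpong_t *p, unsigned which, const uint32_t *src) {
    if (p->full[which])
        return false;                         /* consumer still busy */
    memcpy(p->block[which], src, sizeof p->block[which]);
    p->full[which] = 1;                       /* data first, then the flag */
    return true;
}

/* Consumer side (the HPS): copy out block `which`, then hand it back. */
static bool pp_consume(pingpong_t *p, unsigned which, uint32_t *dst) {
    if (!p->full[which])
        return false;                         /* nothing ready yet */
    memcpy(dst, p->block[which], sizeof p->block[which]);
    p->full[which] = 0;
    return true;
}
```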

vrbadev commented 1 year ago

@Andy2No Well, depending on your requirements, you may prefer STM32G4 MCUs over FPGAs: these MCUs have rich analog peripherals (multiple ADCs and DACs) that can be served by the internal DMA, so the timing can be handled entirely by the built-in hardware. You will also have no trouble with compiling the OS and the bootloader; your solution could be completely bare-metal. The price is also much lower, and Nucleo boards are a good starting point for development, with hundreds of examples available online. These chips are also much easier to design in and to solder on custom PCBs (most FPGA ICs come in BGA packages, which require 4-layer PCBs).

Cyclone V is pretty complicated for beginners and may be an overkill for your application. I need it mainly for real-time processing of a video stream from a CMOS chip which would be impossible without the FPGA part.

Of course there is a dual-port BRAM inside the fabric available so you are right about the idea of the memory transfer mechanism.

Andy2No commented 1 year ago

@vrbadev My main aim was to learn more about programming FPGAs, and make something worthwhile but, yes, if I learned more about DMA on STM32s, I could just do it all on one of those. Fair point. Perhaps I should try to think of a project that's less suitable for doing on a microcontroller. Really, the point was just to gain some experience with FPGAs though.

Doing it all on the FPGA plus the 512MB of SDRAM that it can see, with no involvement of the Arm CPU, would also be acceptable, and might be a better place to start.

lmz0528 commented 10 months ago

I am currently attempting a similar task. When I press the KEY button on the DE10-nano board, a write command will be sent to the SDRAM controller. The entire Avalon Master structure is very simple, as shown in the diagram below:

[Diagram: the Avalon master structure]

The strange thing is that when I send a write command, the SDRAM controller immediately drives waitrequest high. According to the Avalon protocol, the master must keep its outputs unchanged while waitrequest is high, so the master waits endlessly, since waitrequest from the SDRAM controller never returns low. When I ignore waitrequest and change the write address and data on each master clock cycle, I find that the SDRAM controller actually writes the data normally. The write cycle is 6 master clock cycles. This has left me very puzzled. Has anyone encountered the same situation? Any insights would be greatly appreciated!
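For reference, the Avalon-MM rule in question: a master holds address, writedata, byteenable and write constant while waitrequest is high, and the transfer completes on the rising clock edge where write is high and waitrequest is low. The C model below counts cycles for a single write against a hypothetical slave that stalls for a fixed number of cycles; a slave that never releases waitrequest makes a compliant master wait forever, which is why the sketch takes a cycle limit. It models the protocol only, not the HPS SDRAM controller, and `slave_model` and `run_single_write` are illustrative names.

```c
#include <stdbool.h>

/* Hypothetical slave: keeps waitrequest high for `stall` cycles of an
 * outstanding write, then accepts it. */
typedef struct {
    unsigned stall;   /* cycles of waitrequest per write */
    unsigned waited;  /* cycles already stalled for the current write */
} slave_model;

static bool slave_waitrequest(slave_model *s, bool write) {
    if (!write) {
        s->waited = 0;
        return false;
    }
    if (s->waited < s->stall) {
        s->waited++;
        return true;      /* master must hold everything this cycle */
    }
    s->waited = 0;        /* write accepted on this edge */
    return false;
}

/* Master side of one write: write=1 with the same address/data is presented
 * every cycle until waitrequest is low on that edge. Returns the number of
 * cycles the transfer took, or -1 if it had not completed after max_cycles. */
static int run_single_write(slave_model *s, unsigned max_cycles) {
    for (unsigned cycle = 1; cycle <= max_cycles; cycle++) {
        const bool write = true;  /* held, with address and writedata, until accepted */
        if (!slave_waitrequest(s, write))
            return (int)cycle;
    }
    return -1;
}
```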