zf3 / psram-tang-nano-9k

An open source PSRAM/HyperRAM controller for Sipeed Tang Nano 9K / Gowin GW1NR-LV9QN88PC6/15 FPGA
Apache License 2.0

How to drive PSRAM clock LVDS signaled? #1

juj closed this issue 2 years ago

juj commented 2 years ago

I got my exposure to Sipeed Tang Nano/Gowin PSRAM/HyperRAM first from this blog entry: https://justanotherelectronicsblog.com/?p=986

which links to this data sheet: https://www.winbond.com/resource-files/W956x8MBYA_64Mb_HyperBus_pSRAM_TFBGA24_datasheet_A01-003_20200724.pdf

The data sheet describes both a Differential Clock and a Single Ended Clock mode, where it states:

[image: data sheet excerpt]

Further down in the document, it is stated that the choice between the two is actually configurable via a register write:

[image: data sheet excerpt]

That sounds interesting.

I find that in this repository, the clock is not driven differentially. There is a port O_psram_ck_n declared:

https://github.com/zf3/psram-tang-nano-9k/blob/aa05c7b999f1eb298f225fe180e99720769c8755/src/psram_test_top.v#L13-L14

but it is not referenced, so presumably it gets swept away during optimization(?)

In the example code from justanotherelectronicsblog.com's PSRAM controller, they also instantiate O_psram_ck_n:

https://github.com/riktw/tang4Kramblings/blob/215c2cea2204876a2993cfdfc43be37ccd4df9f7/NEORV32_HyperRAM/src/neorv32_test_setup_bootloader.vhd#L59-L64

but then drive it to constant low: https://github.com/riktw/tang4Kramblings/blob/215c2cea2204876a2993cfdfc43be37ccd4df9f7/NEORV32_HyperRAM/src/neorv32_test_setup_bootloader.vhd#L238

I have been trying to hack together a PSRAM test for the Tang Nano 4K and 9K. In my tests so far, I have run into some serious stability issues when I try to increase the signaling clock speed beyond a few dozen MHz, and I wonder if that might be due to my buggy and hacky code, or due to some actual physical effects that would require differential signaling to be enabled for the clock line.

I wonder if you have considered attempting to enable differential signaling for the clock line, or do you know if it has any effect?

I also see that you have the project currently configured to run at 40.5 MHz - I'd be curious to know how far you have been able to push this speed?

Finally, one question: in this repo, I see you instantiate the HyperRAM IO via

https://github.com/zf3/psram-tang-nano-9k/blob/aa05c7b999f1eb298f225fe180e99720769c8755/src/psram_test_top.v#L13-L18

In my test code, which I think I derived from justanotherelectronicsblog.com, I instantiate it as follows:

  output [0:0] O_hpram_ck,      // HyperRAM Clock signal, ticks DDR (at both rising and falling edges)
  output [0:0] O_hpram_ck_n,    // Differential negative pair to O_hpram_ck signal. HyperRAM actually is configurable via a reg write whether to use differential signaling, and is disabled by default at boot.
  output [0:0] O_hpram_cs_n,    // Chip Select, active low.
  output [0:0] O_hpram_reset_n, // Reset, active low.
  inout  [7:0] IO_hpram_dq,     // 8-bit wide data path for CA (command & address) sends and data reads
  inout  [0:0] IO_hpram_rwds,   // Read-Write Data Strobe. Multiple purposes, see above.

I am not sure I understand the difference between these two. Since DDR signaling clocks dq in and out on every rising and falling edge, it seems that your code attempts to read/write 32 bits of data per clock period (one rising plus one falling edge), whereas this code would only manage 16 bits per period. Have you tested whether the Tang Nano can actually write 32 bits of data per clock period?

Thanks for publishing a really nice example code!

zf3 commented 2 years ago

You are right that the clock is not driven differentially. Differential clocking is optional, and the clock can be driven "single-ended" instead. For example, this datasheet mentions it.

Regarding speed, the current clock is 81 MHz, as the memory runs on the same clock as the logic. DDR primitives (ODDR/IDDR) are used to handle double data rate transfers. According to the datasheet, frequencies above 83 MHz need more delay cycles and thus changes to the config registers.
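To illustrate what a DDR output stage does, here is a minimal behavioral sketch for simulation only. It is not the Gowin ODDR primitive and not the code from this repository; the module and signal names (`ddr_out_sketch`, `d_rise`, `d_fall`) are hypothetical. It captures two bits per clock period and presents one during each clock phase.

```verilog
// Behavioral model of a DDR output register (simulation sketch only).
// On real hardware a vendor DDR primitive would be used instead.
module ddr_out_sketch (
    input  wire clk,
    input  wire d_rise,   // bit driven out while clk is high
    input  wire d_fall,   // bit driven out while clk is low
    output wire q
);
    reg r_rise, r_fall;

    // Both bits are captured on the rising edge of clk.
    always @(posedge clk) begin
        r_rise <= d_rise;
        r_fall <= d_fall;
    end

    // The clock level selects which captured bit appears on the output,
    // so q can change on both edges of clk.
    assign q = clk ? r_rise : r_fall;
endmodule

// Tiny testbench: q should toggle every half period once data is captured.
module ddr_out_sketch_tb;
    reg  clk    = 1'b0;
    reg  d_rise = 1'b1;
    reg  d_fall = 1'b0;
    wire q;

    ddr_out_sketch dut (.clk(clk), .d_rise(d_rise), .d_fall(d_fall), .q(q));

    always #5 clk = ~clk;

    initial begin
        $monitor("t=%0t clk=%b q=%b", $time, clk, q);
        #40 $finish;
    end
endmodule
```

With an 8-bit dq bus per die, a stage like this on each data line moves 16 bits per clock period on that die.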

The I/O ports are different because the chip has two memory dies (4 MB each). O_psram_ck[1] and the corresponding upper bits of the other ports are for the second die. My code only uses the first die, although the interface defines two dies.
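As a rough sketch of how a two-die port list can be driven by a single-die controller: the pad-side port names other than O_psram_ck and O_psram_ck_n are assumed by analogy with the names quoted in this thread, the controller-side signals are hypothetical, and parking the second die with chip select held high is an assumption, not necessarily what this repository does.

```verilog
// Sketch: map a hypothetical single-die controller onto two-die wide ports.
module psram_die0_only_sketch (
    // Hypothetical controller-side signals for die 0
    input  wire        ck,
    input  wire        cs_n,
    input  wire        reset_n,
    input  wire        dq_oe,      // 1 = FPGA drives dq
    input  wire [7:0]  dq_out,
    output wire [7:0]  dq_in,
    input  wire        rwds_oe,    // 1 = FPGA drives rwds
    input  wire        rwds_out,
    output wire        rwds_in,
    // Pad side: bit [0] / bits [7:0] belong to die 0, the rest to die 1
    output wire [1:0]  O_psram_ck,
    output wire [1:0]  O_psram_ck_n,
    output wire [1:0]  O_psram_cs_n,
    output wire [1:0]  O_psram_reset_n,
    inout  wire [15:0] IO_psram_dq,
    inout  wire [1:0]  IO_psram_rwds
);
    // Die 0 gets the real clock; die 1 clock is held low (assumption).
    assign O_psram_ck      = {1'b0, ck};
    // Single-ended clocking: the negative clock pins are held low.
    assign O_psram_ck_n    = 2'b00;
    // Die 1 is never selected (assumption).
    assign O_psram_cs_n    = {1'b1, cs_n};
    assign O_psram_reset_n = {reset_n, reset_n};

    // Tri-state data bus: only the die 0 byte lane is ever driven.
    assign IO_psram_dq[7:0]  = dq_oe ? dq_out : 8'bz;
    assign IO_psram_dq[15:8] = 8'bz;
    assign dq_in             = IO_psram_dq[7:0];

    assign IO_psram_rwds[0]  = rwds_oe ? rwds_out : 1'bz;
    assign IO_psram_rwds[1]  = 1'bz;
    assign rwds_in           = IO_psram_rwds[0];
endmodule
```

This also shows why the widths differ from the single-die port list quoted above: each port is doubled, but only half of it carries traffic.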

Timing is quite tricky with HyperRAM, especially with the phase-shifted clock (clk_p). What I did was output the signals (clk, clk_p, rwds, dq etc) through debug pins and observe them with a logic analyzer and then compare with requirements from the datasheet. Good luck with your project.
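The debug-pin approach described above could look roughly like the following sketch. The module name, the `debug_pins` port and its bit order are hypothetical, and the inputs are assumed to be internal copies of the signals rather than the pads themselves.

```verilog
// Sketch: route internal PSRAM signals to spare pins for a logic analyzer.
module psram_debug_tap_sketch (
    input  wire        clk,        // logic/memory clock
    input  wire        clk_p,      // phase-shifted clock
    input  wire        rwds,       // internal copy of the rwds input
    input  wire [7:0]  dq,         // internal copy of the dq input byte
    output wire [10:0] debug_pins  // assign these to free I/O in the constraints
);
    // Bit order is arbitrary; label the analyzer channels to match.
    assign debug_pins = {dq, rwds, clk_p, clk};
endmodule
```

Keep in mind that the analyzer's sample rate has to be well above the clock frequency for the relative edge positions to be meaningful.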

juj commented 2 years ago

Great, now I follow. Thanks!