orangecrab-fpga / orangecrab-hardware

ECP5 breakout board in a feather physical format

DDR3/board gets hot when running an SoC using LiteDRAM. #19

Closed gregdavill closed 3 years ago

gregdavill commented 4 years ago

@pdp7 has commented on the prototype r0.2 board he's got:

the crab gets quite hot to the touch when running litex vexriscv... is that expected?

I've noticed this too with my testing, the SDRAM seems to run ~10-20°C above ambient. This can make for a very toasty board, although this shouldn't cause any part failures. I'm curious if this is just normal for DDR3L, or if LiteDRAM is keeping the bus in an active state during times of inactivity.

I will put a unit under a thermal camera to see exactly where the heat is coming from. I suspect the DDR3/ECP5 I/O drivers: the memory has On-Die Termination (ODT), which pulls the DQ pins to VDD/2 via ~60 Ohms, and all the Address and Control lines have 50 Ohm termination to VDD/2. I suspect this is what's heating up.

Lots of devices make use of DDR memory, and they're not burning hot while idle, so maybe there is a way we can tri-state the bus while idle in LiteDRAM to avoid this wasted power and heat.
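To put rough numbers on the termination suspicion above, here is a minimal sketch of the static dissipation in resistors terminated to VDD/2. It assumes a nominal DDR3L supply of 1.35 V and an ideal driver holding each line at a rail, so treat the result as an upper bound rather than a measurement. The line counts (20 address/control, 16 DQ) are hypothetical placeholders, not values taken from the OrangeCrab schematic.

```python
# Rough static-power estimate for parallel termination to VDD/2.
# Assumptions (not from the thread): VDD = 1.35 V (nominal DDR3L),
# an ideal driver holding each line at a rail, hypothetical line counts.

VDD = 1.35  # volts, assumed nominal DDR3L supply


def termination_current(r_term_ohms, vdd=VDD):
    """Current (A) through one terminator with VDD/2 across it."""
    return (vdd / 2) / r_term_ohms


def termination_power(r_term_ohms, n_lines, vdd=VDD):
    """Total power (W) dissipated in n_lines terminators."""
    v_across = vdd / 2
    return n_lines * v_across ** 2 / r_term_ohms


if __name__ == "__main__":
    # (label, resistance in ohms, hypothetical number of lines)
    groups = [
        ("address/control, 50 ohm to VTT", 50, 20),
        ("DQ via ODT, ~60 ohm", 60, 16),
    ]
    for label, r_ohms, n_lines in groups:
        i_ma = termination_current(r_ohms) * 1e3
        p_mw = termination_power(r_ohms, n_lines) * 1e3
        print(f"{label}: {i_ma:.2f} mA per line, {p_mw:.1f} mW for {n_lines} lines")
```

Real driver output impedance and bus activity would lower these figures, so the sketch only shows the order of magnitude that always-on termination can reach.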

benvanik commented 4 years ago

Reading FPGA-TN-02032 section 4.11.9 it looks like the termination pull-up is dynamically disabled when a particular I/O is acting as a receiver and tri-stated: [image]

In theory then if the LiteDRAM controller can set the bus to tri-state things should get better.

gregdavill commented 4 years ago

Adding this here because it is related.

On @benvanik's suggestion I took the Vtt resistors off, and replaced the Vref LDO with a resistor divider. Everything still works and timing looks identical. I'll want to do actual SI measurements on the hardware samples before any new hardware changes are locked in for the next revision.

[Photos: IMG_9883, IMG_9882]

I'm still interested in finding out if there are any gateware changes that can help reduce power.

gregdavill commented 3 years ago

I've done some current measurements.

A stock OrangeCrab board running a 48MHz VexRiscv takes around 300mA at the USB connection (1.5W).
Removing the VTT resistors saves 50mA (1.25W).
Turning off ODT saves another 50mA (1W, when combined with removing the VTT resistors).

tommythorn commented 3 years ago

Impressive drop. Do we know how much of that remaining 1W is consumed by the DRAM itself?

gregdavill commented 3 years ago

Somewhat strange, if I disable the virtual VCC/GND the current drops by another 60mA. Maybe they're wired up incorrectly...

Which puts us at 140mA, a reduction of over half from what we started with. I'm still not sure what the exact implications of disabling the ODT and virtual power pins are.

Micron app notes suggest VTT and ODT can pretty safely be disabled on systems that only have 1-2 DRAM chips. But I can hook a board up to my scope to look at how the eye is affected by these changes. Lattice recommends adding the virtual power pins on DQS groups to assist in reducing IO switching noise.

gregdavill commented 3 years ago

Once ODT is off, I'm pretty sure most of the power draw is the ECP5.

Here is what it looks like under the thermal camera with all my changes so far: [Screenshot_2020-07-19_19-29-56]

The ECP5 here is at about 37°C.

gregdavill commented 3 years ago

In combination with the above steps, setting DIFFRESISTOR=OFF on the DQ Strobe and TERMINATION=OFF on the DQ groups results in 72mA from the USB. 360mW!!
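The figures reported in this thread can be tallied with a short sketch. It assumes a 5 V USB supply, which is what the reported 300mA = 1.5W pairing implies; the step labels and function names are illustrative only.

```python
# Tally of the current measurements reported in this thread.
# Assumption: 5 V at the USB connector, implied by 300 mA -> 1.5 W.

USB_VOLTS = 5.0

# (step, board current in mA once that step is applied)
MEASUREMENTS = [
    ("stock board", 300),
    ("VTT resistors removed", 250),
    ("ODT turned off", 200),
    ("virtual VCC/GND disabled", 140),
    ("DIFFRESISTOR=OFF and TERMINATION=OFF", 72),
]


def power_mw(current_ma, volts=USB_VOLTS):
    """Power in mW for a current in mA at the given supply voltage."""
    return current_ma * volts


def summarize(measurements):
    """Return (step, mA, mW, percent saved versus the first entry)."""
    baseline_ma = measurements[0][1]
    rows = []
    for step, current_ma in measurements:
        saved_pct = 100.0 * (baseline_ma - current_ma) / baseline_ma
        rows.append((step, current_ma, power_mw(current_ma), saved_pct))
    return rows


if __name__ == "__main__":
    for step, ma, mw, saved in summarize(MEASUREMENTS):
        print(f"{step:40s} {ma:4d} mA {mw:7.0f} mW {saved:5.1f}% saved")
```

Under that 5 V assumption, the final 72mA corresponds to 360mW, roughly three quarters less than the stock board.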

benvanik commented 3 years ago

Wow, that's impressive! I'm interested to see what the scope says!

gregdavill commented 3 years ago

This issue has been resolved with some more investigation: https://github.com/enjoy-digital/litedram/issues/216
LiteDRAM now has an attribute to disable ODT, and we need to assign all the bits in the virtual vccio pins.

I'll leave this open until I've fixed the example SoC.

gregdavill commented 3 years ago

Fixed in:
https://github.com/gregdavill/OrangeCrab-examples/commit/724265e2f07b8e275beb11e865124c5a43d4e126
https://github.com/gregdavill/OrangeCrab-test-sw/commit/712ef2183bd9de65db1b64eaf57bd19eb8191829

tommythorn commented 3 years ago

If I understand correctly, you got the reduction to 360 mW through both HW rework and litedram fixes. Are the litedram changes sufficient by themselves (for a r0.1/r0.2 board), or how much is left on the table?

gregdavill commented 3 years ago

I think most of the improvements have come from gateware changes.

But I can certainly run a comparison between r0.1, r0.2, and r0.2.1. r0.2.1 does not have any VTT resistors populated, so loading the same gateware onto these 3 boards should be a good test.