memtest86plus / memtest86plus

Official repo for Memtest86+
https://memtest.org
GNU General Public License v2.0

32GB LRDIMMs super slow in DELL T5610 #346

Open ThomasWaldmann opened 11 months ago

ThomasWaldmann commented 11 months ago

memtest86+ v6.20 release, ISO booted via Ventoy.

DELL T5610 2x Xeon E5-2667 v2

I recently bought eight 32GB LRDIMMs and put them into a dual-Xeon workstation, 4 LRDIMMs on each CPU.

Then I ran memtest on it and got a super slow memory transfer rate reading: 747 MB/s.

DELL doesn't mention LRDIMMs in their specs, but I tried it anyway after seeing a YouTube video of a successful installation of almost identical LRDIMM modules (same manufacturer and type) in a DELL T3610 (single socket, same generation).

In the BIOS, it showed the full 256GB and also displayed ECC LRDIMM 1867MHz, so I thought it would work.

Then I had memtest running for about 50h; the first pass was only about half finished when I cancelled it. It somehow produced quite a lot of heat despite being slow. The LRDIMMs were hot when I removed them (too hot to touch for long, but definitely below 100 °C).

DELL T3610 1x Xeon E5-2697 v2

Then I put the same eight LRDIMMs into another machine (all LRDIMMs on one CPU) and the same memtest version showed 12.5GB/s, which I think is the normal performance.

Any idea what the issue could be in the T5610?

Does memtest86+ reprogram memory timings by itself or does it just use what the BIOS/firmware configured?

BTW, thanks very much for developing / maintaining mt86+ - especially the recent work making it bootable via UEFI has made it more useful and simpler to use again!

debrouxl commented 11 months ago

Thanks for the report.

Nope, memtest86+ does not attempt to enable CPU turbo mode or change memory timings.

Are you able to run commands such as numactl -H (usually after installing the appropriate package, often named numactl) under a Linux environment? Chances are there's something equivalent for a Windows environment, but I don't know about it :)

What I'm aiming at is determining which cores and memory belong to which NUMA nodes. After that, in the initial memtest86+ configuration menu, you'll be able to accurately select half of the RAM and half of the cores from that 2S E5-2667 v2 T5610 workstation, and see whether it makes a difference in memory testing performance. Usually, cores from a given socket are either consecutive (for you, 0-7 and 8-15) or interleaved (0, 2, 4, 6, 8, 10, 12, 14 and 1, 3, 5, 7, 9, 11, 13, 15) in the enumeration.

The BIOS of the NUMA machines I know of makes it possible to disable NUMA ("enable memory interleaving" or something like that); performing tests in those conditions could be a complementary measure.
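
If numactl isn't installed, the same topology information can also be read straight from sysfs. A minimal Python sketch, assuming a standard Linux /sys layout (purely illustrative, not part of memtest86+):

#!/usr/bin/env python3
# Minimal sketch: list CPUs and memory per NUMA node straight from sysfs,
# as a rough equivalent of `numactl -H`. Assumes a Linux /sys layout.
import glob
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = node_dir.rsplit("node", 1)[-1]
    with open(f"{node_dir}/cpulist") as f:      # e.g. "0-7,16-23"
        cpus = f.read().strip()
    with open(f"{node_dir}/meminfo") as f:      # contains "Node N MemTotal: ... kB"
        m = re.search(r"MemTotal:\s+(\d+)\s+kB", f.read())
    total_mb = int(m.group(1)) // 1024 if m else 0
    print(f"node {node}: cpus {cpus}, {total_mb} MB")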

ThomasWaldmann commented 11 months ago

Maybe I can do that later (I have Ubuntu 22.04 on that machine); currently the DIMMs are in the other T3610 machine, at another place.

But yeah, I guess if memtest tested a DIMM from a CPU core it is not attached to, that would explain some slowness (though I wonder: is it really that much? More than 10x slower if "indirect"?).

I'll check the BIOS, but don't remember seeing any NUMA related settings.

debrouxl commented 11 months ago

In another discussion, Sam once posted a link showing terrible inter-socket bandwidth on a Haswell server with some NUMA settings: https://github.com/memtest86plus/memtest86plus/discussions/79#discussioncomment-2799789 .

If selecting only half of the RAM and half of the cores ends up making the memory test significantly faster, then I suppose the NUMA awareness code I have pushed to this repository's numa branch could be tested. It does wonders on my 4S E5-4627v2 Dell R820 server with 256 GB of RAM, from the same generation as your workstations: some tests become 15-20x faster with NUMA awareness enabled when the server is running in 4S mode, though the performance gains in 2S mode were much lower because the performance wasn't that bad in the first place ( https://github.com/memtest86plus/memtest86plus/discussions/12#discussioncomment-5433556 ). And the power consumption can only be significantly lower.

Caveat: the code in the numa branch works on my emulated and physical computers - single socket / single NUMA node, 2S / 2 NUMA nodes, and 4S / 4 or 8 NUMA nodes; however, I'm not aware of anybody else having tested it on real computers yet :)
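
To illustrate what NUMA awareness means here, a schematic Python sketch with made-up ranges (not the actual code from the numa branch): each node's cores are handed only memory ranges owned by that node, so the test accesses stay local to the socket instead of crossing QPI.

# Schematic sketch of NUMA-aware work partitioning (illustrative only).
# Hypothetical 2-node topology: cores per node, and (start, end) ranges in MiB.
node_cores = {0: [0, 1, 2, 3, 4, 5, 6, 7], 1: [8, 9, 10, 11, 12, 13, 14, 15]}
node_ranges = {0: [(0, 131072)], 1: [(131072, 262144)]}

def assign_local_work(node_cores, node_ranges):
    """Split each node's memory ranges evenly across that node's own cores."""
    work = {}
    for node, cores in node_cores.items():
        for start, end in node_ranges[node]:
            chunk = (end - start) // len(cores)
            for i, core in enumerate(cores):
                lo = start + i * chunk
                hi = end if i == len(cores) - 1 else lo + chunk
                work.setdefault(core, []).append((lo, hi))
    return work

for core, ranges in sorted(assign_local_work(node_cores, node_ranges).items()):
    print(f"core {core}: {ranges}")

Without such partitioning, a core may spend most of its time hammering memory that belongs to the remote node, which is where the large slowdowns come from.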

ThomasWaldmann commented 11 months ago

numactl -H output on the T5610

with 4x16GB + 4x8GB normal RDIMMs

tw@c64:~$ sudo numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 64346 MB
node 0 free: 63027 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 32251 MB
node 1 free: 31348 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

with 4x32GB + 4x32GB LRDIMMs

tw@c64:~$ sudo numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 128858 MB
node 0 free: 127845 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 129019 MB
node 1 free: 127919 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

debrouxl commented 11 months ago

It's interesting that you get very different performance characteristics with RDIMMs and LRDIMMs.

01e3 commented 11 months ago

Would it be possible to do a memory benchmark using AIDA (Windows) and ramspeed (Linux)? Also, what about the "commercial" version of memtest? I wonder if the same problem exists there, perhaps due to a BIOS limitation in handling memory types?

ThomasWaldmann commented 11 months ago

Now it gets weird:

I don't get the slow memtest any more. Now it shows 9.46GB/s memory speed for the 8 LRDIMMs in the T5610.

I did not change BIOS settings (and there is only 1 NUMA setting anyway: enable/disable, which was always enabled).

I did some wall-clock measurements with test 4 in 2-core (C0 and C8) sequential mode and now get comparable performance with the LRDIMMs as with the RDIMMs. I can see that "via the other CPU socket" memory access is about 40% slower.

Currently it is running a full test pass...

ThomasWaldmann commented 11 months ago

@01e3 I currently can't run Windows (booting anything Windows-related, including Windows ISOs, hangs at the spinning dots). I didn't buy a commercial memtest either.

I can run software available for Ubuntu 22.04.
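
For instance, a rough local vs. remote bandwidth check could be done by pinning both the CPU and the allocations with numactl around a simple copy loop. A minimal sketch (the script name is just a placeholder; the absolute MB/s figure from Python is not meaningful, only the ratio between the --membind=0 and --membind=1 runs):

#!/usr/bin/env python3
# Rough local vs. remote copy-bandwidth check; run it pinned with numactl, e.g.
#   numactl --cpunodebind=0 --membind=0 python3 bwcheck.py   # local accesses
#   numactl --cpunodebind=0 --membind=1 python3 bwcheck.py   # remote accesses
import time

SIZE = 512 * 1024 * 1024   # 512 MiB source buffer
ROUNDS = 10

buf = bytearray(SIZE)      # allocation lands on the node selected by --membind
start = time.perf_counter()
for _ in range(ROUNDS):
    _copy = bytes(buf)     # a large memcpy under the hood
elapsed = time.perf_counter() - start

mib_copied = SIZE * ROUNDS / (1024 * 1024)
print(f"{mib_copied / elapsed:.0f} MiB/s over {ROUNDS} copies of {SIZE >> 20} MiB")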

debrouxl commented 11 months ago

The crippled no-fee version of PassMark's MemTest86 only supports 16 cores anyway. I don't know whether that means 16 threads or 16 physical cores; if the former, that doesn't even allow optimal testing of most 2S platforms produced since the Sandy Bridge generation over a decade ago, about 6 generations back (SNB/IVB, HSW/BDW, SKX, CLX, ICX/CPX) from the current Sapphire Rapids (SPR).

It's both weird and good that you're getting reasonable memory speed now :) 40% slower memory access on the other proximity domain looks reasonable as well. The semi-experimental NUMA-aware mode of memtest86+ should still be faster and consume less power than the normal mode.