madMAx43v3r / chia-gigahorse


DDR3 1333 vs 1600 #84

Open hajes opened 1 year ago

hajes commented 1 year ago

Hi, I have a system basically identical to Max's test system: "HP Z420 workstation with a single Xeon E5-2695 v2, 256G (8x32G) of DDR3-1600 memory, 1 TB Samsung 970 PRO SSD, 10G fiber NIC and a RTX 3060 Ti."

The only difference is dual E5-2695 v2 (also tried E5-2620, with little difference) with 512 GB of DDR3 at 1333 MHz on a Supermicro X9 motherboard. Ubuntu was Russian roulette, and my favourite, Clear Linux, doesn't like NVIDIA, so I ended up with Debian.

I get C3 plots in about 4.7 min compared to 3 min. nvtop shows 11 GB/s. Is the difference between 1333 and 1600 MHz really that big?

There is an option in the BIOS to force DDR3 to 1600 MHz, with an instability warning. My guess is that it is like overclocking on servers.

Max doesn't mention which OS he uses.

ddubick commented 1 year ago

1333 is about a 17% drop from 1600 (put the other way, 1600 is 20% faster), so it will be a little bit slower. But seeing as you have a dual-CPU system, you might want to check that all of your components (GPU and NVMe) are on one CPU and pin the plotter to that NUMA node. Maybe take out the 2nd CPU just to simplify things if you're only planning to plot K32 and only need 256G.

hajes commented 1 year ago

-20% gives a time of about 3.6 min. Still 1 min behind.
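For reference, the arithmetic behind the 3.6 min figure as a small sketch. It assumes plot time scales inversely with the memory transfer rate, which is a simplification and not something measured in this thread; the function name is made up for illustration.

```python
def scaled_plot_time(base_minutes: float, base_mts: float, new_mts: float) -> float:
    """Estimate plot time if it were limited purely by memory transfer rate (assumption)."""
    return base_minutes * base_mts / new_mts


# Relative drop in transfer rate when going from DDR3-1600 to DDR3-1333.
drop = 1 - 1333 / 1600
print(f"1333 vs 1600: {drop:.1%} lower transfer rate")

# Scale the 3 min reference time (DDR3-1600) to DDR3-1333.
estimate = scaled_plot_time(3.0, 1600, 1333)
print(f"estimated plot time at 1333: {estimate:.2f} min")
print(f"gap to the measured 4.7 min: {4.7 - estimate:.2f} min")
```

Under that assumption the RAM clock alone accounts for roughly 0.6 min of the difference, so about 1 min is left to explain by something else.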

If I understand correctly, a quad-channel setup is supposed to have 4x ~10 GB/s of bandwidth. PCIe 3.0 x16 is about 15 GB/s.

It shouldn't even be an issue, no?

As I understand it, plotting moves data between VRAM and RAM, no?
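The theoretical peak numbers behind that estimate can be worked out as below. This is only a sketch of the standard formulas (64-bit DDR channel, PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding); the function names are illustrative, and sustained real-world throughput is lower than these peaks.

```python
def ddr_peak_gbs(mts: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: transfers/s x 8 bytes per 64-bit channel."""
    return mts * 1e6 * bus_bytes * channels / 1e9


def pcie3_peak_gbs(lanes: int) -> float:
    """Theoretical PCIe 3.0 bandwidth per direction in GB/s (8 GT/s, 128b/130b encoding)."""
    return 8e9 * (128 / 130) / 8 * lanes / 1e9


for mts in (1333, 1600):
    print(f"DDR3-{mts}, 4 channels: {ddr_peak_gbs(mts, 4):.1f} GB/s")
print(f"PCIe 3.0 x16, one direction: {pcie3_peak_gbs(16):.2f} GB/s")
```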

Of course, I have followed Max's guide and pinned the plotter to CPU2, where the GPU and SSD are physically connected. Interestingly, I get lower times without numactl.
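One way to double-check which NUMA node the GPU and NVMe drive hang off is to read the `numa_node` attribute that Linux exposes in sysfs for each PCI device. The sketch below is an illustration, not part of Max's guide; the helper names are made up, and it filters on PCI class codes `0x03` (display controllers) and `0x0108` (non-volatile memory controllers).

```python
from pathlib import Path
from typing import Optional

SYSFS_PCI = "/sys/bus/pci/devices"


def pci_numa_node(address: str, sysfs_root: str = SYSFS_PCI) -> Optional[int]:
    """Return the NUMA node of a PCI device, or None if it cannot be read.

    The kernel reports -1 when it does not know the node.
    """
    try:
        return int((Path(sysfs_root) / address / "numa_node").read_text().strip())
    except (OSError, ValueError):
        return None


def pci_class(address: str, sysfs_root: str = SYSFS_PCI) -> str:
    """Return the PCI class code string (e.g. '0x030000'), or '' if unreadable."""
    try:
        return (Path(sysfs_root) / address / "class").read_text().strip()
    except OSError:
        return ""


root = Path(SYSFS_PCI)
if root.is_dir():  # only present on Linux
    for dev in sorted(root.iterdir()):
        cls = pci_class(dev.name)
        # 0x03xxxx: display controllers (GPU); 0x0108xx: NVMe controllers
        if cls.startswith("0x03") or cls.startswith("0x0108"):
            print(dev.name, cls, "NUMA node:", pci_numa_node(dev.name))
```

If both devices report the same node N, pinning to it would look like `numactl --cpunodebind=N --membind=N <plotter command>`.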

I can't use a single CPU (already tried) because the x16 PCIe slots are on CPU2, and only CPU1 works in a single-CPU setup...

I suspect RAM throttling due to overheating. I have to add fans.

Also wanna try Arch & Clear Linux

madMAx43v3r commented 1 year ago

> If I understand correctly, a quad-channel setup is supposed to have 4x ~10 GB/s of bandwidth. PCIe 3.0 x16 is about 15 GB/s. It shouldn't even be an issue, no?

No, the load on RAM is more than just PCIe bandwidth. You also have upload and download in parallel, so at peak load you have 24 GB/s total, but download has a 3x higher load on RAM. When running at the limit of PCIe 3.0 you have around 50 GB/s load on RAM.
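One way to read those numbers, as a rough sketch: the even 12 + 12 GB/s split of the 24 GB/s total and the weight of 1 for uploads are assumptions made here for illustration, not something stated above. Only the 3x weight on downloads and the ~50 GB/s result come from the comment.

```python
def ram_load_gbs(upload_gbs: float, download_gbs: float,
                 download_factor: float = 3.0, upload_factor: float = 1.0) -> float:
    """Estimated RAM traffic in GB/s, weighting each PCIe direction by its RAM cost."""
    return upload_gbs * upload_factor + download_gbs * download_factor


# Assumed split: 24 GB/s total PCIe traffic, half in each direction.
load = ram_load_gbs(upload_gbs=12.0, download_gbs=12.0)
print(f"estimated RAM load: {load:.0f} GB/s")

# Theoretical peak of quad-channel DDR3-1333: 1333 MT/s x 8 bytes x 4 channels.
quad_1333_peak = 1333 * 8 * 4 / 1000
print(f"quad-channel DDR3-1333 peak: {quad_1333_peak:.1f} GB/s")
print("RAM-bound" if load > quad_1333_peak else "headroom left")
```

With these assumptions the estimate lands at 48 GB/s, close to the ~50 GB/s quoted, and above what four channels of DDR3-1333 can deliver even in theory.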

hajes commented 1 year ago

Thanks for the explanation, Max. It turns out the SSD was dying, the OS was corrupted... and there were some issues with the latest official NVIDIA CUDA.

Reinstalled Debian, used the official Debian drivers. Added cooling for the RAM. Some kernel tuning.

<3.5 min for C8 plots.

What is interesting though: numactl pinning to the GPU/RAM/CPU node is slower than just running the plotter on its own.

madMAx43v3r commented 1 year ago

> What is interesting though: numactl pinning to the GPU/RAM/CPU node is slower than just running the plotter on its own.

Because without pinning you have 8 channel RAM ;)

madMAx43v3r commented 1 year ago

If you had faster RAM or more channels per socket, it would be a different story.
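A quick comparison of theoretical peak bandwidth against the ~50 GB/s RAM load quoted earlier in the thread shows why the channel count matters here. This is a sketch under simplifying assumptions: it uses peak figures only and ignores the cost of memory accesses that cross between sockets, so the real margins are smaller.

```python
RAM_LOAD_GBS = 50.0  # approximate RAM load at the PCIe 3.0 limit, as quoted in the thread


def peak_gbs(mts: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (8 bytes per transfer per channel)."""
    return mts * 8 * channels / 1000


for label, channels in (("pinned to one node", 4), ("unpinned, both nodes", 8)):
    for mts in (1333, 1600):
        peak = peak_gbs(mts, channels)
        verdict = "above" if peak > RAM_LOAD_GBS else "below"
        print(f"{label:>20}, DDR3-{mts}: {peak:6.1f} GB/s ({verdict} the ~50 GB/s load)")
```

Only the pinned DDR3-1333 case falls below the load, which matches the observation that pinning was slower on this machine.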

hajes commented 1 year ago

I ordered 1600 MHz, but the seller made a mistake... 5 servers... supposed to have 1600 MHz modules... we are discussing further steps. The seller offers a discount... I want 1600 MHz as promised.

I see in nvtop that the 3060 Ti is almost hitting the limit of PCIe 3.0 x16. But in nvidia xorg I also see "bandwidth max 67%".

There may be still improvement though

hajes commented 1 year ago

> > What is interesting though: numactl pinning to the GPU/RAM/CPU node is slower than just running the plotter on its own.

> Because without pinning you have 8 channel RAM ;)

I followed your suggestion for dual-CPU rigs.

This is a dual Xeon E5-2697 v2. Each CPU/node has 8x 32 GB of RAM. That should be quad channel, no?

Unfortunately, the x16 slots are on CPU2... it would be better with a single CPU. Still, ~400 Wh and <0.01 CHF/plot.

In the old days, each plot cost me $0.05.

Great job, Max. Thanks for your time. I have to send you XCH again. Last time it was just 5 XCH.

madMAx43v3r commented 1 year ago

> There may be still improvement though

It will never reach theoretical max speed, due to overhead in the driver etc... the best you can get with nvidia on PCIe 3.0 is 12 GB/s sustained average.
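To put the 12 GB/s ceiling in perspective, it can be compared with the theoretical PCIe 3.0 x16 rate. A small sketch; treating the 11 GB/s nvtop reading from the first post as a one-direction figure is an assumption.

```python
# Theoretical PCIe 3.0 x16 rate per direction: 8 GT/s per lane, 128b/130b encoding, 16 lanes.
PCIE3_X16_PEAK_GBS = 8 * (128 / 130) / 8 * 16


def link_utilisation(observed_gbs: float, peak_gbs: float = PCIE3_X16_PEAK_GBS) -> float:
    """Fraction of the theoretical link rate that an observed throughput represents."""
    return observed_gbs / peak_gbs


for label, gbs in (("sustained ceiling", 12.0), ("nvtop reading", 11.0)):
    print(f"{label}: {gbs:.0f} GB/s = {link_utilisation(gbs):.0%} "
          f"of {PCIE3_X16_PEAK_GBS:.2f} GB/s")
```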

> just 5 XCH

lol you are the biggest donor then, thx

hajes commented 1 year ago

I never understood "free" work, but always admired Open Source guys.

I regularly donate to the Debian guys because they provide a simple solution for my servers, and to other projects that make my life simpler.

You saved me resources, so I shared. Back then 5 XCH was around $200.

I do not offer things for free, but I always share with the greatest minds on Earth. Thanks! Modern free-Google kids have no regard for anything. They even demand your knowledge for free, ROFL.

It is always amusing to observe clueless kids whining about pool or dev fees. One gets free software and a place to make a profit, yet there is a problem with 1+1% fees, ROFL.

So many get-rich-quick businessmen. The average dividend yield of a great business ranges from 2 to 9%... and nobody complains.