Open hajes opened 1 year ago
1333 is about a 20% drop from 1600 so it will be a little bit slower. But seeing as you have a dual CPU system you might want to check that all of your components are on one CPU (GPU and NVMe) and pin the plotter to that NUMA node. Maybe take out the 2nd CPU just to simplify things if you're only planning to plot K32 and only need 256G.
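One way to check which CPU each device hangs off is to read the `numa_node` attribute that Linux exposes in sysfs for every PCI device. The following is only a minimal sketch: the function name `pci_numa_nodes` is made up for illustration, and it assumes a Linux system with sysfs mounted at the usual location.

```python
from pathlib import Path


def pci_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to the NUMA node reported by Linux sysfs.

    A value of -1 means the kernel has no NUMA information for that device.
    Returns an empty dict if the sysfs directory does not exist.
    """
    nodes = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for dev in sorted(root.iterdir()):
        node_file = dev / "numa_node"
        if node_file.is_file():
            nodes[dev.name] = int(node_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    # Print "PCI address -> NUMA node" for every device the kernel knows about.
    for address, node in pci_numa_nodes().items():
        print(address, node)
```

The PCI addresses of the GPU and the NVMe drive can be looked up with `lspci`; if both report the same node, that is the node to pin the plotter to.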
-20% would be about 3.6 min. I'm still 1 min behind that.
If I understand correctly, a quad-channel setup is supposed to have 4x ~10 GB/s of bandwidth. PCIe 3.0 x16 is about 15 GB/s.
It shouldn't even be an issue, no?
As I understand it, plotting moves data between VRAM and RAM, no?
Of course, I have followed Max's guide. I pinned the plotter to cpu2, where the GPU and SSD are physically connected. Interestingly, I get lower times without numactl.
I can't use a single CPU (already tried) because the x16 PCIe slots are on cpu2. Only cpu1 works in a single-CPU setup...
I suspect RAM throttling due to overheating. I have to add fans.
I also want to try Arch and Clear Linux.
> If I understand correctly, a quad-channel setup is supposed to have 4x ~10 GB/s of bandwidth. PCIe 3.0 x16 is about 15 GB/s. It shouldn't even be an issue, no?
No, the load on RAM is more than just PCIe bandwidth. You also have upload and download in parallel, so at peak load you have 24 GB/s total, but download has a 3x higher load on RAM. When running at the limit of PCIe 3.0 you have around 50 GB/s load on RAM.
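The figure of roughly 50 GB/s can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch under stated assumptions: about 12 GB/s sustained in each direction over PCIe 3.0 x16, and "3x higher load" read as each downloaded byte costing three bytes of RAM traffic. The function name and parameters are made up for illustration.

```python
def ram_load_gbs(pcie_each_way_gbs=12.0, download_ram_factor=3.0):
    """Estimate RAM traffic (GB/s) caused by simultaneous PCIe upload and download.

    Assumptions (not measured): upload costs 1 byte of RAM traffic per byte
    transferred, download costs `download_ram_factor` bytes per byte.
    """
    upload_load = pcie_each_way_gbs * 1.0
    download_load = pcie_each_way_gbs * download_ram_factor
    return upload_load + download_load


pcie_total = 2 * 12.0        # upload + download over the bus at peak
ram_total = ram_load_gbs()   # 12 + 3 * 12
print(f"PCIe total: {pcie_total:.0f} GB/s, estimated RAM load: {ram_total:.0f} GB/s")
```

With these inputs the bus carries 24 GB/s in total while the estimated RAM load comes out at 48 GB/s, which matches the "around 50 GB/s" quoted above.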
Thanks for the explanation, Max. It turns out the SSD was dying and the OS was corrupted... plus some issues with the latest official NVIDIA CUDA.
Reinstalled Debian, used the official Debian drivers. Added cooling for the RAM. Some kernel tuning.
<3.5 min for C8 plots.
What is interesting though: numactl pinning to the GPU/RAM/CPU node is slower than just running the plotter on its own.
> What is interesting though: numactl pinning to the GPU/RAM/CPU node is slower than just running the plotter on its own.
Because without pinning you have 8 channel RAM ;)
If you had faster RAM or more channels per socket, it would be a different story.
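To put rough numbers on this, here is a sketch of theoretical peak DDR3 bandwidth. It assumes 8 bytes per transfer per channel and four channels per socket. Real sustained bandwidth is lower, and memory on the other socket is reached over the inter-socket link, so the 8-channel figures are an upper bound. The function name is made up for illustration.

```python
def ddr_peak_gbs(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for a DDR memory configuration."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


RAM_LOAD_GBS = 50  # approximate RAM load at the PCIe 3.0 limit, as stated in this thread

for speed in (1333, 1600):
    for channels in (4, 8):  # 4 = pinned to one socket, 8 = both sockets
        peak = ddr_peak_gbs(speed, channels)
        verdict = "above" if peak > RAM_LOAD_GBS else "below"
        print(f"DDR3-{speed}, {channels} channels: {peak:.1f} GB/s peak "
              f"({verdict} the ~{RAM_LOAD_GBS} GB/s load)")
```

Under these assumptions, four channels of DDR3-1333 peak at about 42.7 GB/s, which is below the ~50 GB/s load, while eight channels peak at about 85.3 GB/s. Four channels of DDR3-1600 land at 51.2 GB/s, right at the limit.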
I ordered 1600 MHz modules, but the seller made a mistake... 5 servers, all supposed to have 1600 MHz modules. We are discussing further steps. The seller offers a discount, but I want 1600 MHz as promised.
I see in nvtop that the 3060 Ti is almost hitting the limit of PCIe 3.0 x16. But I also see "bandwidth max 67%" in NVIDIA Xorg.
There may still be room for improvement, though.
> What is interesting though: numactl pinning to the GPU/RAM/CPU node is slower than just running the plotter on its own.
> Because without pinning you have 8 channel RAM ;)
I followed your suggestion for dual-CPU rigs.
It is a dual Xeon E5-2697 v2. Each CPU/node has 8x32 GB of RAM. That should be quad channel, no?
Unfortunately, the x16 slots are on cpu2... it would be better with a single CPU. Still, ~400Wh and <0.01 CHF/plot.
In the old days, each plot cost me $0.05.
Great job, Max. Thanks for your time. I have to send you XCH again. Last time it was just 5 XCH.
> There may still be room for improvement, though.
It will never reach theoretical max speed, due to overhead in the driver etc... the best you can get with nvidia on PCIe 3.0 is 12 GB/s sustained average.
> just 5 XCH
lol you are the biggest donor then, thx
I never understood "free" work, but I have always admired the Open Source guys.
I regularly donate to the Debian guys because they provide a simple solution for my servers, and to other projects that make my life simpler.
You saved me resources, so I shared. Back then 5 XCH was around $200.
I do not offer anything for free, but I always share with the greatest minds on Earth. Thanks! The modern "free Google" kids have no regard for anything. They even demand your knowledge for free, ROFL.
It is always amusing to watch clueless kids whining about pool or dev fees. One gets free software and a place to make a profit, yet there is a problem with 1+1% fees, ROFL.
So many get-rich-quick businessmen. The average dividend yield in a great business ranges from 2 to 9%... and nobody complains.
Hi, I have a system basically identical to Max's test system: "HP Z420 workstation with a single Xeon E5-2695 v2, 256G (8x32G) of DDR3-1600 memory, 1 TB Samsung 970 PRO SSD, 10G fiber NIC and a RTX 3060 Ti."
The only difference is dual E5-2695 v2 (also tried E5-2620 with little difference) with 512 GB of DDR3-1333 on a SuperMicro X9 motherboard. Ubuntu was Russian roulette, my favourite Clear Linux doesn't like NVIDIA... so I ended up with Debian.
I get C3 plots in about 4.7 min compared to 3 min. nvtop shows 11 GB/s. Is there really such a big difference between 1333 and 1600 MHz?
There is an option to force DDR3-1600 in the BIOS, with an instability warning. My guess is that it is like overclocking on servers.
Max doesn't mention what OS he uses