madMAx43v3r / chia-plotter

Apache License 2.0
P1,P3 CPU usage is about 50%. #851

Open boxter007 opened 3 years ago

boxter007 commented 3 years ago

I used this command: chia_plot -n -1 -r 40 -K 2 -t /mnt/ram/ -d /disk101/nft/ -f xxx -c xxx on a machine with 40 cores and 384G of RAM; /mnt/ram is a 280G tmpfs. During P1 and P3, CPU usage is about 50%; during P2 it is about 75%. Why? How can I get CPU usage to 100%?

gryan315 commented 3 years ago

Do you have 40 physical cores? If it's 40 threads, you should use -r 20. Is your CPU governor set to performance? Are you using an LTS kernel? Are you using 32GB quad rank LRDIMMs? If you have 2x 20 core/40 thread CPUs, you would probably benefit from creating two 110G tmpfs mounts with an mpol for the local node, then running 2 instances in parallel with numactl and an NVMe as -t.
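For anyone unsure how to answer the first two questions, here is a minimal sketch that reads the counts from Linux itself. It assumes an x86-style /proc/cpuinfo (with "physical id" and "core id" fields) and the usual cpufreq sysfs layout; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the optional first argument overrides the default path.

# Logical CPUs (hardware threads) as seen by the kernel.
count_threads() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Physical cores: each unique (physical id, core id) pair is one core.
count_physical_cores() {
    awk -F': *' '
        /^physical id/ { socket = $2 }
        /^core id/     { seen[socket "," $2] = 1 }
        END { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}

# Which cpufreq governors are in use, and on how many CPUs.
governor_summary() {
    cat "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}

if [ -r /proc/cpuinfo ]; then
    echo "threads: $(count_threads)  physical cores: $(count_physical_cores)"
    governor_summary
fi
```

If the thread count is twice the core count, SMT/Hyper-Threading is on, and the -r value should be sized against the core count rather than the thread count.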

boxter007 commented 3 years ago

Yes, I have 40 physical cores (80 threads): 2x 20 cores/40 threads. Yes, the CPU governor is set to performance. Yes, I use Ubuntu 20.04.2 LTS. I use 8x 32G + 2x 64G LRDIMMs.

boxter007 commented 3 years ago

> you would probably benefit from creating two 110G tmpfs mounts with an mpol for the local node, then running 2 instances in parallel with numactl and an NVMe as -t.

I tried it, and it does help. But I would like to know why.

gryan315 commented 3 years ago

In dual socket systems, when one CPU has to use the RAM attached to the other CPU, there is a performance penalty. This can happen if there is no room on the "local" NUMA node, or when a process migrates from a thread on one CPU to a thread on the other CPU but still has data in the memory of the first CPU. The performance hit is minor in modern systems, but it is there.
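To see which CPUs belong to which NUMA node on a given box, something like the following sketch should work. It assumes the standard Linux sysfs layout under /sys/devices/system/node; the function name is hypothetical. `numactl --hardware` reports the same topology along with per-node memory sizes.

```shell
#!/bin/sh
# Hypothetical helper: print each NUMA node and the CPUs local to it.
# The optional first argument overrides the sysfs root (default below).
list_numa_nodes() {
    root="${1:-/sys/devices/system/node}"
    for d in "$root"/node[0-9]*; do
        [ -r "$d/cpulist" ] || continue
        echo "${d##*/}: cpus $(cat "$d/cpulist")"
    done
}

list_numa_nodes
```

On a dual socket machine this would normally print two lines, one per socket. A process pinned to the CPUs of one node avoids the cross-socket memory access described above, as long as its memory is also allocated on that node.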

> Yes, I have 40 physical cores (80 threads): 2x 20 cores/40 threads. Yes, the CPU governor is set to performance. Yes, I use Ubuntu 20.04.2 LTS. I use 8x 32G + 2x 64G LRDIMMs.

You can probably get another 5-10% performance by installing kernel 5.11 or newer (then reboot and choose that kernel in your bootloader).

Most 32GB and larger LRDIMMs are quad rank, whereas smaller RDIMMs are usually dual rank. Quad rank means that the DIMM is basically made up of 4 smaller DIMMs that have to share the bandwidth of a single DIMM slot. This will reduce performance slightly, but it should not be a big issue if you can plot 2 in ramdisk together.

A bigger performance issue is the non-uniform distribution of your RAM caused by the 2x 64GB DIMMs. This can lead to a pretty significant reduction in performance. If you can, I would recommend removing those 2 DIMMs and installing 8x 32GB or 8x 16GB instead. If you can't upgrade the RAM, even just removing the 2x 64GB DIMMs and running 2x 110G tmpfs as -2 with NVMe as -t will likely give a boost in performance.
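A rough sketch of that two-instance layout follows. It only prints the commands (dry run) so they can be reviewed before anything is run as root. The mount points /mnt/ram0 and /mnt/ram1, the NVMe paths, the choice of mpol=bind, and -r 20 per instance are all assumptions, not something confirmed in this thread; the remaining chia_plot flags are copied from the original command.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the setup for one NUMA node.
#   $1 = NUMA node number, $2 = NVMe directory for -t, $3 = -r value (assumed 20)
print_node_setup() {
    node="$1"
    nvme_tmp="$2"
    threads="${3:-20}"
    ram="/mnt/ram${node}"   # assumed mount point for this node's tmpfs

    echo "mkdir -p ${ram}"
    # tmpfs whose pages are allocated on this node only (mpol=bind is an assumption)
    echo "mount -t tmpfs -o size=110G,mpol=bind:${node} tmpfs ${ram}"
    # pin the plotter's CPUs and memory to the same node
    echo "numactl --cpunodebind=${node} --membind=${node} chia_plot -n -1 -r ${threads} -K 2 -t ${nvme_tmp} -2 ${ram}/ -d /disk101/nft/ -f xxx -c xxx"
}

print_node_setup 0 /mnt/nvme/tmp0/
print_node_setup 1 /mnt/nvme/tmp1/
```

The two chia_plot lines are meant to run in parallel, one per socket, each in its own terminal or session.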