madMAx43v3r / chia-plotter

Apache License 2.0
Cant use all threads #510

Open Kevin5471 opened 3 years ago

Kevin5471 commented 3 years ago

Can't use all threads. What can I do? (image)

jackykwandesign commented 3 years ago

-r 32 -u 128 will give you the best result.

shilailin commented 3 years ago

I have a server with 2x 2698 v3 (32 cores, 64 threads in total). With -r 62 it also can't use all threads; the CPU always runs at only about 20%.

chinabala commented 3 years ago

-r is so confusing, I must say. Some say it's cores, and others say it's threads. Do we have a clear statement of what -r is?

And people with higher-end CPUs or 2 CPUs all seem to have the same issue.

sisinet commented 3 years ago

You're overcomplicating it. From the screenshots, the SSD has become the bottleneck, so the CPU cannot be fully used. I think it would be better to reduce the number of buckets!

jackykwandesign commented 3 years ago

You're overcomplicating it. From the screenshots, the SSD has become the bottleneck, so the CPU cannot be fully used. I think it would be better to reduce the number of buckets!

I tried reducing the buckets to 64, but it gave a much poorer result.

Maybe -r 16 -u 64 would be better than -r 32 -u 64.

sisinet commented 3 years ago

You can try running multiple plotting tasks in parallel.

sisinet commented 3 years ago

Dear compatriots in Taiwan or Hong Kong! Focus on phase 1 when watching CPU usage; phase 4 uses little CPU by nature.

jackykwandesign commented 3 years ago

Please take a look at the discussion in #488; I tried different parameters on a 3990X.

jackykwandesign commented 3 years ago

For your quick reference: (image)

sisinet commented 3 years ago

Wow! You're really thorough, you even made a test table like this. If you want to max out the CPU, then run multiple tasks in parallel!

JulienPlanchetCoineo commented 3 years ago

Same for me. I'm running on Windows 10 with 2x Corsair MP600 NVMe on a TRX40 Aorus Master motherboard, 256 GB of 3200 MHz DDR4, and a Threadripper 3990X (64 cores, 128 threads). Only half of my threads can be used with -r 32, and -r 64 doesn't change anything at all. https://i.imgur.com/TGMEuvj.png It takes up to 55 min per plot, which is really high in comparison with previous posts.

The RAM disk does Seq Read 3882 MB/s and Seq Write 3338 MB/s (the RAM is currently running at 2133 MHz; I need to change that in the BIOS). The NVMe RAID (software) does Seq Read 4941 MB/s and Seq Write 3900 MB/s.

How can I use 100% or so of my CPU threads to reduce the calculation time as much as possible? The only bottleneck I see is the final destination, but that doesn't explain why the plotting takes so much time.

jackykwandesign commented 3 years ago

Have you tried Ubuntu? I met another guy on Win 10 with a 3990X who got the same result as you.

JulienPlanchetCoineo commented 3 years ago

I'll try tomorrow, but I'd like to stay on Windows for many reasons, like plotting + farming on the same machine

jackykwandesign commented 3 years ago

I'll try tomorrow, but I'd like to stay on Windows for many reasons, like plotting + farming on the same machine

Ubuntu can do that too; the official Chia GUI has an Ubuntu version.

JulienPlanchetCoineo commented 3 years ago

Yes, but I'm using Hpool. I know that's not a good way, but for a small farm it gives a more stable revenue.

jackykwandesign commented 3 years ago

Hpool also has a Linux version, actually.

igitabout commented 3 years ago

Cant use all threads.What can I do. image

I'm running on Windows 10, 2x Corsair MP600 NVMe on a TRX40 Aorus Master MB. 256 Gb of 3200 MHz of DDR4 Threadripper 3990X 64 c 128t

You could be close to the RAM budget with 120 threads, but barring that, more threads does not mean more performance; assuming otherwise is a gross misunderstanding of how CPUs work. You will see no tangible benefit from using more than 32 threads on a 3995WX or 31 threads on a 3990X.

It takes up to 55 min / plot, which is really high in comparison with previous posts.

I'm finishing plots ~690 seconds on 3995WX which has similar performance to 3990X so there is a problem with your setup. Unless you provide more information about your setup, configuration and command used it's difficult to provide advice. For example, is Windows installed on the plotting drives? Are you using ram drives? This is not an exhaustive list. We need every piece of information, not bits and pieces.

JulienPlanchetCoineo commented 3 years ago

Sure, thanks for your answer and your time. Wow, 690 seconds is awesome!

Windows 10 is NOT installed on the plotting drives (Corsair MP600) but on a 256 GB SATA drive. The two MP600s are aggregated in software on Windows, using the Storage Spaces feature. I'm using a RAM drive created with "ImDisk", with 110 GB of RAM allocated for the secondary temp drive. I'm running the pre-compiled Windows exe 0.0.5.

The command used is:

powershell ".\chia_plot.exe -n 250 -r 32 -u 128 -t E:\ -2 R:\ -d D:\ -p key1 -f key2 | tee '%LOG_FILE%'"

E:\ is the NVMe RAID 0. R:\ is the RAM disk of 110 GB, taken from the 256 GB of CORSAIR CMK64GX4M2E3200C16 ((32 GB x 2) x 4).

If you need more information, I'm happy to provide it. Thanks again 😄
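For anyone comparing settings, here is a minimal Python sketch that assembles the same argument list, so that thread and bucket counts can be varied between test runs. The flag meanings are inferred from the command above and from the plotter's log header (-r threads, -u buckets, -t and -2 temp directories, -d final directory, -p and -f the pool and farmer keys); the helper name build_plot_command is hypothetical and not part of chia-plotter.

```python
from typing import List


def build_plot_command(
    exe: str,
    n_plots: int,
    threads: int,
    buckets: int,
    tmp_dir: str,
    tmp_dir2: str,
    final_dir: str,
    pool_key: str,
    farmer_key: str,
) -> List[str]:
    """Return the plotter argument list (hypothetical helper, not a chia-plotter API)."""
    return [
        exe,
        "-n", str(n_plots),   # number of plots to create
        "-r", str(threads),   # number of threads
        "-u", str(buckets),   # number of buckets
        "-t", tmp_dir,        # first temp (working) directory
        "-2", tmp_dir2,       # second temp directory, here the RAM disk
        "-d", final_dir,      # final plot destination
        "-p", pool_key,       # pool public key
        "-f", farmer_key,     # farmer public key
    ]


# Values taken from the command quoted in this comment.
cmd = build_plot_command(
    ".\\chia_plot.exe", 250, 32, 128, "E:\\", "R:\\", "D:\\", "key1", "key2"
)
print(" ".join(cmd))
```

The list form can be handed to subprocess.run as-is, which avoids quoting problems with the trailing backslashes in the Windows paths.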

scerbera commented 3 years ago

The 3995WX has 8 memory channels whereas the 3990X has only 4. That is basically the only difference, but it is a big one for a task like this.

igitabout commented 3 years ago

3995 has 8 mem lanes where as 3990 has only 4, basically the only difference but a big difference in a task like this

There is no performance difference between 4 channels or 8 when memory and drives are at their fastest because the plotting process will always be limited by the computing power of the CPU. Just because something seems logical doesn't make it a fact. This type of thinking is no different to reading specs on phones - one has 5k mah battery, another has 3k mah - people will automatically assume the 5k mah will last longer - it seems logical but it doesn't make it a fact. It applies to drives, motherboards and more.

@JulienPlanchetCoineo with regards to your issue, what are the read/write speeds to your disk during plot? You can find this information through Task Manager > Performance > Select disk and through the resource monitor application. A log output of a plot would be great too. Do you have any non-chia related applications or tasks running?

Kevin5471 commented 3 years ago

(image) (image) It's OK now, the CPU is fully loaded.

JulienPlanchetCoineo commented 3 years ago

3995 has 8 mem lanes where as 3990 has only 4, basically the only difference but a big difference in a task like this

There is no performance difference between 4 channels or 8 when memory and drives are at their fastest because the plotting process will always be limited by the computing power of the CPU. Just because something seems logical doesn't make it a fact. This type of thinking is no different to reading specs on phones - one has 5k mah battery, another has 3k mah - people will automatically assume the 5k mah will last longer - it seems logical but it doesn't make it a fact. It applies to drives, motherboards and more.

@JulienPlanchetCoineo with regards to your issue, what are the read/write speeds to your disk during plot? You can find this information through Task Manager > Performance > Select disk and through the resource monitor application. A log output of a plot would be great too. Do you have any non-chia related applications or tasks running?

The only unrelated task is lolMiner for ETH mining, but with or without it, the time to complete a plot doesn't change. Sure, here is a plotting log. Thank you for your help 😃

Multi-threaded pipelined Chia k32 plotter - aff2601
Build 0.5.0 for Windows. Check for latest updates: https://stotiks.github.io/chia-plotter/

Final Directory: D:\
Number of Plots: 250
Crafting plot 1 out of 250
Process ID: 18092
Number of Threads: 32
Number of Buckets: 2^7 (128)
Pool Public Key:   key1
Farmer Public Key: key2
Working Directory:   E:\
Working Directory 2: R:\
Plot Name: plot-k32-2021-06-16-17-59-c46210533ab4189919956abd3bb7037cd376e9c25870068f909a674c25568631
[P1] Table 1 took 25.5465 sec
[P1] Table 2 took 181.862 sec, found 4294899724 matches
[P1] Table 3 took 220.32 sec, found 4294773925 matches
[P1] Table 4 took 257.331 sec, found 4294578582 matches
[P1] Table 5 took 261.584 sec, found 4294216957 matches
[P1] Table 6 took 250.037 sec, found 4293515914 matches
[P1] Table 7 took 202.941 sec, found 4292068799 matches
Phase 1 took 1399.72 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 8.11882 sec
[P2] Table 7 rewrite took 43.696 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 17.0955 sec
[P2] Table 6 rewrite took 32.0165 sec, dropped 581445826 entries (13.5424 %)
[P2] Table 5 scan took 16.3719 sec
[P2] Table 5 rewrite took 30.4487 sec, dropped 762138323 entries (17.748 %)
[P2] Table 4 scan took 19.5724 sec
[P2] Table 4 rewrite took 30.3888 sec, dropped 828982621 entries (19.303 %)
[P2] Table 3 scan took 24.0341 sec
[P2] Table 3 rewrite took 32.8919 sec, dropped 855169025 entries (19.9119 %)
[P2] Table 2 scan took 26.7719 sec
[P2] Table 2 rewrite took 31.5017 sec, dropped 865615056 entries (20.1545 %)
Phase 2 took 337.817 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 36.7712 sec, wrote 3429284668 right entries
[P3-2] Table 2 took 31.8536 sec, wrote 3429284668 left entries, 3429284668 final
[P3-1] Table 3 took 38.2127 sec, wrote 3439604900 right entries
[P3-2] Table 3 took 33.1475 sec, wrote 3439604900 left entries, 3439604900 final
[P3-1] Table 4 took 38.358 sec, wrote 3465595961 right entries
[P3-2] Table 4 took 33.704 sec, wrote 3465595961 left entries, 3465595961 final
[P3-1] Table 5 took 39.2263 sec, wrote 3532078634 right entries
[P3-2] Table 5 took 33.9046 sec, wrote 3532078634 left entries, 3532078634 final
[P3-1] Table 6 took 42.1395 sec, wrote 3712070088 right entries
[P3-2] Table 6 took 36.3809 sec, wrote 3712070088 left entries, 3712070088 final
[P3-1] Table 7 took 32.4243 sec, wrote 4292068799 right entries
[P3-2] Table 7 took 41.8504 sec, wrote 4292068799 left entries, 4292068799 final
Phase 3 took 447.417 sec, wrote 21870703050 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 113.562 sec, final plot size is 108796456849 bytes
Total plot creation time was 2298.62 sec
Started copy to D:\plot-k32-2021-06-16-17-59-c46210533ab4189919956abd3bb7037cd376e9c25870068f909a674c25568631.plot

The NVMe read/write speed during plotting ranges from about 500 MB/s up to 1.2 GB/s.
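To see where the time goes in a log like the one above, a small Python sketch along these lines can pull out the "Phase N took" lines and express each phase as a share of the total. The regular expressions are assumptions based only on the log format shown in this thread; the sample text is copied from the Windows log above.

```python
import re

# Phase summary lines copied from the Windows log above.
LOG = """\
Phase 1 took 1399.72 sec
Phase 2 took 337.817 sec
Phase 3 took 447.417 sec, wrote 21870703050 entries to final plot
Phase 4 took 113.562 sec, final plot size is 108796456849 bytes
Total plot creation time was 2298.62 sec
"""

# Patterns assumed from the log format seen in this thread.
PHASE_RE = re.compile(r"^Phase (\d) took ([\d.]+) sec", re.MULTILINE)
TOTAL_RE = re.compile(r"^Total plot creation time was ([\d.]+) sec", re.MULTILINE)


def phase_times(log: str) -> dict:
    """Map phase number to seconds for every 'Phase N took' line."""
    return {int(p): float(t) for p, t in PHASE_RE.findall(log)}


def total_time(log: str) -> float:
    """Return the total plot creation time reported by the plotter."""
    match = TOTAL_RE.search(log)
    if match is None:
        raise ValueError("no total time found in log")
    return float(match.group(1))


phases = phase_times(LOG)
total = total_time(LOG)
for phase, secs in sorted(phases.items()):
    print(f"Phase {phase}: {secs:8.1f} s ({100 * secs / total:4.1f} % of total)")
```

On this log, phase 1 alone accounts for roughly 61 % of the total, so that is the phase worth watching first.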

igitabout commented 3 years ago

@Kevin5471 in my testing, parallel plotting always resulted in fewer plots per day than plotting one at a time. I suppose that if you're plotting on slow disks there could be potential improvements, but that's just an educated guess. When plotting one at a time, 100% core usage does not mean the plotting process is at its full potential; it indicates a problem. However, if it's working for you then that's all that matters.

@JulienPlanchetCoineo

The only non-related task is lolMiner for ETH mining, but with or without, it doesn't change the time to complete a plot.

If you want to improve the plotting speed which I'm confident you can do, you need to eliminate all potential bottlenecks. There are so many variables that could be causing this and it would take an excessively long time for someone to help you unless you remove all those variables. It's like painting or building a wardrobe in a room that has existing furniture in it vs no furniture.

I'm not sure how much testing you did, but when a variable is removed (such as you disabling lolMiner), a few test runs are not enough to determine whether it is faster or not (something you see often in the Discord channel, so forgive the assumption). You need to run the test for several hours, potentially a day or two, and take the average.

If you have 31 threads active then there is no room for any other task, and in Windows you have core applications such as DWM, Windows Defender and more (even leaving Task Manager open, and Windows updates!), which will slow each individual thread and thus your overall plotting times. Also, when plots are finished and are being moved, this needs CPU time too. On Windows I recommend setting the plotter to 28-30 threads, or switching to headless Linux for better performance.

So to summarise, please remove everything not related to Chia and start working on improving your plotting process. Yes, you will lose some time, but you will find the source of the problem far quicker by starting from the beginning than by trying to optimise a system which has too many things going on.
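As a rough illustration of "take the average", the sketch below averages every "Total plot creation time" line found in a plotter log and converts the result to plots per day. The regular expression is an assumption based on the log format posted in this thread, and the sample holds only the single total from the Windows log, so it exercises the functions rather than providing a real multi-hour measurement.

```python
import re
from statistics import mean

# Pattern assumed from the plotter log format seen in this thread.
TOTAL_RE = re.compile(r"Total plot creation time was ([\d.]+) sec")

SECONDS_PER_DAY = 86400.0


def plot_times(log_text: str) -> list:
    """Return the total time in seconds of every finished plot in the log."""
    return [float(s) for s in TOTAL_RE.findall(log_text)]


def plots_per_day(seconds_per_plot: float) -> float:
    """Convert an average plot time to plots per day for one sequential plotter."""
    return SECONDS_PER_DAY / seconds_per_plot


def summarize(log_text: str) -> tuple:
    """Return (plot count, average seconds per plot, plots per day)."""
    times = plot_times(log_text)
    if not times:
        raise ValueError("no completed plots found in log")
    avg = mean(times)
    return len(times), avg, plots_per_day(avg)


# Single total copied from the Windows log earlier in the thread.
sample = "Total plot creation time was 2298.62 sec\n"
count, avg, per_day = summarize(sample)
print(f"{count} plot(s), average {avg:.1f} s, about {per_day:.1f} plots/day")

# The ~690 s per plot mentioned earlier in the thread, for comparison.
print(f"690 s per plot is about {plots_per_day(690):.1f} plots/day")
```

Feeding it the log of a full day of plotting gives one number per configuration, which is easier to compare than individual runs.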

JulienPlanchetCoineo commented 3 years ago

@igitabout I moved to Ubuntu 20.04 LTS, but I still get the same result:

Multi-threaded pipelined Chia k32 plotter - 59648ec
Final Directory: /mnt/plots/
Number of Plots: 1
Crafting plot 1 out of 1
Process ID: 19560
Number of Threads: 31
Number of Buckets P1:    2^7 (128)
Number of Buckets P3+P4: 2^7 (128)
Pool Public Key:   key1
Farmer Public Key: key2
Working Directory:   /mnt/plotter1/
Working Directory 2: /mnt/ram/
Plot Name: plot-k32-2021-06-20-11-29-3eab59cf0e6bf8df126d71646884b97fd9eea28ebea7e0ece6776b0a187a1208
[P1] Table 1 took 5.76948 sec
[P1] Table 2 took 88.0148 sec, found 4295119429 matches
[P1] Table 3 took 114.536 sec, found 4295130671 matches
[P1] Table 4 took 122.263 sec, found 4295169623 matches
[P1] Table 5 took 120.458 sec, found 4295005659 matches
[P1] Table 6 took 122.112 sec, found 4294896014 matches
[P1] Table 7 took 116.33 sec, found 4294761659 matches
Phase 1 took 689.493 sec
[P2] max_table_size = 4295169623
[P2] Table 7 scan took 9.58934 sec
[P2] Table 7 rewrite took 26.0865 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 13.1625 sec
[P2] Table 6 rewrite took 109.593 sec, dropped 581270025 entries (13.534 %)
[P2] Table 5 scan took 12.7671 sec
[P2] Table 5 rewrite took 105.071 sec, dropped 762048290 entries (17.7427 %)
[P2] Table 4 scan took 75.9264 sec
[P2] Table 4 rewrite took 104.675 sec, dropped 829061361 entries (19.3022 %)
[P2] Table 3 scan took 108.992 sec
[P2] Table 3 rewrite took 104.628 sec, dropped 855220833 entries (19.9114 %)
[P2] Table 2 scan took 70.8774 sec
[P2] Table 2 rewrite took 103.056 sec, dropped 865655654 entries (20.1544 %)
Phase 2 took 856.8 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 53.6992 sec, wrote 3429463775 right entries
[P3-2] Table 2 took 136.339 sec, wrote 3429463775 left entries, 3429463775 final
[P3-1] Table 3 took 19.8201 sec, wrote 3439909838 right entries
[P3-2] Table 3 took 114.118 sec, wrote 3439909838 left entries, 3439909838 final
[P3-1] Table 4 took 37.6482 sec, wrote 3466108262 right entries
[P3-2] Table 4 took 117.559 sec, wrote 3466108262 left entries, 3466108262 final
[P3-1] Table 5 took 38.1088 sec, wrote 3532957369 right entries
[P3-2] Table 5 took 127.124 sec, wrote 3532957369 left entries, 3532957369 final
[P3-1] Table 6 took 40.6512 sec, wrote 3713625989 right entries
[P3-2] Table 6 took 119.338 sec, wrote 3713625989 left entries, 3713625989 final
[P3-1] Table 7 took 22.6624 sec, wrote 4294761659 right entries
[P3-2] Table 7 took 134.085 sec, wrote 4294761659 left entries, 4294761659 final
Phase 3 took 964.487 sec, wrote 21876826892 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 225.351 sec, final plot size is 108833307193 bytes
Total plot creation time was 2736.18 sec (45.603 min)
Started copy to /mnt/plots/plot-k32-2021-06-20-11-29-3eab59cf0e6bf8df126d71646884b97fd9eea28ebea7e0ece6776b0a187a1208.plot
Copy to /mnt/plots/plot-k32-2021-06-20-11-29-3eab59cf0e6bf8df126d71646884b97fd9eea28ebea7e0ece6776b0a187a1208.plot finished, took 102.986 sec, 1007.82 MB/s avg.
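Two quick checks on these numbers, as a Python sketch using only figures copied from the two logs in this thread. First, the per-phase ratio of the Ubuntu run to the earlier Windows run: phase 1 roughly halved, while phases 2 to 4 roughly doubled. One possible reading (an interpretation, not something the logs prove) is that the temp drives rather than the CPU were the limit in the later phases. Second, the reported copy rate matches the plot size divided by the copy time only when a megabyte is taken as 2^20 bytes.

```python
# Phase times in seconds, copied from the two logs in this thread.
WINDOWS = {1: 1399.72, 2: 337.817, 3: 447.417, 4: 113.562}
UBUNTU = {1: 689.493, 2: 856.8, 3: 964.487, 4: 225.351}


def phase_ratios(new: dict, old: dict) -> dict:
    """Return new/old time per phase; below 1.0 means the new run was faster."""
    return {phase: new[phase] / old[phase] for phase in old}


ratios = phase_ratios(UBUNTU, WINDOWS)
for phase, ratio in sorted(ratios.items()):
    print(f"Phase {phase}: Ubuntu took {ratio:.2f}x the Windows time")

# Copy throughput check, using the plot size and copy time from the Ubuntu log.
PLOT_BYTES = 108833307193
COPY_SECONDS = 102.986

rate_decimal = PLOT_BYTES / COPY_SECONDS / 1e6    # 1 MB = 10^6 bytes
rate_binary = PLOT_BYTES / COPY_SECONDS / 2**20   # 1 MB = 2^20 bytes

print(f"Decimal: {rate_decimal:.2f} MB/s, binary: {rate_binary:.2f} MiB/s")
```

The binary figure comes out at about 1007.8, which agrees with the 1007.82 MB/s printed by the plotter.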

JulienPlanchetCoineo commented 3 years ago

I've managed to find a solution. Here it is: I upgraded Ubuntu, which also upgraded the firmware. Then I formatted my two NVMe drives as ext4. Now I can plot at 16 min/plot on average. Nice 👍

jackykwandesign commented 3 years ago

I've managed to find a solution. Here it is : Upgraded Ubuntu, so the firmware upgraded itself. Then, I formatted my two nvme drives in EXT4. Now, I can plot at 16min/plot average. Nice 👍

May I ask whether the upgrade was just sudo apt update && sudo apt upgrade -y, or anything else?

JulienPlanchetCoineo commented 3 years ago

I've managed to find a solution. Here it is : Upgraded Ubuntu, so the firmware upgraded itself. Then, I formatted my two nvme drives in EXT4. Now, I can plot at 16min/plot average. Nice 👍

May i know the difference is sudo apt update && sudo apt upgrade -y ? or anything else ?

apt update is for updating your packages while upgrade is for upgrading OS related packages (kernel, firmware..). Am I right ?