madMAx43v3r / chia-plotter

Apache License 2.0

reduced plotting performance after upgrading from 0.0.5 to 0.1.1 #785

Open stevekm opened 3 years ago

stevekm commented 3 years ago

After upgrading from chia_plot version 0.0.5 to 0.1.1 (on Windows), I saw a ~50% decrease in speed on the same system with the same configuration.

command used with 0.0.5:

chia_plot --threads 15 --tmpdir D:\ --tmpdir2 D:\ --farmerkey ... --poolkey ...

command used with 0.1.1 is the same as above except with --contract instead of --poolkey

the D:\ drive here is a single SSD. The default value of 256 buckets was not changed.

I also tested 0.1.1 again with more threads (--threads 30) and only saw minimal improvements. Some estimated average times:

Phase | v0.0.5 (15 threads) | v0.1.1 (15 threads) | v0.1.1 (30 threads)
1     | 1100 s              | 1700 s              | 1400 s
2     | 500 s               | 800 s               | 700 s
3     | 480 s               | 1800 s              | 1800 s
4     | 170 s               | 170 s               | 170 s
Total | 2250 s              | 4470 s              | 4070 s
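The per-phase slowdown and the overall throughput change can be computed directly from the table. This is a minimal Python sketch; the lists are just the estimated averages transcribed from the table, and the helper names (slowdown, throughput_change) are hypothetical.

```python
# Estimated average phase times in seconds (P1..P4), transcribed from the table above.
PHASES = ["1", "2", "3", "4"]
V005_15T = [1100, 500, 480, 170]
V011_15T = [1700, 800, 1800, 170]
V011_30T = [1400, 700, 1800, 170]


def slowdown(baseline, candidate):
    """Per-phase ratio candidate/baseline; values above 1.0 mean the phase got slower."""
    return [c / b for b, c in zip(baseline, candidate)]


def throughput_change(baseline_total, candidate_total):
    """Fractional change in plots per unit time (negative means fewer plots)."""
    return baseline_total / candidate_total - 1.0


if __name__ == "__main__":
    for label, run in [("v0.1.1, 15 threads", V011_15T), ("v0.1.1, 30 threads", V011_30T)]:
        ratios = slowdown(V005_15T, run)
        print(label, " ".join(f"P{p}: {r:.2f}x" for p, r in zip(PHASES, ratios)))
        change = throughput_change(sum(V005_15T), sum(run))
        print(f"  total {sum(run)} s vs {sum(V005_15T)} s, throughput change {change:+.0%}")
```

With these numbers Phase 3 comes out 3.75x slower, and throughput drops by about 50% at 15 threads (about 45% at 30 threads), which matches the ~50% figure reported above.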

system:

From the looks of the Task Manager, there are a lot of unused system resources while plotting with 0.1.1, whereas with 0.0.5 chia_plot was frequently using nearly 100% of the available CPU time and SSD bandwidth.

I understand that changes have been made since version 0.0.5 that may have improved performance on some Xeon and Threadripper test systems, but they seem to have greatly hurt performance in this case.

Maybe those new changes could be toggled on/off from the command line? As it stands, version 0.1.1 is required to use the new Chia pooling protocol, which also means all the old plots made with 0.0.5 will need to be re-created under 0.1.1 at this reduced speed.

aj10017 commented 3 years ago

I'm also experiencing similar performance issues. My 3700X with 2x NVMe used to finish a plot in 3300-3500s, but now it takes ~4000s+.

stevekm commented 3 years ago

Worth noting: at ~75 min/plot I am getting about 19 plots/day, which is actually fewer than I was able to get with the official Chia plotter (maxed out around 26 plots/day). That is not a good situation for the Mad Max plotter to be in as an alternative plotter :(

Previously, with version 0.0.5, I was getting about 40 plots/day on this system.
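The plots/day figures follow from straightforward arithmetic, assuming one plot is created at a time, back to back, with no other stalls. A minimal Python sketch (plots_per_day is a hypothetical helper):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def plots_per_day(seconds_per_plot: float) -> float:
    """Plots per day for one plotter running back to back (no overlap, no copy stalls)."""
    return SECONDS_PER_DAY / seconds_per_plot


if __name__ == "__main__":
    print(f"v0.1.1 at ~75 min/plot: {plots_per_day(75 * 60):.1f} plots/day")
    print(f"v0.0.5 at ~2250 s/plot: {plots_per_day(2250):.1f} plots/day")
```

This gives 19.2 plots/day at 75 min/plot and 38.4 plots/day at the ~2250 s v0.0.5 average from the table, consistent with the "about 19" and "about 40" figures.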

Mattchew86 commented 3 years ago

I'm experiencing the same issue using a Ryzen 5900x.

Plot hardware: Ryzen 5900X, 128 GB DDR4-3600 RAM, 2 x 2 TB FireCuda 520 NVMe drives in RAID 0

Plotter config: 22 threads, 256 & 128 buckets, t1: NVMe RAID 0 drive, t2: ramdisk

Plot times are varying a lot. Anywhere between 40 minutes and 85 minutes. I've tried loads of different settings, it seems like the system is underperforming? I've been trawling the internet and other people with similar configs seem to be having the same issue.

Something weird seems to be going on with the temp directories. Even though -G is not set, it seems to be alternating the drives?

reubes commented 3 years ago

make sure to trim your SSDs regularly with 'sudo fstrim -v /mnt/ssdpath/' and also mount them (and any SMR HDDs) with the discard option.

Mattchew86 commented 3 years ago

I have TRIM enabled, have turned off indexing and the write-cache buffer, and have the CPU priority set to realtime.

I just can't get the system to consistently write 30-40 minute plots.

madMAx43v3r commented 3 years ago

is this all on windows?

Mattchew86 commented 3 years ago

Hey - yes, I am on windows10.

A commenter on issue #786 cut their plotting times in half by installing Ubuntu, so it looks like a Windows issue?

stotiks commented 3 years ago

try this version https://github.com/stotiks/chia-plotter/releases/download/v0.1.1/chia_plot_0.1.2a.zip

altendky commented 3 years ago

When you mount with discard you don't need to explicitly trim.
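On Linux, whether a filesystem is already mounted with the discard option can be read from /proc/mounts. A minimal Python sketch, assuming a Linux host and the /proc/mounts line format (device, mount point, fs type, comma-separated options, dump, pass); the function names are hypothetical, and /mnt/ssdpath is just the example path from reubes's comment.

```python
import os


def mount_options(mounts_text: str, mount_point: str) -> set:
    """Return the mount options for mount_point, parsed from /proc/mounts-style text.

    Later entries shadow earlier ones for the same mount point, so the last match wins.
    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does not unescape them.
    """
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def has_online_discard(mounts_text: str, mount_point: str) -> bool:
    """True if mounted with continuous discard.

    Matches "discard" and "discard=<mode>" (e.g. btrfs "discard=async"), but not "nodiscard".
    """
    return any(opt == "discard" or opt.startswith("discard=")
               for opt in mount_options(mounts_text, mount_point))


if __name__ == "__main__":
    # Adjust the mount point to your temp directory.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(has_online_discard(f.read(), "/mnt/ssdpath"))
```

If this prints True for the temp drive, a separate periodic fstrim is redundant for that mount, which is altendky's point.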

Mattchew86 commented 3 years ago

Thanks - testing now, will let you know how it goes after first plot

aj10017 commented 3 years ago

Giving this a shot as well.

ditaker commented 3 years ago

Will try as well, because 3 of my 4 PCs dropped in speed by approximately 20-60% on Windows 10.

Mattchew86 commented 3 years ago

Number of threads: 22, Number of Buckets P1: 256, Number of Buckets P3+P4: 256, n: 5

CPU: Ryzen 5900X, RAM: 128 GB, T1: 2 x 2 TB FireCuda NVMe in RAID 0, T2: 115 GB RAM disk

Plot 1 Phase 1: 1018s Phase 2: 540s Phase 3: 1075s Phase 4: 87s Total Time: 2719s (45 minutes)

Only 1 plot written so far, but doesn't seem much different.

stotiks commented 3 years ago

@Mattchew86, maybe your NVMe is overheating, or something else. Here are my results with v0.1.1:

AMD Ryzen 7 5800X, 64 GB @ 3600 MHz
T1: Gigabyte AORUS M.2 Gen4 PCIe X4 NVMe 2TB
T2: Gigabyte AORUS M.2 Gen4 PCIe X4 NVMe 2TB

Crafting plot 67 out of 145
Process ID: 3612
Number of Threads: 16
Number of Buckets P1: 2^9 (512)
Number of Buckets P3+P4: 2^8 (256)
Phase 1 took 994.423 sec
Phase 2 took 425.397 sec
Phase 3 took 504.534 sec, wrote 21872348936 entries to final plot
Phase 4 took 56.2946 sec, final plot size is 108806383894 bytes
Total plot creation time was 1980.75 sec (33.0126 min)

Mattchew86 commented 3 years ago

@stotiks What OS are you on? I'm on Windows 10 Pro 64-bit.

I don't think it's linked to the NVME's, as:

  1. The temperature for both drives shows as 40 °C in Crystal Disk and stays constant throughout the whole plotting process, and they have their own fan and thermal paste.

  2. I have another 1tb WD drive that I use for the OS and I tried that for plotting and no difference.

  3. There seems to be little difference between plotting only using the NVME's vs using T1 & T2 with t2 as a ram disk.

I have had a plot on v0.1.1 that has plotted in around 35 minutes, so something isn't right.

There seems to be an issue at phase 3.

Plot 2 Phase 1: 1079s Phase 2: 818s Phase 3: 1480s Phase 4: 91s Total Time: 3468s (58 minutes)

vvavepacket commented 3 years ago

I am on Ubuntu and I'm experiencing the same issue.

Previous version: 20 mins. Current version: 60 mins.

Mattchew86 commented 3 years ago

@stotiks Just for completeness, here are the results of my first 4 plots using v0.1.2a:

Plot 1 Phase 1: 1018s Phase 2: 540s Phase 3: 1075s Phase 4: 87s Total Time: 2719s (45 minutes)

Plot 2 Phase 1: 1079s Phase 2: 818s Phase 3: 1480s Phase 4: 91s Total Time: 3468s (58 minutes)

Plot 3 Phase 1: 1195s Phase 2: 880s Phase 3: 1419s Phase 4: 78s Total Time: 3571s (60 minutes)

Plot 4 Phase 1: 1057s Phase 2: 709s Phase 3: 1307s Phase 4: 79s Total Time: 3153s (53 minutes)
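One way to back up the "issue at phase 3" observation is to compare what share of each plot's time Phase 3 takes. A minimal Python sketch using the phase times posted in this thread; the constant and function names are hypothetical.

```python
# Phase times in seconds (P1, P2, P3, P4), copied from the logs in this thread.
MATTCHEW_V012A = [
    (1018, 540, 1075, 87),
    (1079, 818, 1480, 91),
    (1195, 880, 1419, 78),
    (1057, 709, 1307, 79),
]
STOTIKS_V011 = (994.423, 425.397, 504.534, 56.2946)


def phase_share(phases, index=2):
    """Fraction of the summed phase time spent in phases[index] (index 2 = Phase 3).

    Uses the sum of the listed phases, which can differ from the logged total
    by a second or so because of rounding.
    """
    return phases[index] / sum(phases)


if __name__ == "__main__":
    for n, plot in enumerate(MATTCHEW_V012A, start=1):
        print(f"Mattchew86 plot {n}: Phase 3 = {phase_share(plot):.0%} of total")
    print(f"stotiks plot 67: Phase 3 = {phase_share(STOTIKS_V011):.0%} of total")
```

Phase 3 takes roughly 40-43% of each of these four plots versus about 25% in stotiks's run. The hardware and bucket settings differ, so this only suggests looking at Phase 3 first; it does not pin down a cause.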

Mattchew86 commented 3 years ago

@vvavepacket @ditaker @aj10017 @stevekm

Is your temp plotting drive in RAID?

ditaker commented 3 years ago

Mine isn't. I have 4 PCs, all plotting to 1 TB M.2 SSDs (speed around 2500). All have 4 x 8 GB RAM. 3 PCs have 20-thread Intel 10900X CPUs and 1 PC has an 8-thread Intel 9700K (if I'm not mistaken).

So... 3 to 5 days ago they created plots in roughly 5000 to 7000 sec. Now: the PC with the 9700K (weakest) takes about 6000 sec (roughly unchanged), while the PC with the 10900X takes about 10000 sec (5000-6000 sec on average before).

Maybe I did something wrong, but I updated to Chia 1.2, then downloaded the new plot.exe file and put it in the directory (replacing the old 1.1 MB version with the new 1.8 MB version). I didn't change anything else.

I have no idea why this happens, or why the weakest of the 4 PCs is now faster than the PCs with more threads and better RAM :D.

I'll have 4 free hours tomorrow and will try again if nobody with the same problem has found a resolution by then.

Here is the latest timing from one PC: Phase 1: 6000 sec, Phase 2: 2166 sec, Phase 3: 4992 sec, Phase 4: 193 sec, Total: 13623 sec (before, this PC made a plot in about 5500 sec). Settings: r -18, u -7. With settings r -18, I -8, the total time was about 10000 sec. PC: 10900X (10 cores, 20 threads), 32 GB RAM, 1 TB M.2 SSD with about 2500 read/write speed.

Mattchew86 commented 3 years ago

Well, I swapped to my 1 TB OS NVMe drive and got two plots in a row under 40 minutes. Nothing else changed, so that suggests my NVMe RAID drive is causing the issue for me.

stevekm commented 3 years ago

@stotiks

Unfortunately this version actually runs slower for me, avg 4700s with 15 threads. Especially phase 3 took avg 1950s.

st-zelenin commented 3 years ago

What I see in my logs is that plot creation time remains the same (50 min on my machine), but the total time for a single plot (creation + copy) has increased (2 hours on my machine). It looks like copying no longer runs separately, and the next plot creation is suspended until the copy of the previous plot has finished.

Total plot creation time was 3386.29 sec (56.4382 min)
Started copy to F:\plots\plot-k32-2021-07-09-23-19-0abcfa6e2b8105fb92eac0299f274251b3a4aa54169a1f3261db75a196c11866.plot
Copy to F:\plots\plot-k32-2021-07-09-21-38-d80e5d3a9e28fcfdedafa6cbc38132109a9bf48e77d026bc19f3341aff80e5f4.plot finished, took 6174.27 sec, 16.8107 MB/s avg.
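A rough way to see where the extra time goes is to treat creation and copy as a two-stage pipeline. This minimal Python sketch uses the two durations from the log above; the function names are hypothetical, and it assumes the logged "MB/s" means MiB/s and that at most one copy runs at a time.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def implied_size_gib(avg_mib_per_s: float, seconds: float) -> float:
    """File size (GiB) implied by an average copy rate and duration.

    Assumes the log's "MB/s" means MiB/s (2**20 bytes per second).
    """
    return avg_mib_per_s * MIB * seconds / GIB


def cycle_time(create_s: float, copy_s: float, overlapped: bool) -> float:
    """Steady-state seconds per plot for a simple two-stage pipeline.

    overlapped=True: copy runs in the background, one copy at a time (assumed),
    so the slower stage sets the pace. overlapped=False: creation waits for
    every copy to finish, so the two stages add up.
    """
    return max(create_s, copy_s) if overlapped else create_s + copy_s


if __name__ == "__main__":
    create, copy = 3386.29, 6174.27  # seconds, from the log above
    print(f"implied plot size: {implied_size_gib(16.8107, copy):.1f} GiB")
    for overlapped in (True, False):
        hours = cycle_time(create, copy, overlapped) / 3600
        print(f"overlapped={overlapped}: {hours:.2f} h per plot")
```

The implied size (about 101.4 GiB) is in line with the 108806383894-byte k32 plot in stotiks's log. Under either model the ~16.8 MB/s copy dominates: overlapped, the cycle is about 1.72 h per plot versus about 0.94 h of creation, so under these assumptions the destination drive, not the plotter, sets the pace in this log.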

vvavepacket commented 3 years ago

How do we start the file copy process asynchronously, so that it copies the file in the background while starting the next plot immediately?

chiamaster commented 3 years ago

It is already doing that as you describe.

trevoriv commented 3 years ago

Hopefully this helps someone. I was using the previous version to plot in around 9000 seconds (10-year-old i5 2500K), but since switching to 0.1.1 with the -c option, my times have increased to around 13000-14000 seconds. I tried a few things, including 0.1.2a, which was the same speed, possibly a bit slower.

However, after MS forced a Windows update last night, my speeds are back close to normal: the last 3 plots using 0.1.1 took 9600, 9800 and 9700 seconds, a little slower than before (~10%) but close enough for me. Maybe something to do with Windows redistributable packages?

daveooo11 commented 3 years ago

I am experiencing the same thing on a 5800X across multiple brands of NVMe drives, using the Windows 10 version. Phase 1 and Phase 3 times both started spiking once I began NFT plotting with stotiks's 0.1.1 build. I am not experiencing any heat throttling either. These times seem to have gotten longer specifically after I updated.

Specifically, the P3-2 steps in Phase 3 take much longer now, and my CPU barely crosses 30%; it normally sits in the 10-20% range.

Secondly, the Phase 1 calculation time got a bit longer, but I do not see the same CPU % correlation as in the Phase 3-2 steps.

EDIT: I am noticing the CPU drops in Phase 1 as well, but they are more drastic in the Phase 3-2 steps.

5800X (stock), Tomahawk B550, 32 GB RAM @ 3200, FireCuda 1 TB Gen 4 (used for all temp writing)

stevekm commented 3 years ago

Are we sure this is specific to Windows? It seems like Linux users are also reporting performance drops.

Re: Windows updates; I am running Windows 10 Pro 21H1 with all updates applied and still getting the reduced plot rates

localh0rst1337 commented 3 years ago

idk what people expect from el cheapo NVMe drives - check your NVMe saturation in Task Manager and I promise, it is at 100% the whole time the CPU is in the 10-20% range. I plot with an enterprise HPE NVMe with 29 PB TBW (TLC) ($2000) and it can barely keep up with a 5900X.

daveooo11 commented 3 years ago

I understand why you might think this if you've been plotting on 2k enterprise grade equipment. But the fact is it has nothing to do with the hardware as nothing has changed. Do you think everyone here just decided to all change out their NVMe drives right when the contract plotter came out thus increasing all their times? The point is even though there were no configuration changes from before and now, timings still increased seemingly for no reason.

And no, my NVMe drives stop being saturated at the same points where I mentioned the CPU activity drops: Phase 1 and the Phase 3-2 iterations. This happens regardless of whether I slap in the FireCudas I primarily use or the cheapo 60-dollar WD Blues I have. All their timings have increased by 40-60% due to Phase 1 and 3-2.

Here's what my NVMe activity looks like specifically in the Phase 3-2 iterations, since my plotter happened to be in that step when I was posting this. It sometimes crashes to single digits, then ramps back up to 100% in the 3-1 iterations. This happens on every NVMe I try.

[image: NVMe activity during Phase 3-2 iterations]

And here is what the activity time looks like once it hits the Phase 3-1 iterations:

[image: NVMe activity during Phase 3-1 iterations]

daveooo11 commented 3 years ago

I have downloaded and tested 0.0.5 as well and the issue is still happening. Now I am suspicious this is related to some kind of Windows update that happened at roughly the same time as the contract plotter that is causing some bottlenecks in the plotter.

To note, my plotter is on 21H1 with the most recent KB updates.

localh0rst1337 commented 3 years ago

I get your point, in terms of different performance between 2 SW versions or Win updates, sure. From my experience, in 99% of the "my CPU is not used as it should be" reports I read, people forget that there is a limited sustained write rate until a (consumer) flash disk breaks down. Btw, the drive I have (got it for cheap, would never spend 2k on this) has its limits as well. It's just that with 29 PB TBW I probably won't have to buy a new drive ever again ;)

In the first picture, what does the CPU load look like? I can't saturate my 5900X even with this NVMe, so I put in another (cheapo) NVMe I had lying around and run 2 plots in parallel with half the threads each. The CPU is at 100% all the time and the NVMes are no longer a bottleneck.

stevekm commented 3 years ago

@localh0rst1337 you'll see from my original post that I'm experiencing this while using a Ryzen 3950X with enterprise grade Intel SSDs. Previously both CPU and SSD reached saturation frequently while plotting. After upgrading the Chia plot software, this is no longer the case.

daveooo11 commented 3 years ago

@localh0rst1337

I do now understand I was wrong to measure by CPU loads. CPU loads will fluctuate in some phases, not really maxing out other than when needed for heavy calculation phases.

It is a bit odd that the plot drive's read and write rates now crash so low, especially in Phase 3. As I mentioned in a prior post, the 0.0.5 version I tried is also having the issue on my plotter now. So I now suspect it's a Windows-related issue caused by some update. I have rolled back 21H1 and some KB updates and am trying again.

localh0rst1337 commented 3 years ago

Yes, this is strange. It might really be a problem with the system itself. Unfortunately I have no comparison because I started plotting with Madmax only after 1.2 came out.

@daveooo11, maybe the block size in P3 is changing, doing a lot of inserts with a large number of small blocks and creating an immensely high I/O load but comparably low throughput. In that case random IOPS matter; forget the (sequential) throughput. It would be interesting to set up monitoring with perfmon (Physical Disk IO, Wait Time, Queue Length, etc.) and then compare it with your CPU load. This still doesn't explain why it was faster before, but you could spot the bottleneck and investigate further.
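The small-I/O hypothesis can be checked once perfmon samples are collected: average I/O size is just bytes per second divided by transfers per second. A minimal Python sketch, assuming samples of the perfmon counters named in the comments have already been exported (for example from a perfmon log); the DiskSample class, function names and thresholds are hypothetical and would need tuning.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One sampling interval of perfmon-style counters (field names are hypothetical)."""
    cpu_pct: float          # \Processor(_Total)\% Processor Time
    bytes_per_s: float      # \PhysicalDisk(...)\Disk Bytes/sec
    transfers_per_s: float  # \PhysicalDisk(...)\Disk Transfers/sec
    queue_len: float        # \PhysicalDisk(...)\Avg. Disk Queue Length


def avg_transfer_kib(s: DiskSample) -> float:
    """Average I/O size in KiB (perfmon also reports this as Avg. Disk Bytes/Transfer)."""
    if s.transfers_per_s == 0:
        return 0.0
    return s.bytes_per_s / s.transfers_per_s / 1024


def looks_iops_bound(s: DiskSample, small_io_kib=64.0, min_queue=4.0, max_cpu_pct=40.0) -> bool:
    """Heuristic with assumed thresholds: small I/Os, a deep queue and a mostly idle CPU."""
    return (
        0 < avg_transfer_kib(s) < small_io_kib
        and s.queue_len >= min_queue
        and s.cpu_pct <= max_cpu_pct
    )


if __name__ == "__main__":
    # Made-up illustrative samples, not measurements from this thread.
    sample_a = DiskSample(cpu_pct=15, bytes_per_s=80 * 1024**2, transfers_per_s=20_000, queue_len=32)
    sample_b = DiskSample(cpu_pct=90, bytes_per_s=2000 * 1024**2, transfers_per_s=8_000, queue_len=8)
    for name, s in [("sample_a", sample_a), ("sample_b", sample_b)]:
        print(name, f"{avg_transfer_kib(s):.1f} KiB/IO", looks_iops_bound(s))
```

Sampling the same counters in both Phase 3-1 and Phase 3-2 would show whether the drops in drive activity coincide with a shift to small I/Os, or with something else entirely.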

endurance1968 commented 3 years ago

If it helps, here is mine: AMD Ryzen 9 5900X + 128 GB 3600 DRAM + 2 TB NVMe SSD, using a 110 GB RAM disk. The processor is undervolted (1.05 V, 4200 MHz) to increase its lifetime, and the RAM XMP profile is activated. Settings: -r 8 -u 256

Phase 1 took 1031.07 sec
Phase 2 took 474.573 sec
Phase 3 took 565.131 sec
Phase 4 took 89.2898 sec
Total plot creation time was 2160.17 sec (36.0029 min)

A setup on a Ryzen 9 3900X with 128 GB RAM, but only running at 2400 MHz (otherwise unstable :( ), takes roughly 50 min. Settings: -r 6 -u 256

I could increase the thread count, but the machines are also used for office work in parallel. All are running MM 0.1.1 on Windows 10.

andriusst commented 3 years ago

Guys, it is so easy to confirm whether the new version is causing the issue: just go back to the older version that worked well for you. All older versions are still available for download here on GitHub: https://github.com/stotiks/chia-plotter/releases

daveooo11 commented 3 years ago

This kind of reply is incredibly ignorant.

  1. How can we go back to an older version when only 0.1.1 and up have NFT plotting? Did previous versions magically gain plot-to-singleton capabilities? No.
  2. If you read above, I did go back to a previous version and it still had the same timing issue, leading me to believe it's a Windows issue.

Please read the replies before commenting.

I have booted up a Linux VM and passed through 16 vCPUs and 8 GB of RAM. I cloned the current Linux version from GitHub and ran the program after reformatting the SSD plot drive I am using to ext4. The timings are much better. And yes, I know Linux timings are naturally better, but this is about 100% faster (see times below). Phase 1 is faster, and the Phase 3 hitches no longer happen. The SSD usage is always pegged at 100%, unlike on Windows, where it drops to 10% or less at times in the Phase 3-2 iterations.

So in my case this is 100% a Windows issue, as the Linux version, even running in a VM, is kicking its rear. My time for this setup on Windows is roughly 75 mins; on Linux, even through a VM, it is 36 mins.

andriusst commented 3 years ago
  1. You are troubleshooting the issue, so there is no need to permanently go back. Just do a quick test and get timing results. Process of elimination. But you already had the same ignorant idea, haven't you?
  2. Yes, I saw it, but it looks like other people are still guessing about potential changes in the new plotter version causing this, so it needed repeating. This is about helping everyone, not just you. I am not seeing this issue, so I came with good intentions to offer help. But if you continue with offensive comments, then I have better things to do.

stevekm commented 3 years ago

A small update on this: testing the latest version (a9a490) on Linux with the same hardware gives much better times.

So it seems like this could definitely be a Windows issue. It's not clear what happened in Windows to cause this. Watching htop with the plotter running, you can see that on Linux it uses much more CPU than Windows did with the latest releases.

Peacemak3r96 commented 2 years ago

Can somebody explain to me how it is possible to plot 20 TB in one day with madmax?

System from @bways021 Asus ROG Strix TRX40-E | AMD Ryzen Threadripper 3970X | 256GB DDR4 2666

https://docs.google.com/spreadsheets/d/14Iw5drdvNJuKTSh6CQpTwnMM5855MQ46/edit#gid=7029096

If so, could you give me the settings or a tutorial on how to do it?

Regards :)
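For context, here is the arithmetic behind 20 TB/day. A minimal Python sketch, assuming decimal terabytes (10**12 bytes) and the 108806383894-byte k32 plot size from stotiks's log earlier in the thread; the helper names are hypothetical, and the parallel-plotter scaling ignores disk and RAM contention.

```python
PLOT_BYTES = 108_806_383_894  # k32 plot size from stotiks's log earlier in this thread
SECONDS_PER_DAY = 24 * 60 * 60


def plots_needed(target_bytes: float, plot_bytes: int = PLOT_BYTES) -> float:
    """Number of plots that add up to target_bytes."""
    return target_bytes / plot_bytes


def seconds_per_plot_budget(target_bytes_per_day: float, parallel_plotters: int = 1) -> float:
    """Time each plotter may spend per plot to hit the daily target.

    With N plotters running in parallel (and no shared bottleneck, an assumption),
    each one gets N times the single-plotter budget.
    """
    return SECONDS_PER_DAY * parallel_plotters / plots_needed(target_bytes_per_day)


if __name__ == "__main__":
    target = 20e12  # 20 TB/day, assuming decimal TB
    print(f"{plots_needed(target):.1f} plots/day")
    for n in (1, 2, 4):
        budget = seconds_per_plot_budget(target, n)
        print(f"{n} plotter(s): {budget:.0f} s ({budget / 60:.1f} min) per plot")
```

That works out to roughly 184 plots per day: a finished plot about every 470 s (~7.8 min) from a single plotter, or about 31 min per plot from each of 4 parallel plotters.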