madMAx43v3r / chia-plotter

Apache License 2.0
2.27k stars, 664 forks

Large loss of speed with sequential plotting #580

Open RayJuuzou opened 3 years ago

RayJuuzou commented 3 years ago

Speed drops when creating more than one plot at a time. The first plot finishes in about 30 minutes, but each subsequent plot is much slower than the one before. After a couple of hours, the plotter takes 4-5 hours per plot. However, if I tell the plotter to create only one plot and launch the runs manually one after another, the speed stays fine.

PS: When starting the program in a while loop, the plotter does not behave stably (bitfield error).

An example of creating two plots in sequence:

Multi-threaded pipelined Chia k32 plotter - [hide]
Final Directory: /HDD105/
Number of Plots: 20
Crafting plot 1 out of 20
Process ID: 3876
Number of Threads: 24
Number of Buckets P1: 2^8 (256)
Number of Buckets P3+P4: 2^8 (256)
Pool Public Key: [PRIVATE]
Farmer Public Key: [PRIVATE]
Working Directory: /NVME980PRO/
Working Directory 2: /NVME980PRO/
Plot Name: [PLOT]
[P1] Table 1 took 11.3781 sec
[P1] Table 2 took 87.5156 sec, found 4294984855 matches
[P1] Table 3 took 172.109 sec, found 4294965013 matches
[P1] Table 4 took 173.581 sec, found 4295003067 matches
[P1] Table 5 took 161.693 sec, found 4295021192 matches
[P1] Table 6 took 136.698 sec, found 4295095023 matches
[P1] Table 7 took 97.9706 sec, found 4294996197 matches
Phase 1 took 841.254 sec
[P2] max_table_size = 4295095023
[P2] Table 7 scan took 8.25096 sec
[P2] Table 7 rewrite took 45.6669 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 29.0268 sec
[P2] Table 6 rewrite took 43.7142 sec, dropped 581391798 entries (13.5362 %)
[P2] Table 5 scan took 26.0034 sec
[P2] Table 5 rewrite took 42.0878 sec, dropped 761983876 entries (17.7411 %)
[P2] Table 4 scan took 25.4672 sec
[P2] Table 4 rewrite took 41.3989 sec, dropped 828836490 entries (19.2977 %)
[P2] Table 3 scan took 26.3934 sec
[P2] Table 3 rewrite took 41.0317 sec, dropped 855015925 entries (19.9074 %)
[P2] Table 2 scan took 27.4807 sec
[P2] Table 2 rewrite took 40.9675 sec, dropped 865561769 entries (20.1528 %)
Phase 2 took 405.389 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 47.1309 sec, wrote 3429423086 right entries
[P3-2] Table 2 took 30.8602 sec, wrote 3429423086 left entries, 3429423086 final
[P3-1] Table 3 took 53.6269 sec, wrote 3439949088 right entries
[P3-2] Table 3 took 32.5438 sec, wrote 3439949088 left entries, 3439949088 final
[P3-1] Table 4 took 55.1271 sec, wrote 3466166577 right entries
[P3-2] Table 4 took 34.7845 sec, wrote 3466166577 left entries, 3466166577 final
[P3-1] Table 5 took 57.5313 sec, wrote 3533037316 right entries
[P3-2] Table 5 took 38.269 sec, wrote 3533037316 left entries, 3533037316 final
[P3-1] Table 6 took 62.5313 sec, wrote 3713703225 right entries
[P3-2] Table 6 took 41.1008 sec, wrote 3713703225 left entries, 3713703225 final
[P3-1] Table 7 took 86.6597 sec, wrote 4294996197 right entries
[P3-2] Table 7 took 55.5494 sec, wrote 4294967296 left entries, 4294967296 final
Phase 3 took 598.012 sec, wrote 21877246588 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 33.4826 sec, final plot size is 108835907958 bytes
Total plot creation time was 1878.18 sec (31.3031 min)
Started copy to /HDD105/[PLOT]
Crafting plot 2 out of 20
Process ID: 3876
Number of Threads: 24
Number of Buckets P1: 2^8 (256)
Number of Buckets P3+P4: 2^8 (256)
Pool Public Key: [PRIVATE]
Farmer Public Key: [PRIVATE]
Working Directory: /NVME980PRO/
Working Directory 2: /NVME980PRO/
Plot Name: [PLOT]
[P1] Table 1 took 31.1952 sec
[P1] Table 2 took 124.547 sec, found 4294837776 matches
[P1] Table 3 took 201.812 sec, found 4294705704 matches
[P1] Table 4 took 193.163 sec, found 4294519115 matches
[P1] Table 5 took 191.393 sec, found 4294090255 matches
[P1] Table 6 took 177.952 sec, found 4293272641 matches
[P1] Table 7 took 146.318 sec, found 4291606317 matches
Phase 1 took 1066.63 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 33.0268 sec
[P2] Table 7 rewrite took 89.4004 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 34.304 sec
[P2] Table 6 rewrite took 81.0639 sec, dropped 581486169 entries (13.5441 %)
[P2] Table 5 scan took 33.398 sec
[P2] Table 5 rewrite took 75.7063 sec, dropped 762221169 entries (17.7505 %)
[P2] Table 4 scan took 34.1174 sec
[P2] Table 4 rewrite took 74.7649 sec, dropped 829020100 entries (19.3041 %)
[P2] Table 3 scan took 34.3403 sec
[P2] Table 3 rewrite took 71.1653 sec, dropped 855135850 entries (19.9114 %)
[P2] Table 2 scan took 34.1309 sec
[P2] Table 2 rewrite took 65.1987 sec, dropped 865636413 entries (20.1553 %)
Phase 2 took 668.726 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 94.9269 sec, wrote 3429201363 right entries
[P3-2] Table 2 took 95.735 sec, wrote 3429201363 left entries, 3429201363 final
[P3-1] Table 3 took 123.91 sec, wrote 3439569854 right entries
Copy to /HDD105/[PLOT] finished, took 2112.06 sec
[P3-2] Table 3 took 104.703 sec, wrote 3439569854 left entries, 3439569854 final
[P3-1] Table 4 took 128.098 sec, wrote 3465499015 right entries
[P3-2] Table 4 took 123.159 sec, wrote 3465499015 left entries, 3465499015 final
[P3-1] Table 5 took 146.84 sec, wrote 3531869086 right entries
[P3-2] Table 5 took 127.468 sec, wrote 3531869086 left entries, 3531869086 final
[P3-1] Table 6 took 177.877 sec, wrote 3711786472 right entries
[P3-2] Table 6 took 159.02 sec, wrote 3711786472 left entries, 3711786472 final
[P3-1] Table 7 took 220.342 sec, wrote 4291606317 right entries
[P3-2] Table 7 took 165.567 sec, wrote 4291606317 left entries, 4291606317 final
Phase 3 took 1670.64 sec, wrote 21869532107 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 40.2463 sec, final plot size is 108789625500 bytes
Total plot creation time was 3446.28 sec (57.438 min)

TRIM every 10 minutes.

Hardware:

  • CPU: Ryzen 9 5900X
  • RAM: 64GB G.Skill 3200 (CL18, overclocked)
  • Temp dir: Samsung 980 Pro (in PCIe v3 mode)
  • Destination folder: plain HDD (~180-200MB/s)

chinabala commented 3 years ago

Happened to me too... :(

JSGiorno commented 3 years ago

The same here.

Qwinn1 commented 3 years ago

Wondering if this isn't the cause:

"Copy to /HDD105/[PLOT] finished, took 2112.06"

For the first plot, there's no copy from your NVMe to your hard drive to slow things down. For the second and subsequent plots, your NVMe is being frequently interrupted by the copy to your HDD. Consider using a stage folder on your NVMe (or just don't specify a -d final directory, so the final destination stays the tmpdir), and let a cron job handle the copy from the NVMe to the HDD; it may contend with the plotting less.

TRIM every 10 minutes seems way too frequent IMO, btw. On my NVMes I notice no significant performance degradation without TRIM for a good 6 hours at least. And I only TRIM via fstrim in cron; no discard mount option in use.
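Both suggestions can be sketched as cron entries; the file name, paths, and schedules below are hypothetical placeholders, not part of the original comment. One job drains finished plots from a stage folder on the NVMe to the HDD, the other runs fstrim every few hours instead of every 10 minutes.

```shell
# /etc/cron.d/chia-housekeeping -- hypothetical example.
# Every 5 minutes: move any finished plot from the NVMe stage
# folder to the HDD. mv across filesystems is copy + delete, so
# the plotter itself never waits on the HDD.
*/5 * * * *  root  for p in /NVME980PRO/stage/*.plot; do [ -e "$p" ] && mv "$p" /HDD105/; done

# Every 6 hours: TRIM the NVMe via fstrim (no discard mount option).
0 */6 * * *  root  /sbin/fstrim /NVME980PRO
```

Note that mv only removes the source after the copy completes, but a harvester watching the HDD could still open a partially copied .plot; copying under a temporary name and renaming at the end avoids that.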

ataa commented 3 years ago

Plot transfer to your Destination drive "HDD105" is too slow ~50MB/s? Is it Network/external drive? What is your Plotter version?

RayJuuzou commented 3 years ago

> Plot transfer to your Destination drive "HDD105" is too slow ~50MB/s? Is it Network/external drive? What is your Plotter version?

The latest version. ~50MB/s only while the plotter is running; avg 180-200MB/s without the plotter.

Qwinn1 commented 3 years ago

Me, I'm not staging plots within the plotter. I use cron to move them (confirm available drive space && rename .plot to .qwinn && move to HDD && rename back to .plot) directly from the NVMe plotting directories to internal HDDs on Ubuntu 20.04. I do farm the plots with a Windows 10 GUI at 1.0Gb/s over an otherwise 2.5Gb/s network running on all Cat8 cable. 64GB memory on all 3 plotting machines, no ramdisk use. Getting ~31 minute plots with MadMax on all 3 machines, with no real performance difference between one 4TB enterprise insane-TBW NVMe and two machines each using two 2TB consumer meh-TBW NVMes, so producing around 132 plots per day. My prior best effort on other plotters was around 96 per day with ~10 plots in parallel using plotman and pechy's latest optimized chiapos.
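The cron move described above can be sketched as a small shell function. The function name, paths, free-space threshold, and the .moving extension (standing in for the .qwinn rename) are all hypothetical; the point is that the file only gets its .plot name back after the copy has fully completed, so the farmer never sees a partial plot.

```shell
#!/bin/sh
# drain_plots SRC DEST NEED_KB -- sketch of the cron move described
# above: confirm free space, move each plot under a temporary name,
# then rename back to .plot once the copy has finished.
drain_plots() {
    src=$1; dest=$2; need_kb=$3
    for p in "$src"/*.plot; do
        [ -e "$p" ] || continue
        # confirm available drive space on the destination (KiB)
        avail=$(df -kP "$dest" | awk 'NR==2 {print $4}')
        [ "$avail" -ge "$need_kb" ] || break
        tmp="$dest/$(basename "$p" .plot).moving"
        # move under a temporary extension, then rename to .plot
        mv "$p" "$tmp" && mv "$tmp" "${tmp%.moving}.plot"
    done
}

# Example (hypothetical paths; a k32 plot needs ~102 GiB free):
# drain_plots /NVME980PRO /HDD105 106954752
```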

Installed madmax plotter yesterday.

Hardcore-fs commented 3 years ago

Just wondering if it is the SSD having to recycle the used blocks...

When you first start the plotter, the SSD has already done most of its housekeeping: cleaning blocks, tidying/compacting, etc.

As you plot, the SSD MUST shuffle data around to free up blocks so they can be erased. SSDs are arranged as "pages" and blocks: a page is similar to a disk "sector" or disk block, but an SSD "block" is not a disk "block". You CANNOT erase a single "page"; the controller has to move data around so that blocks stay as full as possible with active data, then free up a complete BLOCK to erase before it can allocate any new pages in that BLOCK. The faster you plot, the less time the controller has to do this housekeeping.

Then with MLCs there is a "cool-down time": if you write to an MLC chip right after an erase, it can shorten its life, and don't even get me started on writing adjacent pages on the chip. So flash controllers have all sorts of "special" tricks. They may appear to have stellar transfer rates, but if you catch the controller at the wrong moment you can bury those figures.

DiscordMJ commented 3 years ago

In my view, this has less to do with the plotter itself or an SSD and more with the file system. I am having the same issue and I do not plot on an SSD but on an 8-disk raid0 (initially a throughput of 600MB/sec). On the second plot it is getting slower, and for the third the speeds become abysmal. My current solution is a rather kludgy one: I have a script that loops for n iterations and does the following:

  • mkfs.ext4 the raid 0
  • create exactly one plot
  • repeat

Now, while this is getting the job done, I would really like to know where the ext4 performance degradation is coming from ...

gabriellee82 commented 3 years ago

I am plotting to a RAM drive and alternating between 2 NVMes, using a script to copy the completed plots to HDD, after which the drive is trimmed. This avoids the loss of speed encountered with sequential plotting on the same drive. Windows, BTW.
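One way to script the alternation described above: pick the temp drive by plot index parity, so the idle NVMe can be trimmed while the other is plotting. The drive paths, the helper name, and the driver loop are hypothetical; the chia_plot flags follow the MadMax CLI.

```shell
#!/bin/sh
# pick_tmp INDEX -- alternate between two temp drives by plot index
# parity (drive paths are hypothetical placeholders).
pick_tmp() {
    if [ $(($1 % 2)) -eq 0 ]; then
        echo /NVME1
    else
        echo /NVME2
    fi
}

# Driver sketch: plot on one drive, then trim it in the background
# while the next plot runs on the other drive.
# i=0
# while [ "$i" -lt 20 ]; do
#     t=$(pick_tmp "$i")
#     chia_plot -n 1 -t "$t/" -d /HDD105/
#     fstrim "$t" &
#     i=$((i + 1))
# done
```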

RayJuuzou commented 3 years ago

> In my view, this has less to do with the plotter itself or an SSD and more with the file system. I am having the same issue and I do not plot on an SSD but on an 8-disk raid0 (initially a throughput of 600MB/sec). On the second plot it is getting slower, and for the third the speeds become abysmal. My current solution is a rather kludgy one: I have a script that loops for n iterations and does the following:
>
>   • mkfs.ext4 the raid 0
>   • create exactly one plot
>   • repeat
>
> Now, while this is getting the job done, I would really like to know where the ext4 performance degradation is coming from ...
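The mkfs-between-plots loop can be sketched like this. The helper names and the DEV/TMP values are hypothetical; the chia_plot flags follow the MadMax CLI. Recreating the filesystem means every plot starts on a fresh, unfragmented extent tree.

```shell
#!/bin/sh
# Sketch of the mkfs-between-plots workaround (helper names and
# DEV/TMP values are hypothetical placeholders).

# plot_n_times N CMD... -- call fresh_fs, then CMD, exactly N times.
plot_n_times() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        fresh_fs
        "$@"
        i=$((i + 1))
    done
}

# Recreate the ext4 filesystem between plots.
# DESTRUCTIVE: wipes DEV -- double-check the device name.
fresh_fs() {
    umount "$TMP" 2>/dev/null
    mkfs.ext4 -F "$DEV"
    mount "$DEV" "$TMP"
}

# Usage sketch:
#   DEV=/dev/md0 TMP=/mnt/plot-tmp
#   plot_n_times 20 chia_plot -n 1 -t "$TMP/" -d /HDD105/
```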

When I use the MadMax plotter in a while loop (php/bash), I get a bitfield error. But my RAM works without errors (many hours of testing after overclocking).

nothinglastsforever commented 3 years ago

Needs a staggered start option to mitigate this without having to code your own script.

ataa commented 3 years ago

I just checked the logs on my plotters and the speed loss due to the transfer was really low (5 to 15 seconds). Not sure if the loss of speed reported above is related to the source/destination file system type or something else. These plotters have 16GB RAM and a 1TB Samsung consumer NVMe for both temps; the filesystems on the NVMe and HDDs are XFS with CRC off, and I don't use the discard option. The OS is an optimized Ubuntu Server 21.04 minimal. I run the plotter with nice -n -10. Plot transfer is done by MadMax to a WD Gold 10TB HDD, and the transfer speed reported by MadMax is ~225MB/s.