Open vvavepacket opened 3 years ago
Avoid using mdadm; there's a software bug somewhere, and it seems like it can't handle multiple I/O. If possible, use hardware-accelerated RAID0.
What's the alternative?
H/W RAID
hardware raid card
I did fake RAID in the BIOS (X570 Aorus Ultra) and it didn't show up in Ubuntu. Disks didn't show the RAID volume, just the individual disks. Any idea why?
I got this problem as well, but only on 1/5 of the plotting machines. I cleaned the memory sticks with 90% alcohol and the problem was gone (the memory test didn't report anything anyway).
@vvavepacket Do you use external USB SSDs for your RAID0 with mdadm?
I use Internal NVMEs.
Ah okay, then I cannot help, unfortunately :/
What if it's USB drives? What would be your advice, or any tweaks?
At the beginning, I had the same errors with a lot of USB drives connected to USB hubs and so on. It was very important to consider USB channels, USB ports (assigned to a certain channel), USB hub devices, the kind of USB drives, etc. But in your case you use internal drives, so you don't have to care about that.
@vvavepacket Do you use external USB SSDs for ur RAID0 with mdadm?
I did something even weirder: RAID-ing the HDDs with a ramdisk to increase HDD performance. It works well and gives about a 20% performance boost with a 23G ramdisk.
[root@localhost build]# parted
GNU Parted 3.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print devices
/dev/sda (80.0GB)
/dev/sdb (500GB)
/dev/sdc (500GB)
/dev/sdd (500GB)
/dev/sde (500GB)
/dev/mapper/centos-home (18.5GB)
/dev/mapper/centos-root (38.0GB)
/dev/md0 (123GB)
/dev/md1 (1901GB)
(parted)
[1]+  Stopped                 parted
[root@localhost build]# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 ram0[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      120499200 blocks super 1.2 512k chunks

md1 : active raid0 sdd2[2] sdc2[1] sdb2[0] sde2[3]
      1856532480 blocks super 1.2 512k chunks

unused devices:
Do you think I can raid a ramdisk with an HDD then? I have 64GB of RAM. Will it offer the same 20% performance improvement?
Technically you can; mdadm works with partitions as well. For example, make 2 partitions of 40GB each, then a 40GB ramdisk too, and just make a RAID 0 out of them.
sudo modprobe brd rd_nr=1 rd_size=41943040   # rd_size is in KiB: 1024 x 1024 x 40 = 40GB
and then something like
sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=3 /dev/sdb2 /dev/sdb3 /dev/ram0
(where sdb2 and sdb3 are the two partitions on the HDD/SSD - you can also use a ramdisk RAID to reduce SSD wear / increase speed, but its performance is nowhere near full-RAM plotting)
and then something like
sudo mkfs -t xfs -f /dev/md0
sudo mkdir -p /mnt/md0
sudo mount /dev/md0 /mnt/md0
sudo chmod 777 /mnt/md0
and then you have a raid0 disk at /mnt/md0
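(For illustration only: a hypothetical invocation pointing the madMAx plotter at that mount. The final directory and the key values are placeholders; check the plotter's README for the exact flags on your version.)

chia_plot -r 8 -t /mnt/md0/ -d /mnt/final/ -f <farmer_key> -p <pool_key>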
I did fake RAID in the BIOS (X570 Aorus Ultra) and it didn't show in Ubuntu. Disks didn't show the raid disk, just individual disks. Any idea why?
I think it's dmraid that supports this, vs. md-raid for Linux software RAID. You might have to modprobe dmraid, or pass a kernel param like dmraid=true.
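(A rough sketch, assuming the dmraid userspace tool is still packaged for your release; the mapper name will vary with your board's metadata format.)

sudo apt install dmraid      # if not already present
sudo dmraid -r               # list RAID sets found in the BIOS metadata
sudo dmraid -ay              # activate all sets
ls /dev/mapper/              # the fake-RAID volume should appear here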
Technically you can, the mdadm works with partition as well, for example make 2 partitions with the size of 40GB, then a ramdisk with 40GB size too, then just make a raid 0 out of them
At first glance it seems to be a good idea, but with a deeper understanding of Linux and RAID-0 you will recognize that the idea is not really suitable!
1. You don't need a ramdisk to bring in a performance boost in plotting
- just use the Linux page cache for file write/read handles:
sudo sysctl -w vm.vfs_cache_pressure=0
sudo sysctl -w vm.swappiness=0
sudo swapoff -a
- from now on, your swap should be disabled and the Linux page cache will use all your available RAM dynamically!
- when starting a new plot, check if the buffer/cache is really used:
watch -n1 "free -h"
2. In RAID-0 the weakest / slowest device dictates the speed of the others
- obviously, a very fast device combined with two slower ones does not make sense at all for RAID-0
- in RAID-1 it would make sense; there you would also have the --write-mostly option when creating your raid

I'd like to correct you on the swappiness: if you have lots of RAM, then don't bother configuring or turning off swap, since it will never actually use the swap files for any file handles; it will just rely on RAM. But if you get close to the edge, turning it off might help, at the cost of crashing your system in case you go over it.
That's right: if you have more RAM than the task requires, then you don't have to care about swapping. I assumed that not everyone here has at least 128GB of RAM available in a single system ;)
Btw: if you really, really want more explicit control over your RAM resources, then use an LVM cache assigned to an LV. It works quite well when you also use your system for other things that require file write/read ops, so it could be beneficial in such cases.
(A German tutorial, sorry ^^; instead of an SSD cache you can use your ramdisk.) https://www.thomas-krenn.com/de/wiki/LVM_Caching_mit_SSDs_einrichten
EDIT: use write-back mode instead of writethrough mode!
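(A minimal lvmcache sketch, assuming an existing volume group vg0 with a plotting LV lv_plot and a fast device /dev/nvme0n1 to act as the cache; all names are placeholders. --cachevol needs a reasonably recent LVM; older versions use a cache pool instead.)

sudo pvcreate /dev/nvme0n1
sudo vgextend vg0 /dev/nvme0n1
sudo lvcreate -n lv_cache -L 60G vg0 /dev/nvme0n1
sudo lvconvert --type cache --cachevol lv_cache --cachemode writeback vg0/lv_plot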
+1, mdadm makes my PC crash after a few hours. Pity.
What's a good alternative to mdadm?
Hardware raid is not an option :/
Everyone overlooks it:
sudo mkfs.f2fs -f \
-l f2fs-collection \
-O extra_attr,inode_checksum,sb_checksum \
/dev/sdd \
-c /dev/sde \
-c /dev/sdf \
-c /dev/sdg \
-c /dev/sdh \
-c /dev/sdi \
-c /dev/sdj \
-c /dev/sdk
In the above command, you are setting /dev/sdd as the meta device, which is the device that you mount. You ignore the others.
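(For illustration, assuming the device names from the command above; the mount point is just an example.)

sudo mkdir -p /mnt/f2fs-collection
sudo mount /dev/sdd /mnt/f2fs-collection
df -h /mnt/f2fs-collection    # shows the combined capacity of all member devices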
From man mkfs.f2fs
-c device-list Build f2fs with these additional comma separated devices, so that the user can see all the devices as one big volume. Supports up to 7 devices except meta device.
The part about passing the devices in a comma-separated list is actually incorrect. You have to do it as I have shown.
An f2fs disk collection is basically RAID0, without mdadm, except that it doesn't care what size the individual volumes are, or even what those volumes are. You can combine a HDD, a hardware RAID0 volume, an mdadm volume, SSDs, and thumb drives into a single volume. Not that there'd ever be a reason to do that, as it would perform horribly, but there's nothing stopping you from doing it.
I've been running an f2fs disk collection instead of mdadm purely due to getting better IOPS.
It has several quirks of its own, however. It doesn't like it when you build a volume under one kernel version, then switch to an older kernel. Requires some experimenting.
Bonus f2fs fact!
I picked up 14 10TB HDDs for $100 each - last month - because they were host-managed SMR zoned drives. Not gonna lie, I had no idea how to use these, or how they work, but sudo mkfs.f2fs -m /dev/sdbx and you can mount it like any normal HDD.
-m
Specify f2fs filesystem to supports the block zoned feature. Without it, the filesystem doesn't support the feature.
I've been using mdadm formatted as f2fs, and performance is good.
You are saying f2fs can do RAID on its own, without mdadm, and is even better?
Do you have benchmarks to back this up? How many minutes of plotting time did you save?
Yes. I had a bunch of stuff written here, but I accidentally closed the tab.. but I've been plotting since December, and I'll have over 2k plots in a few hours. I've basically been testing various things the whole time. Just on its own, f2fs is pretty fast. I don't have any screenshots of my f2fs disk collection benchmarks though.
Then recently, I discovered the speed benefits of a well-tuned XFS filesystem on a hardware RAID0 array. XFS is what I use now, and it has given me the best performance. You can probably see here where I switched to XFS. And then I ran the multithreaded chiapos.
This plotter turns everything on its head. It hardly needs any space at all, so my 11.5TB RAID0 SSD array that I was using as -t with 24 parallel plots over 48 cores is now quiet, and I just do 1 plot at a time. I'm not using a RAM tmpfs because I get slower times for whatever reason. The fastest time I've gotten is ~27m.
If your -t is your bottleneck, and you want an alternative to mdadm, make an f2fs disk collection, run a plot, and see what you get. Phoronix has some benchmark data comparing it with mdadm and various other filesystems.
Not sure if your "bench-chia" script uses "direct access" to your drives, or if it also uses the Linux page cache (RAM) for a certain time to handle the writes and reads? (I often use the dd tool with oflag=direct, which gives me reliable results.)
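(For reference, a direct-I/O write test of the kind mentioned above; the target path and sizes are arbitrary.)

dd if=/dev/zero of=/mnt/md0/ddtest bs=1M count=4096 oflag=direct status=progress
rm /mnt/md0/ddtest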
XFS has some nice features; the most important for RAID users is the automatic alignment of physical, chunk, and logical sizes. So you don't have to care about the stride and stripe width in most cases (not always!). (https://wiki.archlinux.org/title/RAID#Calculating_the_stride_and_stripe_width)
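(If you do want to set the geometry explicitly, mkfs.xfs accepts stripe unit / stripe width hints; a sketch for a 4-disk RAID-0 with a 512 KiB chunk, values assumed for illustration only.)

sudo mkfs.xfs -f -d su=512k,sw=4 /dev/md0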
Chia plotting doesn't gain any benefit from disk caching; if you really want to know how the plotting process works, observe it with FileActivityWatch. While it is true that RAID-0 of a very fast device with a very slow device slows the overall performance, it is still faster than the slow device alone. I managed to reduce plotting time from 3 hours to 2 hours on my SSD potatoes, and by something like 20% with the HDD potatoes. You can also configure mdadm to work in linear mode, e.g. taking the first 40 of the ~110GB from a ramdisk and the other 70 of the ~110GB from an SSD.
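(A linear-mode sketch of the kind described, with assumed device names; linear mode concatenates the members in the order given, so the ramdisk fills first and writes then spill over to the SSD partition.)

sudo mdadm --create --verbose /dev/md2 --level=linear --raid-devices=2 /dev/ram0 /dev/nvme0n1p2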
I used "dstat" tool to check the final writes to my hdd RAID-0, and I used "free -h" to check what really happend with my RAM (page cache), so with the new madmax plotter I could achieve 44mins for a single plot.
The reason why you get a little(!) performance boost is that in RAID-0 the load on each device decreases with the number of devices in the RAID-0. Using a RAM drive in this way is highly inefficient; you can get better performance by using an LVM cache or the Linux page cache.
The linear mode of mdadm sounds bad; it seems you are only using one device at a time, so where is the benefit!?
EDIT: my specs: Intel 11900, 64GB 3200 RAM, 10x external USB HDDs with mdadm RAID-0 as XFS (with normal mount options, not tuned!) LinuxMint 20.1, Kernel 5.10 OEM, Linux Page Cache
Do you have a way to see the actual "hit" rate of the Linux cache thing? Last time I checked, using PrimoCache in Windows with a ~20GB cache, the actual hit rate during the plotting process was below 0.1% (and yes, PrimoCache does increase plotting performance by maximizing HDD activity). The ramdisk RAID isn't as bad as you say: with a single HDD/SSD you can get double the read/write performance; the more HDDs you add, the less performance you gain... and it is true that with a 10+ HDD RAID and a single ramdisk you gain pretty much nothing :D
As I said, I just used the "free -h" command, observing the "buff/cache" column in a "watch -n1" process. At the beginning of plotting, it increases by ~2GB/s up to the maximum available RAM (in my case ~62GB, because I use a different page cache regime than the standard one!!). The nice thing is that the page cache reacts dynamically to the available memory, so if the plotting process needs more RAM it can get it and the page cache shrinks dynamically, and vice versa. (Btw: in Linux, when using the GNOME System Monitor you will not see the page cache.)
example: "free -h" with vm.vfs_cache_pressure=0 by using "sysctl" :
atop + htop + lm_sensors + hddtemp ... The average bucket size for the tables in each phase is ~100GB, except for P1T1 at ~70GB; any cache that is smaller than the bucket size won't get hit, because it's constantly updated with new buckets.
You are confusing me :D That's good, it offers a chance to learn! :D
You mentioned that the cache is not hit? Maybe it's because it's not observed by atop? Let's look into the man page of atop... "(...) the memory for the page cache ('cache' and 'buff' in the MEM-line) is not implied! (...)" I think that you cannot see/measure hits with your method, and your argument is a little bit inconsistent (to me): when new buckets are written to the page cache, then it's hit, right?
I don't have a way to see the amount of cache hits on Linux, but it's not important; let's have a look at how the plotting process works, for example with 32 buckets:
There isn't caching software smart enough to handle the plotting process, not if the cache size is smaller than the total bucket size of each phase. It can't keep a generated bucket in memory until it gets hit, because some of the generated data never gets hit and the cache would be stuck with it. If it updates the cache with newly generated buckets, they also never get hit, because the data that gets read is >100GB behind the last data that was written.
There isn't a caching software smart enough to handle the plotting process
That's right, but it does not have to be smart to provide a benefit. It's a write-back cache in my setup.
At P1T1 : the plotting software generates 32 buckets with ~2400MB each, or 76800MB
So the first ~62000MB are written to the page cache at >2GB/s; after the available RAM is exhausted, the page cache writes to my RAID-0 at about 1.2GB/s. Sometimes old buckets in the page cache are dropped because they are not used anymore, and the page cache then receives new writes at 2GB/s again (instead of 1.2GB/s) because RAM is available again... and so on.
For my setup it's more beneficial than using a 62GB ramdisk. The results are great: 44 min instead of ~2h.
@andyvk85 Did you try different write-back cache sizes? On my potatoes it gets worse when I increase or decrease the cache size. Maybe I'll disable the ramdisk and give it another try...
The default setting is 20% of the free memory:
echo 20 > /proc/sys/vm/dirty_ratio
echo 10 > /proc/sys/vm/dirty_background_ratio
I'm on LinuxMint 20.1, I don't use swap files / partitions, and I use an absolutely aggressive page cache by doing the following:
sudo sysctl -w vm.vfs_cache_pressure=0
sudo sysctl -w vm.swappiness=0 # I know this is not really required if you don't have swap areas!
and then please set the final plot dir to another drive
It would be nice to hear from you again after a test. I came up with another thought: what will happen when k32 is no longer valid for the Chia network? Then you would need a lot more RAM to use a ramdisk to handle k33/k34, right? So this solution with an aggressive page cache could be applied there too.
@andyvk85 The test is still running, but it seems better than the ramdisk thing: CPU utilization is higher, and tables 2/3 of phase 1 finished 1 or 2 minutes faster... Maybe you just saved my day... Too bad a lot of my brain cells got fried on the ramdisk thing and it didn't work as expected :(
That sounds great for you! :D Some months ago I also started with different kinds of ramdisks (like rapiddisk) and so on. So I did a lot of research, and I believe this is a better solution ;) I'm happy for you! Can you post a comparison of the plots here please, so everyone is informed ;)
nch-chia" script uses "direct access" to your drives or if it uses also Linux page cache (RAM) for a certain time to handle the writes and reads? (I often use the dd tool with oflag=direct, which gives me reliable results).
https://github.com/wallentx/farm-and-ranch-supply-depot/blob/main/extra/bench-chia which is just an execution of a fio "profile" taken from https://github.com/Chia-Network/chia-blockchain/wiki/Storage-Benchmarks @jmhands mentioned that he has an even better version of this that more closely simulates the plot creation cycle, but I haven't seen him share it anywhere.
XFS has some nice features, the most important for RAID users is the automatic alignment of physical, chunk and logical sizes. So, you don't have to care about the stride and stripe width, in most cases (not always!!).
For me, it was actually the attention put toward specifying the details of the underlying stripe width and stripe unit that made all the difference.
nch-chia" script uses "direct access" to your drives or if it uses also Linux page cache (RAM) for a certain time to handle the writes and reads? (I often use the dd tool with oflag=direct, which gives me reliable results).
https://github.com/wallentx/farm-and-ranch-supply-depot/blob/main/extra/bench-chia which is just an execution of a fio "profile" taken from https://github.com/Chia-Network/chia-blockchain/wiki/Storage-Benchmarks @jmhands mentioned that he has an even better version of this that closer simulates the plot creation cycle, but I haven't seen him share it anywhere.
XFS has some nice features, the most important for RAID users is the automatic alignment of physical, chunk and logical sizes. So, you don't have to care about the stride and stripe width, in most cases (not always!!).
For me, it was actually the attention put toward specifying the details of the underlying stripe width and stripe unit that made all the difference.
the XFS tuning is interesting! thanks for sharing!
@andyvk85, can you check the amount of memory that is actually used by the write-back cache?
using this command
watch grep -e Dirty: -e Writeback: /proc/meminfo
Also, can you check the write-back parameters on your plotting machine ?
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_expire_centisecs
cat /proc/sys/vm/dirty_writeback_centisecs
I use these, telling the system that it's fine to delay disk writes for ~10 minutes:
sudo sysctl -w vm.dirty_ratio=100
sudo sysctl -w vm.dirty_background_ratio=60
sudo sysctl -w vm.dirty_expire_centisecs=60000
sudo sysctl -w vm.dirty_writeback_centisecs=500
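(To make such settings survive a reboot, assuming a distro that reads /etc/sysctl.d/, something like the following works; the file name is arbitrary.)

printf 'vm.dirty_ratio = 100\nvm.dirty_background_ratio = 60\nvm.dirty_expire_centisecs = 60000\nvm.dirty_writeback_centisecs = 500\n' | sudo tee /etc/sysctl.d/99-plotting.conf
sudo sysctl --system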
Anyway, my potatoes' performance isn't stable with either CentOS or Ubuntu. I ended up with Tiny Core Linux and got ~1000s shaved off... It uses a weird IO scheduler but it seems more stable, and this distro doesn't touch the HDD at all. It bothers me a lot that the tmux borders aren't drawn properly, though...
EDIT: Potato specs: i5-3570 / 32GB RAM @ 1333MHz / 4x 500GB WD
Hi! Yes, I also used these "dirty_*" params in the end! They have a great impact on my system. My params:
sudo sysctl -w vm.dirty_ratio=99
sudo sysctl -w vm.dirty_background_ratio=99
sudo sysctl -w vm.dirty_expire_centisecs=25000
sudo sysctl -w vm.dirty_writeback_centisecs=25000   # 250 secs, set above the plotting step that takes the most time (it's only my guess!)
I also experienced some instability in the plotting process: after ~8 single plots I got a read/write error, but the RAID-0 was working fine at that moment, so maybe I have to set the ratios to a lower level.
I will check the "watch grep -e Dirty: -e Writeback: /proc/meminfo" command later.
I have 2 SSDs, and when I use them individually (2 parallel madmax plots), everything completes fine.
I put them in RAID 0 using Ubuntu software RAID (mdadm), but it seems I'm getting this error once in a while (at random).
People say the drive is dying... but if it's dying, how come it works just fine individually?
People say it has something to do with TRIM. Should we disable TRIM? How do we do proper TRIM on Ubuntu for RAIDed devices?
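(A hedged sketch on the TRIM question, not a confirmed fix for the error above; the device and mount point are assumed. You can check whether the md array passes discards through and trim the mounted filesystem manually.)

lsblk --discard /dev/md0          # non-zero DISC-GRAN / DISC-MAX values mean discard is supported
sudo fstrim -v /mnt/md0           # trim the mounted filesystem once and report how much was trimmed
systemctl status fstrim.timer     # recent Ubuntu releases usually ship a weekly fstrim timer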