Closed mohsenrajeh closed 1 year ago
GPU plotting needs a FAST SSD, otherwise disk write failures may occur. There are a few ways to make your SSD faster.
1.) Go with an enterprise SSD. (This is the no-brainer best choice, lol.)
2.) Build multiple SSDs into a RAID0 array. (Cheaper, but needs more SSDs.)
e.g.
sudo mkfs.btrfs -n 64k -d raid0 -m raid0 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 -f
sudo mount /dev/nvme0n1 -o ssd,noatime,discard,autodefrag /mnt/your_tmp_path
sudo chown -R your_linux_acct /mnt/your_tmp_path
3.) Slow down your GPU to match your SSD's capability, e.g. run only one GPU thread and 4 CPU cores. (You can fine-tune this to find the sweet spot.)
./arc_plot -G --gthreads 1 -r 4 ...
4.) Follow some best practices.
Move the plot file off the temp drive before the next plotting run. Run fstrim on your temp drive every time before the next plotting run.
A script like this helps.
#!/bin/bash
for i in {1..10}
do
    # Run just one plot and leave the final plot file on the tmp drive.
    echo "start plot ${i}"
    ./arc_plot -r 32 -G -n 1 -c -f -t /mnt/RAID/ -d /mnt/RAID
    # Sleep a few seconds.
    sleep 3
    # Move the plot file to the destination; stop if the move fails so the
    # cleanup below cannot delete a plot that was not copied.
    echo "move plot ${i}"
    mv /mnt/RAID/*.plot /mnt/DYSK/ || exit 1
    sleep 3
    # Clean up the tmp drive for the next plot.
    echo "clean up tmp drive"
    rm /mnt/RAID/* ; sudo fstrim /mnt/RAID
done
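As a rough sketch of the "move the plot out before the next run" practice, a loop like the one above could check the temp drive before starting another plot. The function names and the 90 percent threshold in the usage note are hypothetical, chosen only for illustration; they are not part of arc_plot.

```shell
#!/bin/bash
# tmp_usage_pct DIR: print the used-space percentage (integer, no % sign)
# of the filesystem that holds DIR.
tmp_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# leftover_plots DIR: print how many *.plot files are still sitting in DIR.
leftover_plots() {
    find "$1" -maxdepth 1 -type f -name '*.plot' | wc -l
}

# tmp_ready DIR MAX_PCT: succeed only if DIR holds no leftover plots and
# its filesystem is at or below MAX_PCT percent used.
tmp_ready() {
    local dir="$1" max_pct="$2"
    [ "$(leftover_plots "$dir")" -eq 0 ] && [ "$(tmp_usage_pct "$dir")" -le "$max_pct" ]
}
```

In the plotting loop this might be used as `tmp_ready /mnt/RAID 90 || { echo "tmp drive not ready"; exit 1; }` before each `./arc_plot` call; the right threshold depends on your drive.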
Thanks for the reply. OK, I chose the RAID option with two 980 Pro drives. It is working now for 10 plots; if I get any errors I will post them here.
After 50 plots I get this error: Segmentation fault
Slowing down the GPU and CPU does not work. After changing the settings to slow down, I get an error in phase 2 every time, while the default settings still make plots and only fail at random times.
Do you move the plot file off the tmp drive? How many gigabytes of plot files are left on your tmp drive? What is your copy-out speed from the tmp drive to the destination path?
As the occupancy of your tmp drive increases, the drive gradually slows down, so you must copy plots out ASAP and run fstrim on your tmp drive.
If your 3090 plots in 9 min, the 108G plot file needs to be copied out within those 9 min, i.e. 540 sec: 108G / 540 sec = 200 MB/sec (the minimum speed you must have). So if you use a single HDD as the destination, plot files will pile up on your tmp drive, eventually slowing it down and crashing it. It's better to build a btrfs raid0 (two HDDs) as the destination to receive your plots; farming on a two-HDD raid0 is also pretty stable. Or you have to restructure your plotting system and add a staging disk array as a middle layer that accepts plot files before they are moved out; usually that is an HDD RAID (4 HDDs) or a high-capacity SSD such as 8T.
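The arithmetic above can be sketched as a small helper. The function name is hypothetical, and it assumes 1 GB = 1000 MB, which is how the 200 MB/sec figure works out.

```shell
#!/bin/bash
# min_copy_speed PLOT_GB PLOT_SECONDS: print the minimum sustained copy speed
# in MB/s (1 GB taken as 1000 MB) needed to move one plot off the temp drive
# before the next plot finishes.
min_copy_speed() {
    awk -v gb="$1" -v sec="$2" 'BEGIN { printf "%.0f\n", gb * 1000 / sec }'
}

# 108 GB plot, one plot every 9 min (540 sec)
min_copy_speed 108 540   # prints 200
```

Plugging in your own plot size and plot time gives the sustained write speed the destination has to hold; a burst speed that drops as the HDD fills will not be enough.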
Yes, I move plots out to an HDD at 250 MB/s. I use a 2 TB temp RAID made of two 980 Pro drives. It works for about 40 or 50 plots, then suddenly closes or prints this error. A plot takes 9.5 min to make but the copy takes 11.5 min; after a plot is made, it waits for the copy to finish and then makes a new plot.
System: GPU RTX 3090 MSI, CPU 3900X, SSD mp600xtpro, MB X570, RAM 128-3600
Error 1:
Phase 1: creating plot
[P1] Table 1 took 9.12832 sec
[P1] Table 2 took 22.3675 sec, found 4294726263 matches
terminate called after throwing an instance of 'std::runtime_error'
what(): thread failed with: fwrite() failed with: Bad address
Aborted
Error 2:
Segmentation fault
My plotting time is 9 min, and the problem is that the copy from SSD to HDD takes 11.5 min. Is there any solution for this?