swar / Swar-Chia-Plot-Manager

This is a Cross-Platform Plot Manager for Chia Plotting that is simple, easy-to-use, and reliable.
GNU General Public License v3.0
1.26k stars, 388 forks

Plots mysteriously vanish from job #918

Open JonathanGorr opened 3 years ago

JonathanGorr commented 3 years ago

Hi,

Once I get above 6 or 7 plots, the 7th or 8th plot will enter stage 1, be worked until about 10% or less, then vanish! This is really a waste of electricity and I can't understand why it happens. I want to run 12-15 concurrently, but this makes it impossible. I am not maxed out on threads, space, or RAM, FYI. Here is my job:

max_concurrent: 12
max_for_phase_1: 4
minimum_minutes_between_jobs: 40

- name: ssd-1
    max_plots: 999
    farmer_public_key:
    pool_public_key:
    temporary_directory: 
    - M:\Plotter
    - S:\Plotter
    destination_directory: O:\Plots
    size: 32
    bitfield: true
    threads: 24
    buckets: 128
    memory_buffer: 3000
    max_concurrent: 12
    max_concurrent_with_start_early: 12
    initial_delay_minutes: 0
    stagger_minutes: 60
    max_for_phase_1: 2
    concurrency_start_early_phase: 4
    concurrency_start_early_phase_delay: 0
    temporary2_destination_sync: false
    exclude_final_directory: false
    skip_full_destinations: true
    unix_process_priority: 10
    windows_process_priority: 256
    enable_cpu_affinity: false
    cpu_affinity: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 ]

JonathanGorr commented 3 years ago

I thought threads meant total, not per plot. Reducing it to see if that fixes it.

JonathanGorr commented 3 years ago

Reducing the threads to 2 or 4 didn't fix it.

JonathanGorr commented 3 years ago

There might be something wrong with my RAM, as I'm getting bad allocation errors.

TegrityFarm commented 3 years ago

Try to increase your memory_buffer. It should not be below 3400 MB.

/Edit: What's the size of your SSD?
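As a rough sanity check on the numbers in this thread, the sketch below multiplies the per-plot settings by max_concurrent. It assumes, as the comments here suggest, that both memory_buffer (in MB) and threads apply per plot. The function name is made up for illustration, and the result is only an upper bound on what the config requests, not a measurement of real usage.

```python
def configured_total(max_concurrent: int, per_plot_value: int) -> int:
    """Upper bound on a per-plot resource if every concurrent plot uses its full share."""
    return max_concurrent * per_plot_value


# Values taken from the job posted above (max_concurrent: 12).
print(configured_total(12, 3000))  # memory_buffer as posted, in MB
print(configured_total(12, 3400))  # memory_buffer raised to the suggested minimum, in MB
print(configured_total(12, 24))    # threads as posted, summed over all plots
```

Comparing these totals with the installed RAM and the number of hardware threads gives a first idea of whether the job is oversubscribed, although not every plot necessarily peaks at the same time.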

chnapo commented 3 years ago

I have the same issue. Sometimes my plots just disappear from the jobs, without any error. It happens to at least 15% of my plots, it really sucks. I have enough RAM (32GB), 2x2TB SSD, not running out of any resource. It started happening on my 2nd rig as well, although less often.

JonathanGorr commented 3 years ago

@TegrityFarm Okay. I swapped in brand new RAM and the problem still happens. I will try adding all my RAM back to increase the total available. I have 2x 2 TB SSDs.

TegrityFarm commented 3 years ago

I meant increasing the RAM per plot (memory_buffer) in your config.

Ufkabakan commented 3 years ago

I'm fighting the same issue. I have a few harvester PCs for making plots, and only 2 of them have this problem. I tested the hardware with the AIDA64 stress test for 3 hours and also ran MemTest86, which completed 4/4 passes with 0 errors. But I still lose some plot jobs on these PCs. The cache/temp folder still holds the temp files for these plots, and the logs for these plots stop without an error. If I don't delete the temp files of these vanished plots, my cache SSD can fill up.
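For anyone who wants to spot such leftovers without checking the folder by hand, here is a minimal sketch. It assumes temp files follow the naming visible in this thread (plot-k32-<date>-<time>-<hex id>.plot.<...>.tmp) and treats a plot as abandoned when none of its temp files has been modified for a chosen number of hours. The function name and the 6-hour default are assumptions, not part of Swar's manager or of Chia, and the script only reports files; it never deletes anything.

```python
import re
import time
from collections import defaultdict
from pathlib import Path

# Plot prefix as seen in this thread, e.g. "plot-k32-2021-05-27-17-13-<hex id>".
PLOT_PREFIX = re.compile(
    r"^(plot-k\d+-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-[0-9a-f]+)\.plot\."
)


def find_stale_plots(temp_dir, max_idle_hours=6.0, now=None):
    """Return {plot_prefix: [temp files]} for plots whose newest temp file
    was last modified more than max_idle_hours ago."""
    now = time.time() if now is None else now
    groups = defaultdict(list)
    for path in Path(temp_dir).iterdir():
        match = PLOT_PREFIX.match(path.name)
        if match and path.is_file():
            groups[match.group(1)].append(path)

    cutoff = now - max_idle_hours * 3600
    stale = {}
    for plot, paths in groups.items():
        newest = max(p.stat().st_mtime for p in paths)
        if newest < cutoff:
            stale[plot] = sorted(paths)
    return stale
```

Calling find_stale_plots with the temp directory (for example the cache drive mentioned here) returns the candidate files per plot. Check the result against the jobs the manager still lists before removing anything, since a slow but healthy plot could also look idle for a while.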

Look at the picture: the plot jobs plot-k32-2021-05-27-08-16... and plot-k32-2021-05-27-17-13... vanished, and their temp files are still here: [image]

The log from plot-k32-2021-05-27-08-16... stopped without any info or error: [image]

The log from plot-k32-2021-05-27-17-13... stopped with this error (this is the first time I have seen this error on one of my vanished plot jobs; the others are like the first example, without an error):

Bucket 70 uniform sort. Ram: 3.498GiB, u_sort min: 1.500GiB, qs min: 0.479GiB.
Bucket 71 uniform sort. Ram: 3.498GiB, u_sort min: 1.500GiB, qs min: 0.479GiB.
Bucket 72 uniform sort. Ram: 3.498GiB, u_sort min: 1.500GiB, qs min: 0.479GiB.
Bucket 73 uniform sort. Ram: 3.498GiB, u_sort min: 1.500GiB, qs min: 0.479GiB.
Bucket 74 uniform sort. Ram: 3.498GiB, u_sort min: 1.500GiB, qs min: 0.479GiB.

Caught plotting error: remove: The process cannot access the file because it is being used by another process.: "Z:\CACHE\plot-k32-2021-05-27-17-13-0f40b9f9cf518f4ab2a446bdce7e1035e2fa865bffd62e109a9f95bfb1a153ca.plot.p3.t6.sort_bucket_074.tmp"
[5936] Failed to execute script chia
Traceback (most recent call last):
  File "chia\cmds\chia.py", line 81, in <module>
  File "chia\cmds\chia.py", line 77, in main
  File "click\core.py", line 829, in __call__
  File "click\core.py", line 782, in main
  File "click\core.py", line 1259, in invoke
  File "click\core.py", line 1259, in invoke
  File "click\core.py", line 1066, in invoke
  File "click\core.py", line 610, in invoke
  File "click\decorators.py", line 21, in new_func
  File "chia\cmds\plots.py", line 135, in create_cmd
  File "chia\plotting\create_plots.py", line 176, in create_plots
RuntimeError: remove: The process cannot access the file because it is being used by another process.

chnapo commented 3 years ago

So guys, I solved it for myself. I did not have time to test each possible fix separately, so I changed everything at once.

My specs: Ryzen 9 5900X, 32 GB 3600 MHz RAM, 2x 2 TB Gen 4 SSD.

My former, problematic settings: 12 plots limit, 30 min between plots, total max 4 in phase 1, max 6 per SSD, Windows priority: real time, 4000 MB RAM per plot, 16 threads per plot. My guess is that the real-time priority was causing the issue; UNIX priority was also problematic on Ubuntu when set to maximum priority.

My new settings, which have removed the issue so far (I also increased the number of parallel plots, but only to increase the load on my CPU):

  1. Removed any CPU overclock, including PBO.
  2. SWAR settings: 14 plots limit, 30 min between plots, total max 5 in phase 1, max 7 per SSD, Windows priority: 32, 4000 MB RAM per plot, 4 threads per plot.

I am getting about 13 concurrent plots, sometimes 14; it is mostly limited by the phase 1 limiter (I am afraid of running out of SSD bandwidth). Most importantly, not a single plot has disappeared. Before this, I also switched my video card to x4 Gen4 instead of x16 Gen4 so that my SSDs get enough bandwidth; that solved some of the issues, but not this one in particular.

My recommendation: first lower the Windows or Unix process priority to normal, then remove any CPU overclocks, make sure you have enough RAM and pagefile for the plots, and most importantly, manually remove the files of plots that disappeared, since they most likely stay on your SSD! Good luck to you all, hope I helped.
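To make the priority numbers easier to read, here is a small lookup sketch. It assumes windows_process_priority takes the standard Win32 priority-class values, which fits the "32" used for normal priority in this comment. The dictionary and function names are made up for illustration. Under that assumption, the 256 in the job posted at the top of this thread would correspond to real-time priority.

```python
# Standard Win32 priority-class constants (decimal values).
WINDOWS_PRIORITY_CLASSES = {
    64: "idle",
    16384: "below normal",
    32: "normal",
    32768: "above normal",
    128: "high",
    256: "realtime",
}


def describe_windows_priority(value: int) -> str:
    """Translate a numeric priority class into a readable name."""
    return WINDOWS_PRIORITY_CLASSES.get(value, "unknown")


for value in (256, 32):
    print(value, "->", describe_windows_priority(value))
```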

BTW if anyone was curious, I am getting about 35-40 plots per day with these new settings (32 for the old ones, minus about 8 disappeared plots per day, which makes it 24).
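The arithmetic behind those figures can be written out as a short sketch. It uses only the numbers quoted in this comment; the function name is made up for illustration.

```python
def effective_plots_per_day(started: int, vanished: int) -> int:
    """Plots that actually finish: plots started minus plots that vanished."""
    return started - vanished


old_effective = effective_plots_per_day(32, 8)  # old settings
loss_share = 8 / 32                             # share of old plots that vanished

# Relative gain of the new settings (35 to 40 plots per day, none vanished).
gain_low = 35 / old_effective - 1
gain_high = 40 / old_effective - 1

print(old_effective, f"{loss_share:.0%}", f"{gain_low:.0%}", f"{gain_high:.0%}")
```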