Closed ekersey closed 2 years ago
Hi, thanks for the detailed report. Please provide `plotman status` output from the secondary plotter worker at the stage shown above. On the plotting system:

```
docker exec -it machinaris bash
plotman status
```

Please paste a screenshot of this status output from the plotman CLI. Please also provide a full screenshot of your Workers page. Thanks.
The `plotman status` command appears to hang and does not return anything at all.

Workers page:

`plotman status` finally returned when the plot finished copying to the destination. Executing it again right after shows a new plot has started.
Hi, thanks for the detailed response. Very interesting that the separate `plotman status` process on your Ripper machine actually hung for the duration of the copy. Since that is a separate plotman invocation, which simply spins through the container's process list looking for other running `plotman` and `chia_plot` processes, I suspect there is some resource contention slowing or pausing the entire Docker container, not just the single Plotman job.
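To illustrate the point, a process scan of this kind can be sketched as a small shell function that walks `/proc` looking for matching command lines. This is a hypothetical illustration, not Plotman's actual code; but note that even a simple loop like this touches the process table for every PID, so an I/O-starved container can stall it:

```shell
# Sketch (not Plotman's implementation): list PIDs whose command line
# contains a given pattern, by scanning /proc directly.
scan_procs() {
  pattern="$1"
  for pid in /proc/[0-9]*; do
    # /proc/<pid>/cmdline is NUL-separated; convert it to a readable string.
    cmd=$(tr '\0' ' ' < "$pid/cmdline" 2>/dev/null) || continue
    case "$cmd" in
      *"$pattern"*) echo "${pid#/proc/}" ;;
    esac
  done
}

# Example: list PIDs of any running chia_plot processes
scan_procs chia_plot
```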
Secondly, you mentioned that the same plotting configuration on your Fullnode system does not exhibit this issue. Since the code is the same, that indicates something is different between the two systems at a hardware/volume level.
A few questions:

- What is the `dst` volume path on each system?
- Is the `dst` path on each system an Unassigned Device in Unraid?
- Is the `dst` path on the Ripper plotter a remote network share served by the ChiaBoxZero fullnode?

I would recommend experimenting with different `dst` locations on the plotter. The original Plotman author's design, particularly for remote plotting, was for:

- `tmp`: an SSD or RAM disk
- `dst`: a locally attached staging drive
- archiving: to move completed plots from `dst` over to a final location, either a remote server or a slow local drive

Hope this helps, Guy
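That layout might look something like the following `plotman.yaml` fragment. This is an illustrative sketch only; the paths are placeholders, not values from this thread:

```yaml
# Illustrative plotman.yaml fragment for the tmp/dst/archiving layout above.
# All paths are placeholders.
directories:
  tmp:
    - /mnt/ssd/plotman-tmp   # fast SSD (or RAM disk)
  dst:
    - /mnt/staging           # locally attached staging drive
  # archiving then moves finished plots from dst to their final home
```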
Just found this, probably related: https://github.com/ericaltendorf/plotman/issues/714
I should mention that the host OS for the secondary plotter is Ubuntu 20.04, and the destination drive is an NFS share mounted from the primary Unraid server.

Going to switch to a local dst when the current job finishes. Looks like maybe what I need to do is figure out how to get rsync set up for the archiver.

Edit: You posted as I was writing this. I'll report back this afternoon when the first job with a local dst finishes, but I suspect we'll be able to close this as not a Machinaris issue.
Setting `dst` to a local path appears to have solved the problem. Sorry to have wasted your time.

It's a shame, though; it seems like a pointless hop on the way to the final destination. And with the only drive I have available to use as a local `dst`, it's actually slower than copying over the network. So what was a 1-hour copy delay between a plot finishing and becoming harvestable is now 2.5 to 3 hours.

Maybe I'll try leaving the finished plot in the `tmp` folder and writing my own script to move it to the Unraid share.
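For what it's worth, a minimal mover script along those lines could look like the sketch below. The paths and the copy-then-rename approach are my assumptions (the rename keeps the harvester from ever seeing a partial `.plot` file); `rsync --remove-source-files` would achieve much the same thing:

```shell
#!/usr/bin/env bash
# Hypothetical mover script: copy finished plots from the tmp dir to the
# share under a temporary name, then rename, so no partial .plot file is
# ever visible at the destination. Paths are placeholders.
move_plots() {
  src="$1"
  dest="$2"
  for plot in "$src"/*.plot; do
    [ -e "$plot" ] || continue
    name=$(basename "$plot")
    # Copy under a .tmp suffix; rename and delete the source only after
    # the copy completes successfully.
    cp "$plot" "$dest/$name.tmp" &&
      mv "$dest/$name.tmp" "$dest/$name" &&
      rm "$plot"
  done
}

# Example (placeholder paths):
# move_plots /mnt/ssd/plotman-tmp /mnt/unraid/plots
```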
Anyhow, not a Machinaris issue. Thank you.
Hi, no worries. One thing to try would be leaving the `dst` list empty in the Plotman settings. My understanding is that Madmax would then consider the plot complete and leave it in the `tmp` location (probably an SSD). I believe you could then use Plotman archiving to transfer it from the `tmp` location to the remote system via rsync. I have not tested this scenario myself, however, so please take this with a grain of salt.
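If that approach works, the relevant settings might look roughly like this. This is an untested sketch: the key names follow the plotman sample config as I recall it, and the host and paths are placeholders:

```yaml
# Untested sketch: no dst entries, with rsyncd archiving moving plots off tmp.
directories:
  dst: []
archiving:
  target: rsyncd
  env:
    site_root: /mnt/user/plots   # placeholder path on the Unraid server
    user: chia                   # placeholder rsyncd user
    host: unraid.local           # placeholder hostname
    rsync_port: 873
    site: plots
```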
Cheers!
**Describe the bug**
Plotman, on a plotter-only worker, does not start a new Madmax instance as expected when the first one has started copying to the destination folder. Watching the plotting status on the /plotting/jobs page shows that when a job reaches phase 4:1 it just stays there, even after the copy to `dst` has started and it should be in phase 5:1. Incidentally, the "wall" time also stops updating, and the "Plotting Speed" charts do not show stats from the 2nd worker at all.

**To Reproduce**
I have a "fullnode" Machinaris container on the primary server and a "plotter" worker on a 2nd machine. The plotman config files are mostly default aside from `dst` paths and the number of threads for Madmax (the scheduling settings are the same).

Plotting jobs on the "fullnode" work as expected: when a job reaches phase 5:1 (copying to destination), a new plotter is spun up while the copy proceeds. Plotting jobs on the "plotter" worker never update their status when reaching phase 5:1, and therefore never spin up a new plotting job until the copy to the destination has actually finished.

**Expected behavior**
The plotter on the 2nd machine should start a new plot job when the first one begins copying the final plot to the destination (phase 5:1).
**System setup:**

**Config**
Plotting scheduling parameters (same on both workers):

```yaml
scheduling:
  # Run a job on a particular temp dir only if the number of existing jobs
```

**Additional context & screenshots**