Closed valeriob01 closed 8 months ago
BTW, this file should be named gpuowl.service and placed into the /etc/systemd/system directory.
RESOLVED: this is the corrected file. I also added full paths to my script, and now it works at boot.
[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
PIDFile=/run/gpuowl.pid
ExecStart=/home/user/exec_and_ctrl.sh
RemainAfterExit=true
KillSignal=SIGINT
ExecStop=pkill --signal SIGINT gpuowl
ExecStopPost=/opt/rocm-3.3.0/bin/rocm-smi --setprofile 2
[Install]
WantedBy=default.target
Alias=gpuowl.service
After opening a console, I type "tmux attach -t <session-name>" and find the tmux session running!
For starting/stopping the service I can now use:
start:
systemctl start gpuowl.service
stop:
systemctl stop gpuowl.service
see the status:
systemctl status gpuowl.service
N.B.: to make gpuowl.service start at boot, issue this command:
systemctl enable gpuowl.service
To stop the service from starting at boot, use the inverse command:
systemctl disable gpuowl.service
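The installation steps mentioned in this thread (copy the file to /etc/systemd/system, then enable it) can be sketched as one small function. This is only a sketch, not a tested installer: the function name and the UNIT_DIR and RUN_SYSTEMCTL variables are hypothetical, added so the copy step can be tried in a scratch directory. The `systemctl daemon-reload` call is not mentioned above; systemd needs it to pick up a new or edited unit file.

```shell
#!/usr/bin/env bash
# Sketch: install a unit file, reload systemd, and enable the service.
# UNIT_DIR and RUN_SYSTEMCTL are hypothetical overrides for a dry run;
# with the defaults this must be run as root.
install_gpuowl_unit() {
  local src="$1"
  local unit_dir="${UNIT_DIR:-/etc/systemd/system}"

  # Copy the unit file into place with conventional permissions.
  install -m 0644 "$src" "$unit_dir/gpuowl.service" || return 1

  if [ "${RUN_SYSTEMCTL:-1}" = "1" ]; then
    # Make systemd re-read unit files, then enable the service at boot.
    systemctl daemon-reload && systemctl enable gpuowl.service
  fi
}
```

Typical use as root would be `install_gpuowl_unit ./gpuowl.service`. The same `systemctl daemon-reload` is needed again whenever the installed file is edited.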
This latest version of the service file works with Ubuntu Focal Fossa and should also work with Debian (I think on Debian PIDFile=/run/gpuowl.pid must be replaced with PIDFile=/var/run/gpuowl.pid).
In the next comment I will publish my startup script so that the topic is complete.
This is the script exec_and_ctrl.sh, adapted for publication:
#!/usr/bin/env bash
set -o xtrace
export LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64:/opt/hsa/lib
export HSA_ENABLE_SDMA=0
sleep 1
/opt/rocm-3.3.0/bin/rocm-smi --resetprofile
/opt/rocm-3.3.0/bin/rocm-smi --setsclk 8
/opt/rocm-3.3.0/bin/rocm-smi --setmclk 2
/opt/rocm-3.3.0/bin/rocm-smi --autorespond y --setoverdrive 2
/opt/rocm-3.3.0/bin/rocm-smi --autorespond y --setmemoverdrive 10
sleep 1
# Start tmux session with watch
/usr/bin/tmux set-option set-remain-on-exit on
/usr/bin/tmux new-session -d -c /home/user -n session1 -s S1 'watch -n 1 sensors -A'
# Start tmux window with primenet.py
/usr/bin/tmux split-window -d -c /home/user -t S1 '/home/user/gpuowl/tools/primenet.py -u xxxx -p yyyy --dirs /home/user/work0 /home/user/work1 -t 1800 -w PRP --tasks 3'
# Start tmux windows for gpuowl instances
/usr/bin/tmux split-window -d -c /home/user/gpuowl -t S1 '/home/user/gpuowl/gpuowl -dir /home/user/work0 -user xxxx -cpu R7a -block 1000 -log 20000 -device 0 -proof 8 -tmpDir /mnt/tmp'
/usr/bin/tmux split-window -d -c /home/user/gpuowl -t S1 '/home/user/gpuowl/gpuowl -dir /home/user/work1 -user xxxx -cpu R7b -block 1000 -log 20000 -device 1 -proof 8 -tmpDir /mnt/tmp'
/usr/bin/tmux select-layout -t S1 tiled
BTW, if the scripts live on an NFS share that is mounted at boot, you may want to add RequiresMountsFor= with the mount path to the [Unit] section, so that the service starts only after the share is mounted.
This is what I do: I keep the gpuowl directory tree on an NFS share so that I can maintain a single copy of gpuowl.
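One way to add the mount dependency without editing gpuowl.service itself is a drop-in file. The helper below is a minimal sketch: it only prints the drop-in text for a given path. The function name, the share path /mnt/nfs/gpuowl, and the drop-in file name nfs-mount.conf are placeholders, not values from my setup.

```shell
#!/usr/bin/env bash
# Print a systemd drop-in that makes the unit wait for the given mount path.
# The path is supplied by the caller; /mnt/nfs/gpuowl below is only a placeholder.
render_mount_dropin() {
  local mount_path="$1"
  printf '[Unit]\nRequiresMountsFor=%s\n' "$mount_path"
}

# Example (run as root), with a hypothetical share path and drop-in name:
#   mkdir -p /etc/systemd/system/gpuowl.service.d
#   render_mount_dropin /mnt/nfs/gpuowl > /etc/systemd/system/gpuowl.service.d/nfs-mount.conf
#   systemctl daemon-reload
```

Writing the line directly into the [Unit] section of gpuowl.service works just as well; the drop-in only keeps the site-specific path out of the shared file.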
With ROCm 3.7 expected in the second week of August, the ROCm Data Center Tool (RDC) will be available to monitor the GPUs. This will change things a bit and the scripts will need modifications. Stay tuned.
I have been using this service file for a while now. It works well on Ubuntu but fails on Debian. The issue is with the tmux startup script: tmux complains about "no terminal". I have no idea how to solve this, and since I am using Ubuntu, fixing it is not a high priority.
Thanks for investigating the systemd service runner. I'm closing the issue, but feel free to link to it in any way for others that may want to run it that way.
This file is intended for automating GpuOwl GPU computing instances. I am testing it on my system, but the tmux script refuses to start with an error like this: "not a terminal". Can you test it on your system and let me know if it works?
This is an example; the actual file should include your gpuowl startup script at ExecStart= instead of the example line. This lets you use "systemctl stop gpuowl" to stop gpuowl instances gracefully.