preda / gpuowl

GPU Mersenne primality test.
GNU General Public License v3.0

Test example of gpuOwl systemd service file #173

Closed valeriob01 closed 9 months ago

valeriob01 commented 4 years ago

This file is intended to automate gpuOwl GPU computing instances. I am testing it on my system, but the tmux script refuses to start with an error like "not a terminal". Can you test it on your system and let me know if it works?

[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
PIDFile=/run/gpuowl.pid
ExecStart=/home/sel/exec_and_ctrl.sh
KillSignal=SIGINT
Restart=on-failure
RestartSec=2
TimeoutStartSec=15
ExecStop=pkill --signal SIGINT gpuowl
ExecStopPost=/opt/rocm-3.3.0/bin/rocm-smi --setprofile 2

[Install]
WantedBy=multi-user.target
Alias=gpuowl.service

This is an example; the actual file should point ExecStart= at your own gpuowl startup script instead of the example line. It lets you use "systemctl stop gpuowl" to stop gpuowl instances gracefully.

valeriob01 commented 4 years ago

BTW, this file should be named gpuowl.service and placed into the /etc/systemd/system directory.
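
As a rough sketch of that installation step, the helper below copies a unit file into place. The function name install_unit and its optional destination argument are assumptions made for illustration; only the target name gpuowl.service and the /etc/systemd/system directory come from this thread.

```shell
#!/usr/bin/env bash
# Sketch: copy a unit file into a systemd unit directory with mode 0644.
# install_unit is a hypothetical helper name, not part of gpuOwl or systemd.
install_unit() {
    local src="$1"
    local dest_dir="${2:-/etc/systemd/system}"
    # install(1) copies the file and sets its permissions in one step.
    install -m 644 "$src" "$dest_dir/gpuowl.service"
}
```

Run as root, this would be followed by a reload so that systemd notices the new unit: install_unit ./gpuowl.service && systemctl daemon-reload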

valeriob01 commented 4 years ago

https://www.selroc.systems/forum/sel-mersenne-prime-research-effort-and-other-plans/gpuowl-specific-material/159-test-example-of-gpuowl-systemd-service-file

valeriob01 commented 4 years ago

RESOLVED. This is the corrected file. I also added full paths to my script, and now it works at boot.

[Unit]
Description=GpuOwl
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
PIDFile=/run/gpuowl.pid
ExecStart=/home/user/exec_and_ctrl.sh
RemainAfterExit=true
KillSignal=SIGINT
ExecStop=pkill --signal SIGINT gpuowl
ExecStopPost=/opt/rocm-3.3.0/bin/rocm-smi --setprofile 2

[Install]
WantedBy=default.target
Alias=gpuowl.service

After opening a console I type "tmux attach -t <session-name>" and find the tmux session running!

For starting/stopping the service I can now use:

start: systemctl start gpuowl.service
stop: systemctl stop gpuowl.service
see the status: systemctl status gpuowl.service

selroc commented 4 years ago

N.B.: to make gpuowl.service start at boot, it is necessary to issue this command:

systemctl enable gpuowl.service

To disable the service, use the inverse command:

systemctl disable gpuowl.service
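
To confirm the result of enabling and starting the service, something like the following sketch could be used. It relies on the systemctl subcommands is-enabled and is-active; the helper name service_summary and the SYSTEMCTL override variable are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# SYSTEMCTL is a hypothetical override hook; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Print "<enabled-state> <active-state>" for a unit, e.g. "enabled active".
service_summary() {
    local unit="${1:-gpuowl.service}"
    local enabled active
    # Both subcommands exit non-zero for "disabled"/"inactive", so ignore the status.
    enabled="$("$SYSTEMCTL" is-enabled "$unit" 2>/dev/null)" || true
    active="$("$SYSTEMCTL" is-active "$unit" 2>/dev/null)" || true
    printf '%s %s\n' "${enabled:-unknown}" "${active:-unknown}"
}
```

Calling service_summary with no argument reports on gpuowl.service.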

This last version of the service file works with Ubuntu Focal Fossa and should also work with Debian (I think on Debian PIDFile=/run/gpuowl.pid must be replaced with PIDFile=/var/run/gpuowl.pid).

In the next comment I will publish my startup script so that the topic is complete.

valeriob01 commented 4 years ago

This is the script exec_and_ctrl.sh, adapted for publication:

#!/usr/bin/env bash

set -o xtrace

export LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64:/opt/hsa/lib

export HSA_ENABLE_SDMA=0

sleep 1
/opt/rocm-3.3.0/bin/rocm-smi --resetprofile
/opt/rocm-3.3.0/bin/rocm-smi --setsclk 8
/opt/rocm-3.3.0/bin/rocm-smi --setmclk 2
/opt/rocm-3.3.0/bin/rocm-smi --autorespond y --setoverdrive 2
/opt/rocm-3.3.0/bin/rocm-smi --autorespond y --setmemoverdrive 10

sleep 1

# Start tmux session with watch
/usr/bin/tmux new-session -d -c /home/user -n session1 -s S1 'watch -n 1 sensors -A'
# Keep panes visible after their command exits. This is set after new-session
# because set-option needs a running tmux server; remain-on-exit is the window option name.
/usr/bin/tmux set-window-option -g remain-on-exit on
# Start tmux pane with primenet.py
/usr/bin/tmux split-window -d -c /home/user -t S1 '/home/user/gpuowl/tools/primenet.py -u xxxx -p yyyy --dirs /home/user/work0 /home/user/work1 -t 1800 -w PRP --tasks 3'

# Start tmux panes for gpuowl instances
/usr/bin/tmux split-window -d -c /home/user/gpuowl -t S1 '/home/user/gpuowl/gpuowl -dir /home/user/work0 -user xxxx -cpu R7a -block 1000 -log 20000 -device 0 -proof 8 -tmpDir /mnt/tmp'
/usr/bin/tmux split-window -d -c /home/user/gpuowl -t S1 '/home/user/gpuowl/gpuowl -dir /home/user/work1 -user xxxx -cpu R7b -block 1000 -log 20000 -device 1 -proof 8 -tmpDir /mnt/tmp'
/usr/bin/tmux select-layout -t S1 tiled
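
The two gpuowl lines above differ only in the work directory, the -cpu label, and the device index. As a sketch of how that repetition could be factored out, the hypothetical helpers below (gpuowl_cmd and start_gpuowl_panes are names invented here) rebuild the same command lines from a device number and a label; all paths, flags, and the xxxx placeholder are copied from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the gpuowl command line for one GPU.
# $1 = device index (also used as the work directory suffix), $2 = -cpu label.
gpuowl_cmd() {
    local device="$1" label="$2"
    printf '/home/user/gpuowl/gpuowl -dir /home/user/work%s -user xxxx -cpu %s -block 1000 -log 20000 -device %s -proof 8 -tmpDir /mnt/tmp' \
        "$device" "$label" "$device"
}

# Hypothetical helper: open one pane per GPU in the existing session S1.
start_gpuowl_panes() {
    /usr/bin/tmux split-window -d -c /home/user/gpuowl -t S1 "$(gpuowl_cmd 0 R7a)"
    /usr/bin/tmux split-window -d -c /home/user/gpuowl -t S1 "$(gpuowl_cmd 1 R7b)"
}
```

Nothing runs until start_gpuowl_panes is called, which would take the place of the two split-window lines for gpuowl.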

valeriob01 commented 4 years ago

BTW, if your scripts live on NFS shares that are mounted at boot, you may want to add RequiresMountsFor= under the [Unit] section to delay startup until the NFS shares are mounted; otherwise the service will fail to start!

This is what I do: I put the gpuowl directory tree on an NFS share so that I can maintain a single copy of gpuowl.
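
One way to add RequiresMountsFor= without editing the main unit file is a systemd drop-in directory. The sketch below writes such a drop-in; the helper name write_mount_dropin, the file name mounts.conf, and the mount path /mnt/nfs/gpuowl used in the usage note are assumptions, so substitute the real mount point.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that makes a unit wait for a mount point.
# $1 = drop-in directory, $2 = path that must be mounted before the unit starts.
write_mount_dropin() {
    local dropin_dir="$1" mount_path="$2"
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nRequiresMountsFor=%s\n' "$mount_path" > "$dropin_dir/mounts.conf"
}
```

Run as root, a call could look like this: write_mount_dropin /etc/systemd/system/gpuowl.service.d /mnt/nfs/gpuowl && systemctl daemon-reload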

selroc commented 4 years ago

With ROCm 3.7 expected in the second week of August, the ROCm Data Center Tool (RDC) will be available to monitor the GPUs. This will change things a bit, and the scripts will need modifications. Stay tuned.

valeriob01 commented 4 years ago

I have been using this service file for a while now. It works well on Ubuntu but fails on Debian. The issue is with the tmux startup script: tmux complains about "no terminal". I have no idea how to solve this, and since I am using Ubuntu, fixing it is not a high priority.
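
The cause of the Debian failure is not established in this thread. As a possible first diagnostic step (not a fix), the sketch below records what the startup script sees before it calls tmux, so the output can be compared between the two systems. Services started by systemd normally have standard input connected to /dev/null, so "no-tty" is the expected result on both; the TERM value and tmux version may be the more interesting differences. The helper names stdin_kind and log_tmux_env are hypothetical.

```shell
#!/usr/bin/env bash
# Print "tty" if standard input is a terminal, "no-tty" otherwise.
stdin_kind() {
    if [ -t 0 ]; then
        echo tty
    else
        echo no-tty
    fi
}

# Hypothetical helper: log the terminal-related environment and the tmux version.
log_tmux_env() {
    echo "stdin: $(stdin_kind)"
    echo "TERM: ${TERM:-unset}"
    /usr/bin/tmux -V
}
```

Calling log_tmux_env near the top of exec_and_ctrl.sh would send these lines to the service's log output.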

preda commented 9 months ago

Thanks for investigating the systemd service runner. I'm closing the issue, but feel free to link to it in any way for others that may want to run it that way.