pkgxdev / pkgx

the last thing you’ll install
https://pkgx.sh
Apache License 2.0

pkgx and systemd unit #1018

Open tannevaled opened 1 week ago

tannevaled commented 1 week ago

Hi,

I'm trying to use pkgx in a systemd unit for "GitHub Actions Runner Manager" (garm):

ubuntu@github-actions-runner-manager:~$ sudo cat /etc/systemd/system/garm.service
[Unit]
Description=GitHub Actions Runner Manager (garm)
After=multi-user.target

[Service]
Type=simple
User=_garm
Group=_garm
RuntimeDirectory=garm
RuntimeDirectoryPreserve=yes
WorkingDirectory=%t
ExecStart=/usr/local/bin/pkgx --verbose=2 +garm^${GARM_VERSION} garm -config ${GARM_CONFIG_PATH}
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
ubuntu@github-actions-runner-manager:~$ sudo cat /etc/systemd/system/garm.service.d/local.conf
[Service]
Environment="DEBUG=1"
Environment="PKGX_DIR=/var/run/garm/.pkgx"
Environment="PATH=/usr/local/bin:${PATH}"
Environment="GARM_VERSION=0.1.4"
Environment="GARM_CONFIG_PATH=/etc/garm/config.toml"
ubuntu@github-actions-runner-manager:~$ ls -alFh /var/run/garm/
total 0
drwxr-xr-x  3 _garm _garm   60 Jun 27 11:53 ./
drwxr-xr-x 32 root  root  1.1K Jun 27 11:53 ../
drwxr-xr-x  4 _garm _garm   80 Jun 27 11:53 .pkgx/

I get the following PermissionDenied error, and the output does not give me enough information to troubleshoot it. Any ideas?

ubuntu@github-actions-runner-manager:~$ journalctl -fu garm
...
Jun 27 12:13:14 github-actions-runner-manager systemd[1]: Started garm.service - GitHub Actions Runner Manager (garm).
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: × unexpected error
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: │ PermissionDenied
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: │     at default (file:///home/runner/work/pkgx/pkgx/src/utils/execve.ts:19:15)
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: │     at default (file:///home/runner/work/pkgx/pkgx/src/modes/x.ts:39:3)
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: │     at eventLoopTick (ext:core/01_core.js:182:7)
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: │     at async default (file:///home/runner/work/pkgx/pkgx/src/app.ts:48:11)
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: │     at async file:///home/runner/work/pkgx/pkgx/entrypoint.ts:59:3
Jun 27 12:13:16 github-actions-runner-manager pkgx[327472]: ╰─➤ https://github.com/pkgxdev/pkgx/issues/new
Jun 27 12:13:16 github-actions-runner-manager systemd[1]: garm.service: Main process exited, code=exited, status=1/FAILURE
Jun 27 12:13:16 github-actions-runner-manager systemd[1]: garm.service: Failed with result 'exit-code'.
Jun 27 12:13:16 github-actions-runner-manager systemd[1]: garm.service: Consumed 1.788s CPU time.
...
dosubot[bot] commented 1 week ago

To troubleshoot the permission error you're encountering with pkgx in your systemd service, follow these steps:

  1. Check File Permissions: Ensure that the pkgx binary has the correct executable permissions:

    sudo chmod u+x /usr/local/bin/pkgx
  2. Verify Ownership: Ensure that the pkgx binary is owned by the correct user and group:

    sudo chown root:root /usr/local/bin/pkgx
  3. Systemd Service Configuration: Ensure that the systemd service file has the correct permissions and is owned by the root user. The service file should be located in /etc/systemd/system/ or /lib/systemd/system/.

  4. Service File Content: Ensure that the service file has the correct configuration. Your service file looks mostly correct, but ensure that the paths and environment variables are accurate.

  5. Reload Systemd Daemon: After making changes to the service file, reload the systemd daemon to apply the changes:

    sudo systemctl daemon-reload
  6. Start and Enable the Service: Start the service and enable it to run on boot:

    sudo systemctl start garm.service
    sudo systemctl enable garm.service
  7. Check Service Status: Check the status of the service to see if there are any errors:

    sudo systemctl status garm.service
  8. Logs: Check the logs for any permission-related errors:

    sudo journalctl -u garm.service

Additionally, ensure that the _garm user has the necessary permissions to access the directories and files specified in your service configuration. The RuntimeDirectory and WorkingDirectory should be accessible by the _garm user.

If the issue persists, you can enable more detailed logging by setting the VERBOSE environment variable to 2 or using the --verbose=2 flag, which you already have in place. This will provide more detailed debug information that can help in diagnosing the permission errors [1].

By following these steps, you should be able to identify and resolve the permission errors encountered by pkgx in your systemd service.

To continue talking to Dosu, mention @dosu.

tannevaled commented 1 week ago

I already checked all of that. Any other ideas? Even at this verbosity level, the output does not show which file or directory entry triggers the PermissionDenied.

dosubot[bot] commented 1 week ago

To get more granular information about the permission errors, you can increase the verbosity level and ensure that all relevant environment variables and configurations are correctly set. Here are additional steps you can take:

  1. Increase Verbosity: Ensure that the verbosity level is set to the highest level for detailed logging. You can set the VERBOSE environment variable to 2 or use the DEBUG environment variable:

    [Service]
    Environment="VERBOSE=2"
    Environment="DEBUG=1"
  2. Check Directory Permissions: Ensure that the _garm user has the necessary permissions to access the directories and files specified in your service configuration. Specifically, check the permissions for /var/run/garm/.pkgx and its parent directories:

    sudo chown -R _garm:_garm /var/run/garm/.pkgx
    sudo chmod -R 755 /var/run/garm/.pkgx
  3. Detailed Logging: Use the --verbose=2 flag in your ExecStart command to ensure detailed logging:

    ExecStart=/usr/local/bin/pkgx --verbose=2 +garm^${GARM_VERSION} garm -config ${GARM_CONFIG_PATH}
  4. Environment Variables: Ensure that every environment variable the service relies on is set correctly, for example the PKGX_DIR value from your drop-in:

    [Service]
    Environment="PKGX_DIR=/var/run/garm/.pkgx"
  5. Check for Specific Dependencies: Ensure that all required dependencies, such as zlib.net, are installed on the system:

    sudo apt-get install zlib1g
  6. Review Logs: After making these changes, review the logs again for more detailed information:

    sudo journalctl -u garm.service

By following these steps, you should be able to gather more detailed information about the permission errors and identify the specific file or directory causing the issue [1][2][3].


tannevaled commented 1 week ago

I modified the systemd unit to add an strace call:

[Unit]
Description=GitHub Actions Runner Manager (garm)
After=multi-user.target

[Service]
Type=simple
User=_garm
Group=_garm
#ProtectHome=false
Environment="PKGX_DIR=/var/run/garm/.pkgx"
Environment="HOME=/var/run/garm/"
RuntimeDirectory=garm
RuntimeDirectoryPreserve=yes
WorkingDirectory=%t
ExecStart=/usr/bin/strace -f -t -e trace=file /usr/local/bin/pkgx --verbose=2 +garm^${GARM_VERSION} garm -config ${GARM_CONFIG_PATH}
#ExecStart=/usr/local/bin/pkgx --verbose=2 +garm^${GARM_VERSION} garm -config ${GARM_CONFIG_PATH}
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStartSec=3600
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
ubuntu@github-actions-runner-manager:~$ sudo journalctl -xefu garm
...
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341968] 15:14:03 execve("/var/run/garm/.pkgx/github.com/cloudbase/garm/v0.1.4/bin/garm", ["/var/run/garm/.pkgx/github.com/c"..., "-config", "/etc/garm/config.toml"], 0x614086533990 /* 16 vars */) = -1 EACCES (Permission denied)
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341968] 15:14:03 statx(AT_FDCWD, "/var/run/garm/.pkgx/github.com/cloudbase/garm/v0.1.4/bin/garm", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0755, stx_size=20595785, ...}) = 0
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: × unexpected error
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: │ PermissionDenied
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: │     at default (file:///home/runner/work/pkgx/pkgx/src/utils/execve.ts:19:15)
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: │     at default (file:///home/runner/work/pkgx/pkgx/src/modes/x.ts:39:3)
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: │     at eventLoopTick (ext:core/01_core.js:182:7)
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: │     at async default (file:///home/runner/work/pkgx/pkgx/src/app.ts:48:11)
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: │     at async file:///home/runner/work/pkgx/pkgx/entrypoint.ts:59:3
Jun 27 15:14:03 github-actions-runner-manager strace[341968]: ╰─➤ https://github.com/pkgxdev/pkgx/issues/new
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341971] 15:14:03 +++ exited with 1 +++
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341976] 15:14:03 +++ exited with 1 +++
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341975] 15:14:03 +++ exited with 1 +++
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341970] 15:14:03 +++ exited with 1 +++
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: [pid 341969] 15:14:03 +++ exited with 1 +++
Jun 27 15:14:03 github-actions-runner-manager strace[341965]: 15:14:03 +++ exited with 1 +++
Jun 27 15:14:03 github-actions-runner-manager systemd[1]: garm.service: Main process exited, code=exited, status=1/FAILURE
ubuntu@github-actions-runner-manager:~$ ls -alFh /etc/garm/
total 4.3M
drwxr-xr-x   3 _garm _garm 4.0K Jun 25 13:51 ./
drwxr-xr-x 108 root  root  4.0K Jun 27 06:08 ../
-rw-r--r--   1 _garm _garm 5.5K Jun 25 13:51 config.toml
-rw-r--r--   1 _garm _garm 256K Jun 27 11:23 garm.db
-rw-r--r--   1 _garm _garm  32K Jun 27 11:28 garm.db-shm
-rw-r--r--   1 _garm _garm 4.0M Jun 27 11:28 garm.db-wal
drwxr-xr-x   3 _garm _garm 4.0K Jun 10 10:52 providers.d/
ubuntu@github-actions-runner-manager:~$ ls -alFh /var/run/garm/.pkgx/github.com/cloudbase/garm/v0.1.4/bin/garm
-rwxr-xr-x 1 _garm _garm 20M Jun 10 16:29 /var/run/garm/.pkgx/github.com/cloudbase/garm/v0.1.4/bin/garm*
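
A minimal sketch for pulling the failing path out of a trace like the one above, assuming the strace lines are piped in from journalctl or a saved file. The function name eacces_paths is hypothetical, and the pattern only handles syscalls whose first quoted argument is the path (as with execve and statx here).

```shell
#!/bin/sh
# Print each distinct path whose traced syscall failed with EACCES.
# Reads strace output on stdin; takes the first double-quoted string
# on every line that contains "EACCES".
eacces_paths() {
  sed -n '/EACCES/s/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}
```

Possible usage: `sudo journalctl -u garm | eacces_paths`. A file that is mode 0755 and owned by the service user, yet still fails execve with EACCES, suggests looking at the mount options or at the directories above it rather than at the file itself.
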
tannevaled commented 1 week ago

For posterity: per https://manpages.ubuntu.com/manpages/bionic/man5/systemd.exec.5.html, a unit's RuntimeDirectory= creates a subdirectory of /var/run. On Ubuntu 24.04, /var/run is a symlink to /run, a tmpfs mounted noexec, so binaries stored there cannot be executed.

ubuntu@github-actions-runner-manager:~$ mount | grep '/run '
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=391176k,mode=755,inode64)
ubuntu@github-actions-runner-manager:~$ ls -alFh /var/run
lrwxrwxrwx 1 root root 4 Jun  8 10:36 /var/run -> /run/
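
The same check can be scripted for any candidate directory. This is a sketch, assuming findmnt from util-linux is available; the helper names has_noexec and check_exec_mount are hypothetical.

```shell
#!/bin/sh
# Return 0 if a comma-separated mount option string contains "noexec".
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether the filesystem that holds a path is mounted noexec.
# "findmnt --target" looks up the mount point containing the path.
check_exec_mount() {
  opts=$(findmnt -n -o OPTIONS --target "$1") || return 2
  if has_noexec "$opts"; then
    echo "$1: noexec ($opts)"
  else
    echo "$1: exec allowed ($opts)"
  fi
}

# Example, with paths taken from this thread:
#   check_exec_mount /var/run/garm
#   check_exec_mount /var/cache/garm
```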

I made it work by using CacheDirectory= instead of RuntimeDirectory=:

[Unit]
Description=GitHub Actions Runner Manager (garm)
After=multi-user.target

[Service]
Type=simple
User=_garm
Group=_garm
Environment="HOME=/var/cache/garm/"
Environment="PKGX_VERBOSE=0"
CacheDirectory=garm
# https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#Specifiers
WorkingDirectory=%C
#ExecStart=/usr/bin/strace -f -t -e trace=file /usr/local/bin/pkgx --verbose="${PKGX_VERBOSE}" +garm^${GARM_VERSION} garm -config ${GARM_CONFIG_PATH}
ExecStart=/usr/local/bin/pkgx --verbose="${PKGX_VERBOSE}" +garm^${GARM_VERSION} garm -config ${GARM_CONFIG_PATH}
ExecReload=/bin/kill -HUP "${MAINPID}"
#TimeoutStartSec=3600
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
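
As a rough aid for choosing between these directives, the sketch below maps each directory directive to the base path that systemd.exec(5) documents for system units (user units use XDG locations instead). The function names unit_dir_base and unit_dir_path are hypothetical. Whether a given base is mounted noexec depends on the host, so it still has to be checked there.

```shell
#!/bin/sh
# Base path used by each systemd directory directive (system units only).
unit_dir_base() {
  case "$1" in
    RuntimeDirectory)       echo /run ;;
    StateDirectory)         echo /var/lib ;;
    CacheDirectory)         echo /var/cache ;;
    LogsDirectory)          echo /var/log ;;
    ConfigurationDirectory) echo /etc ;;
    *) echo "unknown directive: $1" >&2; return 1 ;;
  esac
}

# Print the directory that "<Directive>=<name>" would create.
unit_dir_path() {
  base=$(unit_dir_base "$1") || return 1
  echo "$base/$2"
}

# Example: unit_dir_path CacheDirectory garm
```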

Maybe we could add a Systemd Unit section in the Wiki?

jhheider commented 1 week ago

I wonder if that's a better place, or a README.md in the package directory. systemd is such a low-level piece of Linux infrastructure that I assume you have to be very careful using an external version. It's packaged primarily for access to the libraries needed by other formulae, but if you found a solution, that's great! I'd suggest a README at a minimum.

tannevaled commented 1 week ago

In fact, this is not specific to the garm package.

The problem is that pkgx needs to execute what it installs, so under systemd its directory must live on a filesystem that is not mounted noexec.

I thought the general documentation (the GitHub wiki or https://docs.pkgx.sh/) could help others avoid losing time on the same problem.

jhheider commented 1 week ago

Sure, we welcome the docs.