linux-rdma / opensm

Please provide systemd service files #9

Open bdrung opened 5 years ago

bdrung commented 5 years ago

It would be nice if opensm came with systemd service files. Otherwise each distribution has to create its own service files, and these might diverge.

hnrose commented 5 years ago

I thought that the current OpenSM spec file supports the old SysV daemon management framework (RHEL 6.x).

Which distributions are of interest?

bdrung commented 5 years ago

All recent distributions (Debian, Ubuntu, Fedora, etc.) would benefit from a systemd service file. To quote lintian's reasoning:

The specified init.d script has no equivalent systemd service.

Whilst systemd has a SysV init.d script compatibility mode, providing native systemd support has many advantages such as being able to specify security hardening features.
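As a rough illustration of the "security hardening features" lintian mentions, the sketch below prints a hypothetical drop-in for opensm.service. The directive names are real systemd options, but the selection is an assumption and has not been tested against opensm, which needs access to the InfiniBand device nodes and its log directory.

```shell
#!/bin/bash
# Sketch only: print a hypothetical hardening drop-in for opensm.service.
# The directives are real systemd options; whether opensm works correctly
# under them is an untested assumption.
print_hardening_dropin() {
    cat <<'EOF'
[Service]
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full
EOF
}

print_hardening_dropin
```

The output could be saved under a drop-in directory such as /etc/systemd/system/opensm.service.d/ (the file name is up to you), followed by systemctl daemon-reload. Running systemd-analyze security opensm.service afterwards gives an overview of what the unit is still allowed to do.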

jamespharvey20 commented 5 years ago

I'm the maintainer for Arch Linux's AUR opensm (and other InfiniBand) packages. (To be clear, AUR packages are maintained by any user who adopts the packages - InfiniBand packages are not part of Arch's official repositories.)

When I ran into the problem of no systemd service file, I copied (and gave credit to) the systemd opensm.service file included by Fedora. I would really like to see this or a version of it included here, as well. Fedora also notes that there is a timing bug that intermittently causes a signal 15 failure on start, so their workaround is to use a separate script. I have no idea if this intermittent timing bug still exists.

They also use this separate script to allow multiple instances of opensm to run, one per port.

opensm.service

[Unit]
Description=Starts the OpenSM InfiniBand fabric Subnet Manager
Documentation=man:opensm
DefaultDependencies=false
Before=network.target remote-fs-pre.target
Requires=rdma.service
After=rdma.service

[Service]
Type=forking
ExecStart=/usr/libexec/opensm-launch

[Install]
WantedBy=network.target

opensm.launch

#!/bin/bash
#
# Launch the necessary OpenSM daemons for systemd
#
# sysconfig: /etc/sysconfig/opensm
# config: /etc/rdma/opensm.conf
#

shopt -s nullglob

prog=/usr/sbin/opensm
[ -f /etc/sysconfig/opensm ] && . /etc/sysconfig/opensm

[ -n "$PRIORITY" ] && prio="-p $PRIORITY"

if [ -z "$GUIDS" ]; then
   CONFIGS=""
   CONFIG_CNT=0
   for conf in /etc/rdma/opensm.conf.[0-9]*; do
      CONFIGS="$CONFIGS $conf"
      let CONFIG_CNT++
   done
else
   GUID_CNT=0
   for guid in $GUIDS; do
      let GUID_CNT++
   done
fi
# Start opensm
if [ -n "$GUIDS" ]; then
   SUBNET_COUNT=0
   for guid in $GUIDS; do
      SUBNET_PREFIX=`printf "0xfe800000000000%02d" $SUBNET_COUNT`
      (while true; do $prog $prio -g $guid --subnet_prefix $SUBNET_PREFIX; sleep 30; done) &
      let SUBNET_COUNT++
   done
elif [ -n "$CONFIGS" ]; then
   for config in $CONFIGS; do
      (while true; do $prog $prio -F $config; sleep 30; done) &
   done
else
   (while true; do $prog $prio; sleep 30; done) &
fi
exit 0

I just tried running opensm on multiple interfaces for the first time myself, and found that their method of giving each opensm instance a unique --subnet_prefix is broken, because opensm no longer accepts this option. Running two instances of opensm -g <different GUIDS> appears to work, but I'm assuming that at some point in the past opensm might have complained if multiple instances were running with the same subnet prefix.
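A minimal sketch of what the GUIDS branch might look like without --subnet_prefix is shown here as a dry run: it only prints one command line per GUID and starts nothing. The function name build_opensm_cmds is hypothetical; the -p and -g options and the /usr/sbin/opensm path are taken from the Fedora script above.

```shell
#!/bin/bash
# Sketch: build one opensm command line per port GUID, leaving out the
# --subnet_prefix option that is reported above as no longer accepted.
# Nothing is executed; each command is only printed.
prog=/usr/sbin/opensm   # path taken from the Fedora launch script

# build_opensm_cmds PRIORITY GUID...   (hypothetical helper name)
# Pass an empty string as PRIORITY to omit the -p option.
build_opensm_cmds() {
    local priority="$1"; shift
    local guid
    for guid in "$@"; do
        if [ -n "$priority" ]; then
            echo "$prog -p $priority -g $guid"
        else
            echo "$prog -g $guid"
        fi
    done
}

# Example GUIDs copied from the sample GUIDS line in opensm.sysconfig
build_opensm_cmds 15 0x0002c90300048ca1 0x0002c90300048ca2
```

Printing the commands first makes it easy to check what would be launched before wrapping each one in the retry loop.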

If you do not want multiple interface support, opensm.launch can be simplified to:

#!/bin/bash

(while true; do /usr/bin/opensm; sleep 30; done) &
exit 0

opensm.sysconfig

# Problem #1: Multiple IB fabrics needing a subnet manager
#
# In the event that a machine has more than one IB subnet attached,
# and that machine is an opensm server, by default, opensm will
# only attach to one port and will not manage the fabric on the
# other port.  There are two ways to solve this problem:
#
# 1) Start opensm on multiple machines and configure it to manage
#    different fabrics on each machine
# 2) Configure opensm to start multiple instances on a single
#    machine
#
# Both solutions to this problem require non-standard configurations.
# In other words, you would normally have to modify /etc/rdma/opensm.conf
# and once you do that, the file will no longer be updated for new
# options when opensm is upgraded.  In an effort to allow people to
# have more than one subnet managed by opensm without having to modify
# the system default opensm.conf file, we have enabled two methods
# for modifying the default opensm config items needed to enable
# multiple fabric management.
#
# Method #1: Create multiple opensm.conf files in non-standard locations
#   Copy /etc/rdma/opensm.conf to /etc/rdma/opensm.conf.<number>
#     (do this once for each instance you want started)
#   Edit each copy of the opensm.conf file to reflect the necessary changes
#     for a multiple instance startup.  If you need to manage more than
#     one fabric, you will have to change the guid option in each file
#     to specify the guid of the specific port you want opensm attached
#     to.
#
# The advantage to method #1 is that, on the off chance you want to do
# really special custom things on different ports, like have different
# QoS settings depending on which port you are attached to, you have the
# freedom to edit any and all settings for each instance without those
# changes affecting other instances or being lost when opensm upgrades.
#
# Method #2: Specify multiple GUIDS variable entries in this file
#   Uncomment the below GUIDS variable and enter each guid you need to attach
#     to into the list.  If using this method you need to enter each
#     guid into the list as we won't attach to any default ports, only
#     those specified in the list.
#
#GUIDS="0x0002c90300048ca1 0x0002c90300048ca2"
#
# The obvious advantage to method #2 is that it's simple and doesn't
# clutter up your file system, but it is far more limited in what you
# can do.  If you enable method #2, then even if you create the files
# referenced in method #1, they will be ignored.
#
# Problem #2: Activating a backup subnet manager
#
# The default priority of opensm is set so that it wants to be the
# primary subnet manager.  This is great when you are only running
# opensm on one server, but if you want to have a non-primary opensm
# instance for failover, then you have to manually edit the opensm.conf
# file like for problem #1.  This carries with it all the problems
# listed above.  If you wish to enable opensm as a non-primary manager,
# then you can uncomment the PRIORITY variable below and set it to
# some number between 0 and 15, where 15 is the highest priority and
# the primary manager, with 0 being the lowest backup server.  This method
# will work with the GUIDS option above, and also with the multiple
# config files in method #1 above.  However, only a single priority is
# supported here.  If you wanted more than one priority (say this machine
# is the primary on the first fabric, and second on the second fabric,
# while the other opensm server is primary on the second fabric and
# second on the primary), then the only way to do that is to use method #1
# above and individually edit the config files.  If you edit the config
# files to set the priority and then also set the priority here, then
# this setting will override the config files and render that particular
# edit useless.
#
#PRIORITY=15
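
Method #1 in the file above can be sketched as a small helper that copies a base config to <base>.<number> and points the copy at one port GUID. The function name make_instance_conf is hypothetical, and the sketch assumes the option appears in opensm.conf as a line of the form "guid <value>"; check your own file before relying on it.

```shell
#!/bin/bash
# Sketch of Method #1: copy a base opensm.conf to <base>.<number> and
# set the guid option in the copy. Assumes the option is written as a
# line "guid <value>" (an assumption; verify against your opensm.conf).
make_instance_conf() {
    local base="$1" number="$2" guid="$3"
    local copy="${base}.${number}"
    cp -- "$base" "$copy" || return 1
    if grep -q '^guid[[:space:]]' "$copy"; then
        # Rewrite the existing guid line in place (GNU sed -i)
        sed -i "s/^guid[[:space:]].*/guid ${guid}/" "$copy"
    else
        # No guid line found, so append one
        printf 'guid %s\n' "$guid" >> "$copy"
    fi
    echo "$copy"
}

# Example (not run here, since it writes under /etc):
#   make_instance_conf /etc/rdma/opensm.conf 1 0x0002c90300048ca1
```

The numeric suffix matters because the launch script only picks up files matching /etc/rdma/opensm.conf.[0-9]*.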

ghost commented 5 years ago

> When I ran into the problem of no systemd service file, I copied (and gave credit to) the systemd opensm.service file included by Fedora. I would really like to see this or a version of it included here, as well. Fedora also notes that there is a timing bug that intermittently causes a signal 15 failure on start, so their workaround is to use a separate script. I have no idea if this intermittent timing bug still exists.

There is no signal 15 failure for Fedora. Please see explanation in this bug page. https://bugzilla.redhat.com/show_bug.cgi?id=1663785

jamespharvey20 commented 5 years ago

> > When I ran into the problem of no systemd service file, I copied (and gave credit to) the systemd opensm.service file included by Fedora. I would really like to see this or a version of it included here, as well. Fedora also notes that there is a timing bug that intermittently causes a signal 15 failure on start, so their workaround is to use a separate script. I have no idea if this intermittent timing bug still exists.

> There is no signal 15 failure for Fedora. Please see explanation in this bug page. https://bugzilla.redhat.com/show_bug.cgi?id=1663785

Yeah, I was given bad info about that. The link from HonggangLI explains that the loop is there so opensm stays running, because it exits (or at least did in the past) in certain situations, such as a cable being unplugged. (The link is well worth a read.) If that's still opensm's native behavior, I think it would be nice if it were changed. I don't think anyone would want it to exit in situations like that. It's of course different, but that would be like having dhcpd exit whenever a client unplugged.
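
For comparison, the 30-second retry loop in the launch script could in principle be handed to systemd itself. The sketch below prints a hypothetical drop-in that runs opensm directly and lets systemd restart it. It assumes opensm stays in the foreground when started without its daemon option, and it has not been tested against the Fedora unit; Restart= and RestartSec= are standard systemd.service settings.

```shell
#!/bin/bash
# Sketch only: print a hypothetical drop-in that replaces the
# "while true; do opensm; sleep 30; done" loop with systemd's own
# restart handling. Assumes opensm runs in the foreground by default.
print_restart_dropin() {
    cat <<'EOF'
[Service]
Type=simple
# An empty ExecStart= clears the command inherited from the main unit
ExecStart=
ExecStart=/usr/sbin/opensm
Restart=always
RestartSec=30
EOF
}

print_restart_dropin
```

With this approach systemd tracks the opensm process directly, so systemctl status and the journal show each exit and restart instead of hiding them inside a background shell loop. It does not cover the multiple-instance case handled by the GUIDS and opensm.conf.<number> logic.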