Need to reformat information from the presentation:
IocManager: History and Internals
December 15, 2020
Overview
The purpose of this talk is to describe the design and implementation of the IocManager.
How were IOCs started before IocManager?
What were the problems with this approach?
How does IocManager start IOCs, and how does this address the problems we had originally?
In the Beginning, There Was “init”...
The first user task started in Linux is called “init”.
In RHEL5 systems, this runs a set of startup scripts that live in the /etc/init.d directory.
In RHEL7 systems, this is a symbolic link to systemd, which starts a set of system services.
In either case, our “ioc” script/service does nothing more than run (as root):
/reg/d/iocCommon/hosts/$HOSTNAME/startup.cmd
The Host startup.cmd, v1.0
Originally, the startup.cmd did two things:
Prepare the system for running IOCs by loading any necessary drivers, adjusting kernel parameters, etc.
Run /reg/d/iocCommon/sioc/$IOCNAME/startup.cmd for each IOC that should be started on this host.
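The second step can be pictured with a short sketch. This is not the original script: the function name start_host_iocs and the overridable SIOC_ROOT variable are hypothetical, and it only assumes that each host listed its IOCs explicitly.

```shell
# Sketch only: start_host_iocs and SIOC_ROOT are hypothetical names.
SIOC_ROOT=${SIOC_ROOT:-/reg/d/iocCommon/sioc}

start_host_iocs() {
    local ioc script
    for ioc in "$@"; do
        script="$SIOC_ROOT/$ioc/startup.cmd"
        if [ -x "$script" ]; then
            "$script"       # the per-IOC script shown on the next slide
        else
            echo "skipping $ioc: $script not found or not executable" >&2
        fi
    done
}

# Step 1 (drivers, kernel parameters) is host-specific and omitted here.
# Step 2 would then name the IOCs for this host, for example:
#   start_host_iocs ioc-xrt-xcsimb1
```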
The IOC startup.cmd
#!/bin/bash
export IOC="ioc-xrt-xcsimb1"
source /reg/d/iocCommon/All/fee_env.sh
$RUNUSER "mkdir -p $IOC_DATA/$IOC/autosave"
$RUNUSER "mkdir -p $IOC_DATA/$IOC/archive"
$RUNUSER "mkdir -p $IOC_DATA/$IOC/iocInfo"
$RUNUSER "chmod ug+w -R $IOC_DATA/$IOC"
cd /reg/g/pcds/package/epics/3.14/ioc/xrt/ipimb/R4.0.5/build/iocBoot/$IOC
$RUNUSER "cp -f -p ../../archive/$IOC.archive $IOC_DATA/$IOC/archive"
$RUNUSER "$PROCSERV --logfile $IOC_DATA/$IOC/iocInfo/ioc.log \
--name $IOC 30002 ./st.cmd"
What Problems Does This Have?
This is a lot of boilerplate script that needs to be written for each IOC.
The port number is hardcoded in the sioc file, so from the host startup.cmd, it is not immediately clear which ports are in use and which are free to use in a new IOC.
The software version of the IOC is hardcoded in the sioc file, which is run as root at startup. So changing an IOC version requires not only editing the file, but manually killing the old IOC and using sudo to start the new one.
Goals for the IocManager
Minimize the amount of boilerplate scripting necessary for a new IOC.
Keep information about IOCs in a hutch in one place.
Simplify the process of updating an IOC, so it is a less privileged operation.
iocmanager.cfg
The configuration file for each hutch.
Contains:
A few global settings (hosts, etc.).
A list of python dictionaries, one for each IOC, giving all of the information about what should be run and where it should be run.
The Host startup.cmd, v2.0
Prepare the system for running IOCs by loading any special drivers, etc.
Run /reg/g/pcds/pyps/apps/ioc/latest/initIOC to load common drivers, etc. and start all of the IOCs that belong on that host.
initIOC
Determine which hutch we are running in.
Consult hosts.special, a map of hosts to hutches.
Otherwise assume we have hostnames of the form xxx-yyy-zzzzz, where yyy is the hutch.
Set up a basic python environment.
Run the hutch-specific version of the initialization in /reg/g/pcds/pyps/config/$HUTCH/iocmanager/initIOC.hutch.
(This allows different hutches to use different versions of the IocManager initialization script. initIOC is (mostly) unchanging, but changes can be put into initIOC.hutch.)
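The hutch lookup in the first step could be sketched as follows. This is an illustration, not the actual initIOC code: the function name hutch_for_host is hypothetical, and the hosts.special format used here (one "hostname hutch" pair per line) is an assumption.

```shell
# Sketch only: hutch_for_host is a hypothetical helper, and the
# "hostname hutch" line format for hosts.special is an assumption.
hutch_for_host() {
    local host=$1
    local special=$2    # path to a hosts.special-style map file
    local hutch rest

    # 1. Prefer an explicit entry in the map file, if there is one.
    if [ -r "$special" ]; then
        hutch=$(awk -v h="$host" '$1 == h { print $2; exit }' "$special")
        if [ -n "$hutch" ]; then
            echo "$hutch"
            return 0
        fi
    fi

    # 2. Otherwise assume xxx-yyy-zzzzz and take the middle field.
    case $host in
        *-*-*)
            rest=${host#*-}       # strip the leading "xxx-"
            echo "${rest%%-*}"    # keep "yyy"
            ;;
        *)
            return 1              # name does not fit the pattern
            ;;
    esac
}
```

With this sketch, a host named ioc-xcs-misc1 that is not listed in the map file resolves to the hutch xcs.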
initIOC.hutch
Source the hutch-specific environment from /reg/d/iocCommon/All/${HUTCH}_env.sh.
Start up procmgrd.
Conceptually, this can be thought of as a daemon which accepts remote requests to run procmgr with a given script and port as the appropriate ioc user.
In reality, it’s procServ running /bin/sh as the ioc user.
Install EDT framegrabber and EVR drivers.
Start the caRepeater processes.
Run the IocManager “startAll” script to start all of the IOCs on this host.
startAll
Read the iocmanager.cfg file.
Loop over every entry:
If the hostname is the current host, send a request to procmgrd to run the IocManager “startProc” script in a procServ, passing this script the IOC name.
startProc
Set up the standard hutch environment and create the /reg/d/iocData directory structure.
Signal the procServ process to start a new log.
Run the IocManager “getDirectory” script to read iocmanager.cfg and find the directory entry for the current IOC.
(Reading the directory from the configuration file at this point makes upgrades simple: change the configuration file, then restart the IOC.)
Entering the IOC Directory in startProc
cd /reg/g/pcds/epics
if test -d $dir; then cd $dir; fi
if test -f env.sh; then source ./env.sh; fi
if test -d children/build/iocBoot/$ioc; then
cd children/build/iocBoot/$ioc;
fi
if test -d build/iocBoot/$ioc; then cd build/iocBoot/$ioc; fi
if test -d iocBoot/$ioc; then cd iocBoot/$ioc; fi
if test -f env.sh; then source ./env.sh; fi
IocManager tries to be smart about the directory. It might be a relative or absolute path. It might be a templated parent with children, a templated child, or completely untemplated.
env.sh can be in the top-level or IOC directory to further customize the environment.
startProc, Continued
Create a small status file with hostname and port info in /reg/g/pcds/pyps/config/.status/$HUTCH/$IOCNAME.
(This is used to detect IOCs that are still running on a different port than the one they are currently configured to run on.)
Run “st.cmd” to start the IOC!
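The status-file check described above can be pictured like this. The one-line "host port" file format, the function names write_status and status_mismatch, and the overridable STATUS_ROOT variable are all assumptions made for illustration. The real file layout may differ.

```shell
# Sketch only: the one-line "host port" format and both function names
# are assumptions made for illustration.
STATUS_ROOT=${STATUS_ROOT:-/reg/g/pcds/pyps/config/.status}

# Record where an IOC was actually started.
write_status() {
    local hutch=$1 ioc=$2 host=$3 port=$4
    mkdir -p "$STATUS_ROOT/$hutch" &&
        printf '%s %s\n' "$host" "$port" > "$STATUS_ROOT/$hutch/$ioc"
}

# Succeed (exit 0) if the recorded host or port differs from the configured
# one, i.e. the IOC is probably still running under its old settings.
status_mismatch() {
    local hutch=$1 ioc=$2 cfg_host=$3 cfg_port=$4
    local file="$STATUS_ROOT/$hutch/$ioc" run_host run_port
    [ -r "$file" ] || return 1      # no status file: nothing to compare
    read -r run_host run_port < "$file"
    [ "$run_host" != "$cfg_host" ] || [ "$run_port" != "$cfg_port" ]
}
```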
The Thorn on the Rose
How do we control access to the iocmanager.cfg file to prevent corruption from simultaneous writes?
NFS does have a file locking mechanism, but it has a generally bad reputation.
Therefore, IocManager uses local file locks, so all write access to the configuration must be from a single host!
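As a rough illustration of a local lock, the sketch below serializes writers on one host with the flock command from util-linux. It is not IocManager's actual locking code: the wrapper name with_config_lock and the separate lock file are hypothetical.

```shell
# Sketch only: with_config_lock and the separate lock file are
# hypothetical; this is not IocManager's actual locking code.
with_config_lock() {
    local lockfile=$1
    shift
    (
        # -n: give up immediately if another process holds the lock.
        flock -n 9 || { echo "config is locked by another writer" >&2; exit 1; }
        "$@"    # run the update while the lock is held
    ) 9> "$lockfile"
}
```

A caller would wrap the command that rewrites the configuration, for example with_config_lock /path/to/lockfile followed by that command. The lock is released when the subshell exits and descriptor 9 is closed.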
Authentication and the COMMITHOST
Each configuration file may define a COMMITHOST, which defaults to “psbuild-rhel7-01”.
On startup, IocManager does an ssh to the COMMITHOST. This doubles as an authentication mechanism.
This is probably the #1 cause of IocManager silently failing to start: it cannot reach the COMMITHOST from the host where it is being run, and it does not prompt the user for a password or passphrase.
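One way to check for this failure by hand is to try a non-interactive ssh to the COMMITHOST. The sketch below uses the standard OpenSSH options BatchMode and ConnectTimeout. The function name check_commithost and the overridable SSH variable are hypothetical and not part of IocManager.

```shell
# Sketch only: check_commithost is a hypothetical diagnostic, not part of
# IocManager. SSH can be overridden, for example in tests.
SSH=${SSH:-ssh}
COMMITHOST=${COMMITHOST:-psbuild-rhel7-01}

check_commithost() {
    # BatchMode=yes makes ssh fail instead of prompting, which mirrors the
    # case where no password or passphrase prompt is shown.
    if "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$COMMITHOST" true; then
        echo "ok: non-interactive ssh to $COMMITHOST works"
    else
        echo "FAILED: cannot ssh to $COMMITHOST without a prompt" >&2
        return 1
    fi
}
```

If the check fails, setting up key-based or Kerberos access to the COMMITHOST from that machine is the likely remedy, depending on how the site handles authentication.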
Summary of the Benefits of IocManager
All information about IOCs is in one file.
Easy to make sure ports are unique on each host.
Easy to update IOC software versions.
Easy to move IOCs from one host to another (especially since code was added to adjust for RHEL5/7!).
Easy to detect IOCs running on the wrong port.
In general, life is easier!