xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

rcons / goconserver from remote host. #7382

Open bviviano opened 1 year ago

bviviano commented 1 year ago

We have an xCAT 2.16.5 deployment on RHEL8 w/ Red Hat High Availability (pacemaker) with 3 xCAT manager nodes in the HA Pacemaker cluster. Everything is working fine by setting the following in /root/.bash_profile on each of the HA nodes:

export XCATHOST=172.20.0.1:3001
export CONSERVER=172.20.0.1

where 172.20.0.1 is the floating IP address managed by Pacemaker, except that when I use rcons from the non-active management node, it connects with

ssh -t 172.20.0.1 /opt/xcat/share/xcat/cons/ipmi a0n13

instead of congo

[root@atmos-mgmt4 ~]# pcs resource status goconserver
  * goconserver (systemd:goconserver.service):   Started atmos-mgmt4

[root@atmos-mgmt4 ~]# echo $CONSERVER
172.20.0.1

[root@atmos-mgmt4 ~]# rcons a0n13
[Enter `^Ec?' for help]
goconserver(2023-05-08T11:54:24-04:00): Hello 172.20.0.4:34740, welcome to the session of a0n13

[root@a0n13 ~]# [Disconnected]
Connection to 172.20.0.1 closed.
[root@atmos-mgmt3 ~]# rcons a0n13
**** Enter ~? for help *****
Acquiring startup lock...done
[SOL Session operational.  Use ~? for help]

[root@a0n13 ~]# Connection to atmos-mgmt3 closed.
[bdk@albert ~]$ 

While it works with /opt/xcat/share/xcat/cons/ipmi, the problem is, the "disconnect" string is ~., the same as the SSH disconnect string, so when I try and disconnect from the rcons session, it also kicks me out of the SSH session to the HA node.

Looking at /opt/xcat/bin/rcons, it seems the only method used to determine if rcons is conserver or goconserver is via checking if goconserver is running or not:

    GOCONSERVER_RC=`service goconserver status >& /dev/null; echo $?`
    if [[ ${GOCONSERVER_RC} == 0 ]]; then
        USE_GOCONSERVER=1
    fi

If I manually change /opt/xcat/bin/rcons so that

USE_GOCONSERVER=1

Then it uses

ssh -t 172.20.0.1 /usr/bin/congo console a0n13

and works as expected:

[root@atmos-mgmt3 ~]# pcs resource status goconserver
  * goconserver (systemd:goconserver.service):   Started atmos-mgmt4

[root@atmos-mgmt3 ~]# rcons a0n13
[Enter `^Ec?' for help]
goconserver(2023-05-08T12:07:35-04:00): Hello 172.20.0.4:41770, welcome to the session of a0n13

[root@a0n13 ~]# [Disconnected]
Connection to 172.20.0.1 closed.
[root@atmos-mgmt3 ~]# 

So, my question is, is there a trick (Environment variable, etc) that I'm not seeing to force rcons to use the congo method, instead of /opt/xcat/share/xcat/cons/ipmi, when connecting from a remote host? I can make the code change to rcons simple enough on my 3 HA nodes, but if there is a better way to do it, I'd like to use that method.

If there isn't a better way, and since all the code to support congo or ipmi cons via SSH is already in rcons, it seems a reasonable RFE to make it so the end user can force the use of conserver / ipmi vs goconserver / congo, instead of relying on pidof and assuming you're running rcons on the same node conserver/goconserver is running on.

Thanks.
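
A minimal sketch of the kind of override requested above, assuming a hypothetical environment variable named XCAT_CONSOLE_BACKEND (this is not an existing xCAT setting). It keeps the current local-service detection as the default and only bypasses it when the variable is set:

```shell
# Sketch only: XCAT_CONSOLE_BACKEND is a hypothetical variable, not part of xCAT.
# Prints 1 if congo should be used, 0 otherwise.
# $1: exit status of the local "service goconserver status" check.
use_goconserver() {
    case "${XCAT_CONSOLE_BACKEND:-auto}" in
        goconserver) echo 1 ;;   # forced on, even if the service runs on another node
        conserver)   echo 0 ;;   # forced off
        *)                       # auto: fall back to the existing local check
            if [ "$1" -eq 0 ]; then echo 1; else echo 0; fi
            ;;
    esac
}
```

Inside rcons this could replace the direct assignment, for example USE_GOCONSERVER=$(use_goconserver "$GOCONSERVER_RC"), so that nodes without a locally running goconserver can still opt in.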

samveen commented 1 year ago

Every level of nested remote access adds a ~ to the disconnect string to send. If you want to disconnect from an rcons session reached via ssh, use ~~. as the disconnect string. This also applies to multiple levels of SSH:

For example, with rcons in an ssh session via a jumphost in the middle (laptop =ssh=> jumphost =ssh=> MN =rcons=> node-69), use ~~~. to disconnect the rcons session. Note that this is standard ssh behavior.
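
The rule above can be sketched as a small helper. It assumes every enclosing SSH session uses the default ~ escape character and that the sequence is typed at the start of a line; the function name is made up for illustration:

```shell
# Print the key sequence that closes the innermost session while keeping
# $1 enclosing SSH sessions open (each one consumes a single "~").
disconnect_sequence() {
    seq_out="~."
    i=0
    while [ "$i" -lt "$1" ]; do
        seq_out="~${seq_out}"   # one extra tilde per enclosing SSH session
        i=$((i + 1))
    done
    printf '%s\n' "$seq_out"
}
```

With one enclosing session, disconnect_sequence 1 prints ~~. and with the jumphost layout (two enclosing sessions), disconnect_sequence 2 prints ~~~. which matches the examples above.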

bviviano commented 1 year ago

Thanks, that gets me around the ipmi disconnect problem, but I'd love to have a way to specify/force the use of congo instead, since it does work if I just set

USE_GOCONSERVER=1

in the rcons bash script.

samveen commented 1 year ago

Why don't you set it locally and then have ssh send it along with the ssh command, like:

export USE_GOCONSERVER=1
ssh -o SendEnv=USE_GOCONSERVER -t 172.20.0.1 /usr/bin/congo console a0n13

Alternatively:

ssh -t 172.20.0.1 USE_GOCONSERVER=1 /usr/bin/congo console a0n13

bviviano commented 1 year ago

That doesn't really make sense. USE_GOCONSERVER is a setting inside rcons; if I run congo directly from SSH myself, I don't need USE_GOCONSERVER. But running congo manually as above is a lot more complicated than just doing

rcons a0n13

bviviano commented 1 year ago

My goal is to do this the xCAT way. I could write my own simple, stripped-down rcons script; copy rcons to myrcons, make the code change, and use that going forward; or change rcons directly and reapply the same update each time a new version of xCAT is released. (We do that now with dhcp.pm and ddns.pm so that xCAT uses something other than MD5 with omshell, which lets makehosts and makedns work correctly under FIPS.)

But ultimately, it would be nice if there was a way to force it to use congo. There are already overrides in rcons for confluent

CONSOLE_SERVICE_KEYWORD=`tabdump site | grep consoleservice | cut -d, -f1 | tr -d '"'`
CONSOLE_SERVICE_VALUE=`tabdump site | grep consoleservice | cut -d, -f2 | tr -d '"'`

if [ "$CONSOLE_SERVICE_KEYWORD" == "consoleservice" ]; then
    if [ "$CONSOLE_SERVICE_VALUE" == "confluent" ]; then
        USE_CONFLUENT=1
    fi
fi

So CONSOLE_SERVICE_KEYWORD is already being pulled, checked and used to force consoleserver and confluent. A simple addition to the above if statement, like the following

elif [ "$CONSOLE_SERVICE_KEYWORD" == "goconsoleservice" ]; then
    USE_GOCONSERVER=1
fi

should allow a site table override to force the use of congo, but the only checks I see in rcons for whether to use congo are based on the goconserver service being installed AND running:

if [ $USE_CONFLUENT != "1" ] && [ -f "/usr/bin/congo" ] && [ -f "/usr/bin/goconserver" ]; then
    GOCONSERVER_RC=`service goconserver status >& /dev/null; echo $?`
    if [[ ${GOCONSERVER_RC} == 0 ]]; then
        USE_GOCONSERVER=1
    fi
    if [[ ${USE_GOCONSERVER} == 1 ]]; then
        CONSERVER_RC=`pidof conserver >> /dev/null; echo $?`
        if [[ ${CONSERVER_RC} == 0 ]]; then
            echo "Error: Both goconserver and conserver are running, please stop one of them, and retry..."
            exit 1
        fi
    fi
fi
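
One detail worth noting if a second site key is added: `grep consoleservice` also matches any key that merely contains that substring, such as the proposed goconsoleservice. A minimal sketch of an exact-key lookup follows, assuming the quoted, comma-separated layout implied by the cut and tr calls above; the function name is hypothetical:

```shell
# Read "tabdump site"-style CSV on stdin and print the value stored under
# the key given as $1. The key must match exactly; quotes are stripped.
site_value() {
    awk -F, -v key="\"$1\"" '$1 == key { gsub(/"/, "", $2); print $2; exit }'
}
```

It could be used as CONSOLE_SERVICE_VALUE=$(tabdump site | site_value consoleservice), which prints nothing when the key is absent.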

samveen commented 1 year ago

> That doesn't really make sense. USE_GOCONSERVER is a setting inside rcons; if I run congo directly from SSH myself, I don't need USE_GOCONSERVER. But running congo manually as above is a lot more complicated than just doing
>
> rcons a0n13

@bviviano I see the problem now, after examining the code for rcons for xcat 2.16, which I should have done first, before proposing that non-solution :man_facepalming: