microsoft / omi

Open Management Infrastructure

scx v1.9.0-0 --upgrade flag is broken #770

Open edwio opened 2 months ago

edwio commented 2 months ago

When upgrading an existing SCX installation to SCX v1.9.0-0 with the --upgrade flag:

sh ./scx-1.9.0-0.universalr.1.s.x64.sh --upgrade --enable-opsmgr

the installation script regenerates the OMI certificate and key:

warning: /etc/opt/omi/conf/omiserver.conf saved as /etc/opt/omi/conf/omiserver.conf.rpmsave
Generating a 3072 bit RSA private key
 ...................................++
..........................................................................++
writing new private key to '/etc/opt/omi/ssl/omikey.pem'
-----
Upgrading package: scx (scx-1.9.0-0.universal.s.x64)
-----
Generating certificate with hostname="RHEL7PROD01", domainname="dev"
Trying to stop omi with systemctl
omi is stopped.
Trying to start omi with systemctl
omi is started.

This should not happen when the --upgrade flag is used.

JumpingYang001 commented 2 months ago

@edwio we made a change: if the current RSA key is 2048 bits and you manually run scx.sh --upgrade to upgrade OMI/SCX, the script regenerates the OMI certificate and key with a 3072-bit RSA key, which is more secure than a 2048-bit key. You then need to re-discover the computer, or upgrade it, from the console. If you upgrade from the OM server console instead of running the upgrade script manually on the Linux box, you will not hit the certificate-signing issue on the OM console.
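Since the regeneration described above is triggered when the existing key is 2048 bits, it can help to check the key size on a box before upgrading. The following is a minimal POSIX shell sketch, assuming `openssl` is on the PATH and the key lives at `/etc/opt/omi/ssl/omikey.pem` (the path printed in the upgrade log); `parse_key_bits` and `omi_key_bits` are hypothetical helper names, not part of SCX or OMI.

```shell
# Minimal sketch (hypothetical helper names): report the RSA key size of the
# OMI private key. Assumes `openssl` is on PATH; reading the key needs root.

# Read `openssl rsa -noout -text` output on stdin and print the bit count.
# Matches the header formats of several OpenSSL versions, e.g.
#   "Private-Key: (2048 bit)" and "RSA Private-Key: (2048 bit, 2 primes)".
parse_key_bits() {
    sed -n 's/.*Private-Key: (\([0-9][0-9]*\) bit.*/\1/p' | head -n 1
}

# Print the key size of a PEM private key; defaults to the path in the upgrade log.
omi_key_bits() {
    keyfile="${1:-/etc/opt/omi/ssl/omikey.pem}"
    openssl rsa -in "$keyfile" -noout -text 2>/dev/null | parse_key_bits
}
```

For example, `[ "$(omi_key_bits)" = 2048 ] && echo "expect --upgrade to regenerate this key"` flags the boxes that, per the comment above, would end up with a new certificate and need re-signing.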

If you have many Linux boxes, another way to re-sign the certificates on the OM server is to try the script below:

# Import the Operations Manager module
Import-Module OperationsManager
# Connect to the SCOM management group
New-SCOMManagementGroupConnection -ComputerName omservername.DOMAIN.COM
# Get the list of Unix/Linux computers
$unixComputers = Get-SCOMMonitoringObject -Class (Get-SCOMClass -Name 'Microsoft.Unix.Computer')

$sPassphrase = ConvertTo-SecureString "***yourpassword***" -AsPlainText -Force    
$NewWSCred = New-Object System.Management.Automation.PSCredential ("mydomain\myuser", $sPassphrase)

# Iterate over each Unix/Linux computer and run the Update Certificate task
foreach ($computer in $unixComputers) {
    if($computer.HealthState -eq "Error"){
        $task = Get-SCOMTask -DisplayName "UNIX/Linux Update Certificate Task" 
        if ($task) {
            Start-SCOMTask -Task $task -Instance $computer -TaskCredentials $NewWSCred
            Write-Output "Update Certificate task started for $($computer.DisplayName)"
        } else {
            Write-Output "Update Certificate task not found for $($computer.DisplayName)"
        }
    }
}
edwio commented 2 months ago

@JumpingYang001, thanks for the clarification, but this isn't a practical solution for production environments where high-privilege passwords aren't controlled by the monitoring team, or where the network firewall blocks the connection.

A more straightforward approach is to back up '/etc/opt/omi/ssl' before running the --upgrade command, then restore the entire "ssl" folder from the backup and restart the SCX agent with 'scxadmin -restart', using automation such as Ansible, Satellite, or Chef.
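The backup/restore workaround above could be scripted roughly as follows, for example as a shell step driven by Ansible, Satellite, or Chef. This is a sketch under stated assumptions: the SSL directory is `/etc/opt/omi/ssl`, `scxadmin` is on the PATH, GNU `cp` is available, and the backup location `/var/tmp/omi-ssl-backup/ssl`, the `SSL_DIR`/`BACKUP_DIR` overrides, and all function names are hypothetical.

```shell
# Sketch of the "back up ssl, upgrade, restore, restart" workaround.
# Assumptions (not SCX/OMI APIs): function names, backup path, GNU cp -a.

# Copy an SSL directory to a backup location, preserving ownership, modes,
# timestamps, symlinks and (where supported) SELinux labels.
backup_ssl() {
    src="$1"; dest="$2"
    [ -d "$src" ] || { echo "backup_ssl: $src is not a directory" >&2; return 1; }
    [ -n "$dest" ] && [ "$dest" != "/" ] || { echo "backup_ssl: unsafe destination" >&2; return 1; }
    rm -rf "$dest"                     # replace any older backup
    mkdir -p "$(dirname "$dest")" || return 1
    cp -a "$src" "$dest"
}

# Replace the live SSL directory with the copy taken before the upgrade.
restore_ssl() {
    saved="$1"; target="$2"
    [ -d "$saved" ] || { echo "restore_ssl: $saved is not a directory" >&2; return 1; }
    [ -n "$target" ] && [ "$target" != "/" ] || { echo "restore_ssl: unsafe target" >&2; return 1; }
    rm -rf "$target"                   # drop the regenerated cert/key
    cp -a "$saved" "$target"
}

# Back up, run the SCX bundle with the flags from the issue, restore, restart.
upgrade_keeping_ssl() {
    bundle="$1"
    ssl_dir="${SSL_DIR:-/etc/opt/omi/ssl}"
    backup_dir="${BACKUP_DIR:-/var/tmp/omi-ssl-backup/ssl}"
    backup_ssl "$ssl_dir" "$backup_dir" || return 1
    sh "$bundle" --upgrade --enable-opsmgr
    status=$?
    # Restore even if the bundle failed: the key may already have been replaced.
    restore_ssl "$backup_dir" "$ssl_dir" || return 1
    scxadmin -restart || return 1
    return "$status"
}
```

Run as root, e.g. `upgrade_keeping_ssl ./scx-1.9.0-0.universalr.1.s.x64.sh`. The restore deliberately runs even when the bundle reports a failure. As the maintainer points out later in the thread, this keeps the existing 2048-bit key rather than the 3072-bit key that 1.9.0-0 generates.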

JumpingYang001 commented 2 months ago

@edwio the admin team can install the OM console on their own machine and run the script to update the certificates on the unhealthy boxes so they return to a healthy state. Re-signing the certificate happens on the OM server (Windows) side; it cannot be done from the Linux side alone, because the same 3072-bit RSA key improvement also applies on the OM server side. From my understanding, restoring the entire "ssl" folder from the backup would make moving from a 2048-bit to a 3072-bit RSA key complicated, since the certificates must be signed on the Windows server.

edwio commented 2 months ago

@JumpingYang001, it's not a practical solution, at least in my experience (more than 12 years as a SCOM admin and management pack developer in large organizations), and I don't understand why it is even suggested; by the same logic, I could suggest that the admin team use a different monitoring agent for Linux-based computers. As for the workaround I suggested, I can confirm it works without any problems, still using the 2048-bit RSA key.

JumpingYang001 commented 2 months ago

@edwio we want to move to a 3072-bit RSA key and stop using the 2048-bit key. Do you mean you still want to keep the 2048-bit RSA key when upgrading SCX/OMI?

edwio commented 2 months ago

@JumpingYang001, my current customer monitors more than 900 UNIX/Linux computers across multiple domains and networks with SCOM 2019 UR3.

Because of the customer's security team policy, using the Discovery Wizard in the SCOM console to update the SCX agents isn't possible: the root passwords are managed only by the security team, and the network firewall blocks the connection. So the customer must use my suggested solution.

JumpingYang001 commented 2 months ago

@edwio you can use your solution in your automation script as a workaround. In SCX/OMI 1.9.0-0 and later, there is logic that upgrades the certificate to a 3072-bit key: https://github.com/microsoft/omi/commit/d7a413c3c6aca57b0145bf96a76b65834403f911