pingcap / tiup

A component manager for TiDB
https://tiup.io
Apache License 2.0

cluster check fails when `/tmp` is mounted with `noexec` #2362

Open dveeden opened 6 months ago

dveeden commented 6 months ago

Bug Report

When running `tiup cluster check ...` against systems that have `/tmp` mounted with the `noexec` option, TiUP fails to run the checks.

  1. What did you do?

See also: https://github.com/ComplianceAsCode/content/blob/80b540816649e3df830691fd39477421ceb8bfea/products/rhel9/kickstart/ssg-rhel9-ccn_basic-ks.cfg#L102

Note that mounting `/tmp` with `noexec` is part of most security profiles available for Rocky Linux, RHEL 9, and similar distributions:

$ grep -lE '^logvol /tmp.*noexec' * 
ssg-rhel9-anssi_bp28_enhanced-ks.cfg
ssg-rhel9-anssi_bp28_high-ks.cfg
ssg-rhel9-anssi_bp28_intermediary-ks.cfg
ssg-rhel9-ccn_advanced-ks.cfg
ssg-rhel9-ccn_basic-ks.cfg
ssg-rhel9-ccn_intermediate-ks.cfg
ssg-rhel9-cis-ks.cfg
ssg-rhel9-cis_server_l1-ks.cfg
ssg-rhel9-cis_workstation_l1-ks.cfg
ssg-rhel9-cis_workstation_l2-ks.cfg
ssg-rhel9-cui-ks.cfg
ssg-rhel9-ospp-ks.cfg
ssg-rhel9-pci-dss-ks.cfg
ssg-rhel9-stig_gui-ks.cfg
ssg-rhel9-stig-ks.cfg

These profiles are used for general hardening and for compliance with government regulations and PCI-DSS.
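To confirm whether a target host is affected, the mount options can be read from `/proc/mounts`. The following is a minimal sketch, not TiUP code: `mountHasOption` is a hypothetical helper, and it only looks at an exact mount-point match, so it reports `false` when `/tmp` is not a separate mount.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountHasOption is a hypothetical helper (not part of TiUP). It reports
// whether the mount point `target` in /proc/mounts-style text carries the
// given mount option. If several entries share a mount point, the last wins.
func mountHasOption(mounts, target, option string) bool {
	found := false
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		// Line format: device mountpoint fstype options dump pass
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 || fields[1] != target {
			continue
		}
		found = false
		for _, opt := range strings.Split(fields[3], ",") {
			if opt == option {
				found = true
			}
		}
	}
	return found
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/mounts:", err)
		return
	}
	fmt.Println("/tmp mounted noexec:", mountHasOption(string(data), "/tmp", "noexec"))
}
```

Running this on the target host prints whether the `/tmp` mount itself has the `noexec` option set.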

  2. What did you expect to see?

The check to run successfully. Depending on the configuration of the target machine individual checks could still fail, but the check itself would run completely and produce a report.

  3. What did you see instead?

$ tiup cluster check testcluster.yml
tiup is checking updates for component cluster ...
Starting component `cluster`: /home/dvaneeden/.tiup/components/cluster/v1.14.0/tiup-cluster check testcluster.yml
The SSH identity key is encrypted. Input its passphrase: 

+ Detect CPU Arch Name
  - Detecting node 192.168.122.131 Arch info ... Done

+ Detect CPU OS Name
  - Detecting node 192.168.122.131 OS info ... Done
+ Download necessary tools
  - Downloading check tools for linux/amd64 ... Done
+ Collect basic system information
  - Getting system info of 192.168.122.131:22 ... Error

Error: stderr: bash: line 1: tar: command not found
: executor.ssh.execute_failed: Failed to execute command over SSH for 'dvaneeden@192.168.122.131:22' {ssh_stderr: bash: line 1: tar: command not found
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "tar --no-same-owner -zxf /tmp/tiup/bin/insight-v0.4.2-linux-amd64.tar.gz -C /tmp/tiup/bin && rm /tmp/tiup/bin/insight-v0.4.2-linux-amd64.tar.gz"}, cause: Process exited with status 127

Verbose debug logs has been written to /home/dvaneeden/.tiup/logs/tiup-cluster-debug-2024-01-11-11-04-41.log.

With tar installed (it is not part of a minimal Rocky Linux 9 install), the check fails at a later step:

$ tiup cluster check testcluster.yml
tiup is checking updates for component cluster ...
Starting component `cluster`: /home/dvaneeden/.tiup/components/cluster/v1.14.0/tiup-cluster check testcluster.yml
The SSH identity key is encrypted. Input its passphrase: 

+ Detect CPU Arch Name
  - Detecting node 192.168.122.131 Arch info ... Done

+ Detect CPU OS Name
  - Detecting node 192.168.122.131 OS info ... Done
+ Download necessary tools
  - Downloading check tools for linux/amd64 ... Done
+ Collect basic system information
+ Collect basic system information
  - Getting system info of 192.168.122.131:22 ... Error

Error: executor.ssh.execute_failed: Failed to execute command over SSH for 'dvaneeden@192.168.122.131:22' {ssh_stderr: bash: line 1: /tmp/tiup/bin/insight: Permission denied
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "/tmp/tiup/bin/insight"}, cause: Process exited with status 126

Verbose debug logs has been written to /home/dvaneeden/.tiup/logs/tiup-cluster-debug-2024-01-11-11-15-24.log.
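
The two failures above end with different exit statuses, which is what separates the missing `tar` from the `noexec` problem: a shell returns 127 when the command is not found and 126 when it is found but cannot be executed. A minimal sketch of that convention follows; `describeExitStatus` is a hypothetical helper for illustration and not part of TiUP.

```go
package main

import "fmt"

// describeExitStatus is a hypothetical helper (not part of TiUP) that maps
// a shell exit status to the conventional meaning used by bash.
func describeExitStatus(code int) string {
	switch {
	case code == 0:
		return "success"
	case code == 126:
		// Seen in the second run: the binary exists but may not be executed,
		// for example because the filesystem is mounted noexec.
		return "command found but not executable"
	case code == 127:
		// Seen in the first run: tar is not installed.
		return "command not found"
	case code > 128:
		return fmt.Sprintf("terminated by signal %d", code-128)
	default:
		return "command-specific failure"
	}
}

func main() {
	for _, code := range []int{127, 126} {
		fmt.Printf("%d: %s\n", code, describeExitStatus(code))
	}
}
```
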
  4. What version of TiUP are you using (`tiup --version`)?

$ tiup --version
1.14.0 tiup
Go Version: go1.21.4
Git Ref: v1.14.0
GitHash: c3e9fc518aea0da66a37f82ee5a516171de9c372

The topology YAML that I used:

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"
  listen_host: 0.0.0.0
  arch: "amd64"

pd_servers:
  - host: 192.168.122.131

tidb_servers:
  - host: 192.168.122.131

tikv_servers:
  - host: 192.168.122.131

Note that part of the problematic code is this:

// Runs the insight binary from CheckToolsPathDir on the target host.
Shell(
        inst.GetManageHost(),
        filepath.Join(task.CheckToolsPathDir, "bin", "insight"),
        "",
        false,
).
BuildAsStep("  - Getting system info of " + utils.JoinHostPort(inst.GetManageHost(), inst.GetSSHPort()))

With a quick-and-dirty fix applied, the checks now run:

diff --git a/pkg/cluster/task/check.go b/pkg/cluster/task/check.go
index 181f38e5..8b899495 100644
--- a/pkg/cluster/task/check.go
+++ b/pkg/cluster/task/check.go
@@ -42,7 +42,7 @@ var (

 // place the check utilities are stored
 const (
-       CheckToolsPathDir = "/tmp/tiup"
+       CheckToolsPathDir = "/tidb-deploy/tmp/tiup"
 )

 // CheckSys performs checks of system information
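
The hard-coded `/tidb-deploy` in the diff only works for this topology. One possible direction, sketched here under assumptions, is to derive the directory from the topology's `deploy_dir` and fall back to the current default when it is empty. `checkToolsDir` and `defaultCheckToolsDir` are hypothetical names, not existing TiUP identifiers, and the sketch assumes the deploy directory permits execution, which it already must for the deployed binaries.

```go
package main

import (
	"fmt"
	"path"
)

// defaultCheckToolsDir mirrors the value currently hard-coded in
// pkg/cluster/task/check.go.
const defaultCheckToolsDir = "/tmp/tiup"

// checkToolsDir is a hypothetical helper (not part of TiUP). It places the
// check tools under the deploy directory when one is configured, and falls
// back to the current default otherwise. The path package is used rather
// than path/filepath because the result is a path on the remote Linux host.
func checkToolsDir(deployDir string) string {
	if deployDir == "" {
		return defaultCheckToolsDir
	}
	return path.Join(deployDir, "tmp", "tiup")
}

func main() {
	fmt.Println(checkToolsDir("/tidb-deploy"))
	fmt.Println(checkToolsDir(""))
}
```

With the topology above, this yields the same `/tidb-deploy/tmp/tiup` path as the diff while keeping `/tmp/tiup` for setups that do not set a deploy directory.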

Suggestions for a fix: