ncsa / xcat-tools

Useful tools for xCAT
BSD 3-Clause "New" or "Revised" License

Improve stale file detection in backup-node_configs.sh #30

Open bsper2 opened 2 years ago

bsper2 commented 2 years ago

Ran into an issue in SECURITY-1380 where stateless nodes lost their keytabs.

Currently backup-node_configs.sh backs up files every x days (7 by default). So if a file is modified (as keytabs are when they refresh on the 1st of the month), it can take up to 7 days before that file is backed up. This is a problem for stateless nodes, especially ones that reboot weekly.

As a workaround I set REFRESH_DAYS=1 on ngale. This isn't too bad on ngale since there aren't too many nodes, but having it copy all files for all nodes every day is heavy-handed and not ideal, especially for larger clusters.

Edit: the above doesn't actually fix this. The check that decides whether to back up a file is based on the modify time of the backup file ON the xcat node; it does not look at the modify time of the file as it is on the node being backed up:

bkup_tgt="$node_bkup_dir/${tgt_fn}.tgz"     # This is the backup file ON the xcat server
if is_stale "$bkup_tgt" ; then              # Do the is-stale check on the backup file

The best way to force the backup of files every time is to set REFRESH_DAYS=0 or just add the -f flag when executing the script.

Just thinking of ideas, but maybe we set up is_stale so that for each file it compares the backup copy to the file on the node, and then does the backup only for files that are different. That way a daily run of backup-node_configs.sh would grab more up-to-date files, and since it only copies files that changed, it's not copying files needlessly.
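A minimal sketch of that per-file comparison, using modification times only. The function name needs_backup is hypothetical (it is not in backup-node_configs.sh), and the sketch assumes GNU stat is available and that the node's file mtime has already been read as epoch seconds, for example over ssh:

```shell
# Hypothetical helper, not part of backup-node_configs.sh.
# Decide whether the backup on the xCAT server is older than the file on the node.
#   $1 = modification time of the file on the node, in epoch seconds
#   $2 = path to the backup file on the xCAT server
# Returns 0 (true) when a backup is needed, 1 otherwise.
needs_backup() {
    src_mtime="$1"
    bkup_tgt="$2"
    # No backup exists yet: always back up.
    [ -f "$bkup_tgt" ] || return 0
    # GNU stat: %Y prints the modification time as epoch seconds.
    # If the backup's mtime cannot be read, err on the side of backing up.
    bkup_mtime=$(stat -c %Y "$bkup_tgt") || return 0
    [ "$src_mtime" -gt "$bkup_mtime" ]
}

# Usage sketch (assumes passwordless ssh to the node and GNU stat there):
#   src_mtime=$(ssh "$node" stat -c %Y "$path_on_node")
#   if needs_backup "$src_mtime" "$bkup_tgt" ; then ... ; fi
```

Comparing mtimes is cheap but only catches files that are newer on the node; comparing checksums would also catch content changes that leave the mtime unchanged, at the cost of reading every file.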

Also might adjust when the keytabs get updated and when the backup runs. Right now the keytabs update a little after 8AM on the 1st of the month:

1 8 1 * * sleep $((RANDOM \% 15))m && k5srvutil change

and backup-node_configs.sh runs at midnight daily. So adjusting the timings so that the backup happens not too long after the keytab update would be good too.

andylytical commented 2 years ago

A simple solution might be to replace the "tar pipe" with "recursive rsync" at https://github.com/ncsa/xcat-tools/blob/e6afaca43a3d1e61fd9405ac4711d0c43db74134/cron_scripts/backup-node_configs.sh#L61

The resulting solution would likely involve 3 steps:

  1. rsync to a temp location
  2. tar -z the temp contents
  3. mv the new tar file over the old file (this is an atomic operation as long as both files are on the same filesystem, so it's safe and can't result in data loss)
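The three steps above could be sketched roughly as follows. The function name backup_via_rsync, its arguments, and the idea of keeping the staging directory between runs (so rsync can skip unchanged files) are assumptions for illustration, not code from backup-node_configs.sh:

```shell
# Hypothetical sketch of the rsync / tar / mv sequence.
#   $1 = rsync source (e.g. "node:/some/dir/" or a local path)
#   $2 = staging directory, kept between runs so rsync only copies changes
#   $3 = final tarball path on the xCAT server
backup_via_rsync() {
    src="$1"
    stage_dir="$2"
    bkup_tgt="$3"
    mkdir -p "$stage_dir" "$(dirname "$bkup_tgt")" || return 1

    # 1. rsync to the staging location; unchanged files are skipped.
    rsync -a --delete "$src" "$stage_dir/" || return 1

    # 2. tar -z the staged contents into a temp file next to the target,
    #    so the final mv is a rename within one filesystem.
    tmp_tgz=$(mktemp "${bkup_tgt}.XXXXXX") || return 1
    if ! tar -czf "$tmp_tgz" -C "$stage_dir" . ; then
        rm -f "$tmp_tgz"
        return 1
    fi

    # 3. mv the new tar file over the old one; the old backup stays in
    #    place until this step succeeds.
    mv -f "$tmp_tgz" "$bkup_tgt"
}

# Usage sketch:
#   backup_via_rsync "$node:/path/on/node/" "$node_bkup_dir/stage" "$bkup_tgt"
```

If any step fails, the function returns non-zero before the mv, so the previous tarball is left untouched.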

As an added benefit, this would remove the need for the "is_stale" function as well as the "REFRESH*" and "MIN_BKUP_DATE" variables and checks. The resulting code should be shorter and cleaner ... thus easier to maintain. This is a win!

FORCE is probably still a useful cmdline option, but its logic would change (maybe just delete the local copy; or maybe add a flag to rsync to "skip all checks and copy files"; the latter being safer so a backup isn't lost if the rsync copy fails).
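For the second option, rsync's --ignore-times flag disables the usual "same size and mtime" skip, so every file is transferred. A small sketch, assuming FORCE is a 0/1 shell variable (how the script actually represents it may differ) and with rsync_opts as a hypothetical helper name:

```shell
# Hypothetical helper: print the rsync options to use, based on FORCE.
# Assumes FORCE is 1 when the force flag was given, 0 or unset otherwise.
rsync_opts() {
    if [ "${FORCE:-0}" = "1" ] ; then
        # --ignore-times: do not skip files that match in size and mtime.
        echo "-a --ignore-times"
    else
        echo "-a"
    fi
}

# Usage sketch (word splitting of the options is intentional here):
#   rsync $(rsync_opts) "$src" "$stage_dir/"
```

Because the existing local copy is never deleted up front, a failed rsync leaves the previous backup in place.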

bsper2 commented 2 years ago

SVCPLAN-2056 has been set up to track this.