ufs-community / ufs-srweather-app

UFS Short-Range Weather Application
Other
53 stars 114 forks source link

`devclean.sh` demonstrates very unsafe directory removal behavior #1073

Open mkavulich opened 3 months ago

mkavulich commented 3 months ago

Expected behavior

Forced, recursive removal of directories via bash scripts (rm -rf) should be handled very carefully. Any potential mishandling of input, whether it be from unexpected behavior of wildcards, command-line typos, undefined variables, or even intentional malfeasance, can result in very destructive results when the rm -rf command is used.

Current behavior

The devclean.sh has several instances of rm -rf:

The first and last of these include the final boss of bad UNIX practices: they run rm -rf on a variable from a command-line argument that is not checked for existence prior to being used. It is also not invoked in quotes, so any wildcards will pass straight through.

This is extremely unsafe behavior and should be eliminated ASAP

Machines affected

All

Steps To Reproduce

DO NOT TRY TO REPRODUCE THIS. Unless you like loss of data.

Detailed Description of Fix

rm -rf commands should be handled with the caution of a lethal weapon. When they are necessary, they must be crafted very carefully, with explicit checks on the arguments to ensure that they 1. exist, and 2. can only correspond to expected results.

My preferred solution? Delete this script forever and replace it with proper documentation on starting a clean build. All the script does is remove directories, and this is a responsibility that should be handed to users, not handled opaquely by bash scripts.

At the very minimum, if we are to keep this script, it should be restricted to only removing directories within the current tree. The ability to specify custom directory locations (specifically, the --install-dir, --build-dir, --bin-dir, and --conda-dir arguments) is unsafe and unnecessary.

Additional Information

Lest you think my description of this problem is over-dramatic, here is a real-world example from a much more widely used software package: https://github.com/ValveSoftware/steam-for-linux/issues/3671

mkavulich commented 3 weeks ago

Just pinging this high-priority issue after more than two months, can we get someone to work on this? I can not emphasize enough that this script should not be used by anyone in its current state, nevermind recommended in the users guide as a way to solve unexpected build problems!

gspetro-NOAA commented 3 weeks ago

@MichaelLueken May I open a PR to remove the script and the related documentation? It's easy enough to remove the documentation, but I figured a developer would make a decision to remove or revise the script. However, it seems like if we remove it, anyone who has an issue with that could fairly easily restore and update it. In the meantime, it seems prudent to just remove the problem until someone is interested in doing that.

MichaelLueken commented 3 weeks ago

@gspetro-NOAA My concern with removing the script is that @natalie-perlin has noted that this feature was a stakeholder request. While it is most prudent to directly address this issue (opening a PR to remove the script and associated documentation), removing it to add it back in is a waste of resources.

@natalie-perlin - would you be able to clean up the devclean.sh script, or should we go ahead and remove it for the time being?

christinaholtNOAA commented 3 weeks ago

If it's actively dangerous, I'd assume that is decidedly NOT what a stakeholder would want.

Removing it to protect user data seems like a higher priority than being able to resurrect a safer option that likely doesn't look like this script at all.

MichaelLueken commented 3 weeks ago

The script is only dangerous if the user explicitly points to directories outside of their local ufs-srweather-app copy.

I can certainly agree with @mkavulich that removal of user input for BIN_DIR, BUILD_DIR, CONDA_DIR, and INSTALL_DIR and only clear content within the user's local copy will reduce the likelihood of developers accidentally deleting things. I can remove BIN_DIR, BUILD_DIR, CONDA_DIR, and INSTALL_DIR and only remove build, conda, exec, include, and share directories through in the ufs-srweather-app local copy. I can also check for the existence of the directories before removing them, to stay on the safe side. If this will suffice, I can include this fix in my PR #1096.

If this isn't enough, then @gspetro-NOAA, please feel free to remove the devclean.sh script and the documentation related to it.

natalie-perlin commented 3 weeks ago

Let me try to address the concerns of the script, and add any checks that may mitigate the risks.

This is script is especially needed for new users who start exploring the code, and usually go through several build issues. Including more documentation on how to do it by hand is always an option, and it could be done in addition to the script, too.

natalie-perlin commented 2 weeks ago

@mkavulich @MichaelLueken I'm implementing a following function remove_directory as a part of the devclean.sh script, as shown below. It is called when a directory is expected to be removed. The function takes two arguments, $1 is a directory to be removed, and $2 is the main SRW directory. The function checks for a directory existence, whether is contains any wildcards (not removed, suggested user to use a command line for this task), and whether it is under SRW directory tree. If it is not immediately under the SRW tree or outside the tree, a user is asked for a confirmation.

It could be included in my PR-1089, or made as a separate PR.

Please feel free to comment!

# remove_directory function
remove_directory () {
   local dir1=$1
   local srw1=$2

#  check if a directory exists
   if [ -d "${dir1}" ] ; then
     #  check if a directory name contain wildcards
     if [ "${dir1}" = "${dir1//[\[\]|.? +*]/}"  ] ; then
        # check + remove only if a directory is immediately under the SRW directory tree
        if [ "${srw1}" = "$(dirname -- "${dir1}")" ] ; then
          rm -rf "${dir1}" && printf '%s\n' "...removed ${dir1}"
        else
          # a directory is outside the SRW tree, request to confirm its removal
          local basename1=$(basename -- ${dir1})
          local dirname1=$(dirname -- ${dir1})
          printf '%s\n' "You requested to remove ${dir1} that is outside SRW directory tree, please confirm below."
          read -p "Remove a directory ${dir1} (Y/N)? " confirm
          echo "confirm = $confirm"
          case "$confirm" in
            [yY][eE][sS]|[yY])
               rm -rf "${dir1}" && printf '%s\n' "...removed ${dir1}"
            ;;
            *)
               printf '%s\n' "Directory ${dir1} is not removed"
            ;;
          esac
        fi
     else
     # a directory name contains wildcards or special characters, do not remove
       printf '%s\n' "A directory name ${dir1} contains wildcards [ ] \| . ? + *, or a blank character"
       printf '%s\n' "It cannot be removed in this script as a precaution. Please remove it from a command line if needed"
     fi
   fi
   exit 0
}
natalie-perlin commented 2 weeks ago

Updated a #1089, added a modified devbuild.sh with safety checks https://github.com/ufs-community/ufs-srweather-app/blob/1824cec2286c028108dddee916683b99df65302a/devclean.sh

natalie-perlin commented 2 weeks ago

@mkavulich

Sorry for being difficult, but I don't like this solution either. Why do we need to give the users a choice for what directories to delete? If users want to specify some custom directory to delete, they can just delete it! Get rid of these options entirely, simply hard-code the directories and files that need to be deleted. No need to bother with input arguments and checking for wildcards: all of this complexity just leads to more potential failure modes.

Adding a link to comments on the issue: https://github.com/ufs-community/ufs-srweather-app/pull/1089#discussion_r1658838137