
sh_libpath

Making shell scripts more robust and readable with libraries

:warning: WARNING: This code is very very pre-alpha. I'm currently just dumping in a bunch of functions from my code attic. You're welcome to test, give feedback and contribute, but please don't use this for anything close to production!

TL;DR?

#!/usr/bin/env bash

# Load our init script
. /path/to/init.sh || exit 1

# Import the function libraries that it provides
# Without a given extension, ".sh" is assumed
include sys/os
# Give a specific extension to load a shell specific lib
include utils/git.zsh
# If you have a full path to a library for whatever reason
include /opt/mycompany/some/full/path/to/a/specific/library.sh
# Or just load all libraries under the text libpath
include text

# Check that we have everything we need including shell version, commands and vars
# This means self-documenting code and fail-fast are handled in one line
requires BASH51 git jq curl osstr=Linux

# Lazy-load a config file in case that's something you need
wants /opt/secrets/squirrels.conf

OK... Do you have more information?

This project proposes adding a library ecosystem, primarily for use in shell scripts.

init.sh bootstraps a few environment vars, most importantly SH_LIBPATH: a PATH-style list of directories that the library loader searches.

The proposed library structure caters for both monolithic libraries, e.g.

/usr/local/lib/sh/monolithic.sh

As well as hierarchical ones, e.g.

/usr/local/lib/sh/text/string_function.sh
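
Both layouts are searched via SH_LIBPATH, which works much like PATH does for commands. As an illustration only (the real defaults are whatever init.sh sets up):

# A hypothetical value, purely for illustration
SH_LIBPATH=/usr/local/lib/sh:/opt/sh_libpath/lib/sh
export SH_LIBPATH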

It also adds the following functions:

include

Similar to its Python cousin import and its Perl cousin use, this function is intended for loading libraries.
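
Under the hood, the mechanics amount to "walk SH_LIBPATH and source the first match". Here is a minimal sketch of that search logic, assuming SH_LIBPATH is colon-separated. This is illustrative only - the project's real include() does more, such as loading a whole directory with include text:

include() {
    _target="$1"
    # Assume a ".sh" extension when none is given
    case "${_target}" in
        (*.*) : ;;
        (*)   _target="${_target}.sh" ;;
    esac
    # A full path is sourced directly
    case "${_target}" in
        (/*) . "${_target}"; return "$?" ;;
    esac
    # Otherwise, search each element of SH_LIBPATH in order
    _oldifs="${IFS}"; IFS=':'
    for _dir in ${SH_LIBPATH}; do
        if [ -r "${_dir}/${_target}" ]; then
            IFS="${_oldifs}"
            . "${_dir}/${_target}"
            return "$?"
        fi
    done
    IFS="${_oldifs}"
    printf -- 'include: %s not found in SH_LIBPATH\n' "${_target}" >&2
    return 1
}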

requires

This function serves multiple purposes. A lot of shell scripts fail part-way through a run - messily, and sometimes destructively - because a command, variable or file that they depend on turns out to be missing or wrong.

requires() fixes all of that and more.

First, it works through multiple items, so you only need to call it once if you choose to. It would typically be used to check for the existence of commands in PATH, like this:

requires git jq sed awk

But it can also check that prerequisite variables are as desired (or "as required", you might say), for example:

requires EDITOR=/usr/bin/vim

It can also check for a particular version of bash, for example to require bash 4.1 or newer:

requires BASH41

It also handles checking full paths for e.g. executable scripts, config files and SH_LIBPATH libraries.

By dealing with this at the very start of the script, we ensure that we fail early. It abstracts away messy command -v/which/type tests, and when someone reads a line like:

requires BASH42 git jq storcli /etc/multipath.conf

It's clear what the script needs in order to run, i.e. it's self-documenting code.
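
Under the hood, a dispatcher like that can be sketched in bash fairly briefly. This is a simplification and not the project's actual implementation - special keys like osstr= would need their own handling:

requires() {
    local _req _want _have _var _val
    for _req in "$@"; do
        case "${_req}" in
            (BASH[0-9]*)
                # e.g. BASH42 means "bash 4.2 or newer"
                _want="${_req#BASH}"
                _have="${BASH_VERSINFO[0]}${BASH_VERSINFO[1]}"
                (( ${_have:-0} >= _want )) || { printf -- 'requires: bash %s or newer\n' "${_want}" >&2; exit 1; }
            ;;
            (*=*)
                # e.g. EDITOR=/usr/bin/vim means "this var must hold this value"
                _var="${_req%%=*}"; _val="${_req#*=}"
                [[ "${!_var}" = "${_val}" ]] || { printf -- 'requires: %s\n' "${_req}" >&2; exit 1; }
            ;;
            (/*)
                # A full path must exist e.g. /etc/multipath.conf
                [[ -e "${_req}" ]] || { printf -- 'requires: %s not found\n' "${_req}" >&2; exit 1; }
            ;;
            (*)
                # Anything else must be a command somewhere in PATH
                command -v "${_req}" >/dev/null 2>&1 || { printf -- 'requires: %s not in PATH\n' "${_req}" >&2; exit 1; }
            ;;
        esac
    done
}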

wants

This function currently deals only with files. You point it at a file; if that file is found, it gets sourced. Otherwise, it's not fatal. It's a lazy-loader, in other words.
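
In its simplest form, the behaviour described above amounts to something like this (my sketch, not the project's actual code):

wants() {
    local _file
    for _file in "$@"; do
        # Source the file if it's readable; silently move on if not
        [[ -r "${_file}" ]] && . "${_file}"
    done
    return 0
}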

Some scripts are built with a lot of logic for figuring out weird and tricky things. You can save a lot of repeat processing by wrapping that logic up behind a var, and putting that var into a config file. At the start of your script, you load the config file (or you don't) and it short-circuits various pieces of logic, leading to a more efficient run.

For example, I once proposed this code for the checkmk project, to replace some hairy, hackish and non-portable code they already had:

# GNU 'coreutils' 5.3.0 broke how 'stat' handled line endings in its output.
# This was corrected somewhere around 5.92 - 5.94 STABLE.
# If we detect a version of GNU stat 5.x, we investigate and set MK_STAT_BUG
# See: https://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00157.html
if var_is_blank "${MK_STAT_BUG}"; then
    # We only care about versions in the range [ 5.3.0 .. 5.94 ] (yes different numbering schemes)
    if stat --version 2>&1 | grep "GNU.*5.[3-9]" >/dev/null 2>&1; then
        # Convert the version information from semantic versioning to an integer
        stat_version=$(semver_to_int "$(stat --version 2>&1 | head -n 1)")
        if [ "${stat_version}" -ge 50300 ] && [ "${stat_version}" -lt 59400 ]; then
            MK_STAT_BUG="true"
        else
            MK_STAT_BUG="false"
        fi
        readonly MK_STAT_BUG
        export MK_STAT_BUG
    fi
fi

# If MK_STAT_BUG is true, we correct 'stat's behaviour globally
if [ "${MK_STAT_BUG}" = "true" ]; then
    stat() {
        printf -- '%s\n' "$(command stat "${@}")"
    }
fi

To prevent that code from being run through at every single invocation of the checkmk agent, one could simply define:

MK_STAT_BUG=false

within a config file. Load that config file early in the agent script, and the majority of that code gets skipped. Otherwise, if it's not short-circuited via a config file, it will just run through normally - no problem, just a bit of wasted processing.
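
Wired into the top of the agent script with wants(), that could look something like this (the config path here is entirely hypothetical):

# Hypothetical config location; if it's present and defines MK_STAT_BUG,
# the detection block above is skipped on every subsequent run
wants /etc/check_mk/agent_vars.conf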

So that's what wants() is about. Originally, at least.

Why

Well, why not?

I have spent large parts of my career fixing other people's shell scripts, and I keep seeing the same mistakes. I keep seeing the same anti-patterns. I keep seeing the same gnashing of teeth: "more than 2 lines of shell and I use Python", etc. But quite often shell, and bash specifically, is the appropriate language for the task at hand. It's either the first, the only, or the go-to language for many *nix sysadmins, and a lot of devs approach it with a degree of ignorance.

The thing about people who clamour for any other language is that many of them are blissfully unaware that they're spoiled: all the nitty-gritty code that they actually rely on is abstracted away, hidden in libraries/modules that they load up with import or use or some similar loader. With shell, there's no solid ecosystem of libraries to depend on - no pip or CPAN for bash - so most scripts are like inventing the universe every time. This usually leads to the same sub-optimal code being copied and pasted off StackOverflow, and the same sub-optimal practices being spread.

And if you find yourself reading a larger script, it'll usually have a whole bunch of functions - assuming it has some basic semblance of structure. These functions usually take up 80-90% of the code, and this collation of all the code in one file, while potentially enabling immense portability, makes the task of reading the script psychologically more daunting than it needs to be.

I think that until something like the Oil Shell or NGS gains traction, we can abstract a bunch of the common stuff away to libraries and solidify the code. As a result, our scripts will in turn become more robust, safer, easier to read and easier to debug.

Now, there are existing shell library projects out there, but they usually use a license that is too restrictive, or they are so deeply self-referencing that it's nearly impossible to make sense of their code. Some will insist on ridiculous naming conventions: If you're more familiar with another language, you're going to want to call split, not ____awesome__shell_library____+5000____class::::text__split__. Hyperbolic example, sure, but not far removed from what some of these library projects are inflicting on their users. Personally, if I want my code to be that obnoxious to write, I'll switch to PowerShell.

These projects also tend to be Linux-only and bash 4.0 or newer only. Maybe there'll be the occasional attempt at macOS compatibility, but that's it.

Why use libraries and not just a package of scripts

That's a good question. The main reason for using functions over standalone scripts is that functions run inside the calling shell: they can set variables and share state directly, and they don't pay the cost of spawning a new process (and a PATH lookup) for every call.

Libraries, insofar as they're presented in this project, are simply collections of functions.
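
To make that concrete: a library file in this scheme is nothing more than a sourceable file of function definitions. A hypothetical text/trim.sh might contain nothing but this (my example, not an actual file from the project):

# trim: strip leading and trailing whitespace from the given string
trim() {
    local _str="$*"
    _str="${_str#"${_str%%[![:space:]]*}"}"   # remove leading whitespace
    _str="${_str%"${_str##*[![:space:]]}"}"   # remove trailing whitespace
    printf -- '%s\n' "${_str}"
}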

Story time to make this totally relatable

At a former job, I picked up a pre-existing shell script that was somewhat inefficiently written. From memory it was a very -very- long single-line script, several thousand characters long: basically one long pipeline along the lines of

grep -v something | grep -v something-else | grep -v "another thing" | (this continues on and on and on...)

I put all of the strings-to-be-filtered into an array, re-worked the logic a little and threw in a function named array::join(). So the idea was that the script had something like this towards the top:

strings_to_filter=(
  "something"
  "something-else"
  "another thing"
)

That way, if there were any future changes to that list, adjusting it was straightforward and obvious. Then array::join would smoosh the lot together into a single vertical-bar-delimited string, which would be used by a single invocation of grep -Ev. With a result that should shock absolutely nobody: it performed significantly faster.
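
The join itself needs nothing beyond shell builtins. Here's a minimal sketch of the technique (the function I actually wrote may have differed in detail):

array::join() {
    # "$*" expands with the first character of IFS between elements,
    # so a single-character delimiter like '|' works directly
    local IFS="$1"
    shift
    printf -- '%s\n' "$*"
}

filter_string=$(array::join '|' "${strings_to_filter[@]}")
grep -Ev "${filter_string}" "${some_file}"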

This function used some shell-native techniques to join the array elements together, and I knew that the code might be considered a bit obtuse, so I added comments briefly explaining what these various techniques did. So I had a cleaner, more obvious, internally documented and performance-tested improvement. Well, that PR was not received well at all - this function was apparently proof positive that shell was indeed the language of the Nazis, that I had brought shame upon the company, and that I should basically just eat my own shoes for having the conceit to propose any minimisation of mediocrity.

Now, on the other hand, had my PR simply been

+ include arrays.sh

... (some time later) ...

+ filter_string=$(array::join '|' "${strings_to_filter[@]}")
+ grep -Ev "${filter_string}" "${some_file}"

There would have been no controversy at all, and that would have been the end of it.

If you're a Pythonista or a Perler, and you're reading this, ask yourself honestly: when was the last time you sat down and looked at the code inside that library that you like? Odds are: "uhh... approximately... never?" is the answer. QED.

Not all hammers are created equal

Whenever someone mentions doing anything remotely constructive about the unix shell, someone invariably trots out Maslow's Hammer.

If the only tool you have is a hammer, you tend to see every problem as a nail.

Abraham Maslow

And some might see this kind of effort as falling into that trap. I don't. Sometimes a nail really is a nail, and suggesting that you hit it with a spanner is idiotic. Nor should we have to tolerate a $5 Walmart hammer when we could have something more like a 20oz Estwing. And, sometimes, the people who are on team spanners really shouldn't talk about hammers and nails because it's not in their wheelhouse. I mean... you can hammer in a nail with a spanner, but it's not pretty...

See also: Master Foo and the Ten Thousand Lines

Incomplete list of other bash libraries and frameworks

In alphabetical order only for readability:

List of projects that may be inspirational

Want to take it to the next level?