puppylinux-woof-CE / woof-CE

woof - the Puppy builder
GNU General Public License v2.0
389 stars 278 forks source link

Coding style and PR etiquette. #1715

Open 01micko opened 4 years ago

01micko commented 4 years ago

coding style

There is plenty of messy shell code in woof. @wdlkmpx and myself (and others) over the years have attempted to clean it up somewhat but if someone commits code with a mix of tabs and spaces then we end up back at square :one:

basic guidelines

If a script has spaces ( 1 or 2 ) as the indent then follow that. If a script has tabs as the indent then follow that. Please don't mix them up.

code quality

EG:

if [ "`which program`" != "" ] ; then 
  exec program
fi

Let's analyze this

  1. we perform a conditional test
  2. we call an external program - which in a subshell
  3. we compare it to "" (essentially nothing)
  4. the result is evaluated and we proceed with the block or skip the block depending on result.

So what could go wrong?

User or builder installed GNU which giving vastly different output than busybox which

Let's try it another way

if type program >/dev/null 2>&1 ; then
  exec program
fi

So what happens here?

  1. call the bash/ash/dash internal type to determine if program is installed and suppress the output
  2. if type is successful the condition is satisfied and we proceed or skip as before

Can we see that this is faster? In a large script that has to test perhaps for several programs to determine which one to use then this will have a significant speed boost.

Imagine this on a pi Zero

browser=''
# choose first available browser
for B in firefox seamonkey chromium google-chrome opera vivaldi links lynx midori 
do
  # [ "`which $B`" != "" ] && browser=$B && break # too slow
  type $B >/dev/null 2>&1 && browser=$B && break
done

Basically, bash and ash are very powerful and can speed up your code if used diligently. Of course sometimes we can't get away from using external calls, especially sed, grep and awk. I myself am no expert in shell internals and I often refer to ash or bash man pages; there are many to be found online. Redirections can also be very powerful as well as the operators. Take time to familiarise yourselves more with bash and ash and you will be pleased with the results. Happy coding :smile:

pull requests

If you have a pull request that requires that new programs need to be installed in all PKGS_SPECS_TABLE s variables (DISTRO_PKS_SPECS-$DISTRO_BINARY_COMPAT-$DISTRO_COMPAT_VERSION file), then please make that one pr rather than a separate pr for each version.

Please make a better comment about the PR. You have to sell the idea to the maintainer to convince her/him to test it and then merge it. Doesn't matter if it's a 200 word or more essay! 5 or six words is not enough to convince me. Remember, these go into the git history so it is much easier to find certain commits if something goes wrong or the the commit needs enhancement. 20 to 30 words is usually enough, just explaining why it is needed and what it affects.

Also, if it is relevant, include in the commit message the relevant related commits (just the first seven chars of the sha commit number) or the issue number, especially if it closes the issue eg:

fix such and such bug, closes #9876

this fixes the bug where the computer jumped out the window and ran off with my motorbike. necessary because I like my motorbike

Don't let this discourage you from sending in PRs! Don't be discouraged if it gets rejected either, there will be a reason.

s243a commented 4 years ago

My preference is for two spaces rather than a tab the default spacing for most text editors is that "1 tab" equals "four spaces" and unless you have a giant screen (or high resolution and good eyes), it doesn't take much nesting to use you your screen width.

I do see though that geany has some automatic settings, that will make this easier for me, to be hopefully be consistent with others. Under the document menu, I can set the following: Indent Type -> Detect from document #This should address the above: Indent With -> "2" # This should make me happy.

Or if a document uses spaces rather than tabs are we particular about how many spaces to use?

s243a commented 4 years ago

Regarding "which" vs "type", I think THAT "which" is more readable because it is a command that most people use. That said I see the merit in writing portable code. My question is why would the GNU version of which behave differently than the busybox version. Shouldn't errors be printed to standard error and not captured by quotation marks.

The non obvious thing about which, is that it won't find a program in the path if it isn't executable (e.g. a broken symlink).

01micko commented 4 years ago

if you prefer to use which suppression of all output would be preferable; just replace type with which in my example. Would be faster than the test/comparison, as we just rely on exit code.

That said, quality of code is important and understanding of the shell you are using certainly can increase the speed and quality. Don't worry, some of my code isn't that great either but I have been making a concerted effort to improve it.

s243a commented 4 years ago

if you prefer to use which suppression of all output would be preferable; just replace type with which in my example. Would be faster than the test/comparison, as we just rely on exit code.

That said, quality of code is important and understanding of the shell you are using certainly can increase the speed and quality. Don't worry, some of my code isn't that great either but I have been making a concerted effort to improve it.

Good point, regarding the speed difference of comparison vs exit codes.

sc0ttj commented 4 years ago

Yeah I'm a terrible offender for this - calling sub shells, and checking output when return codes will suffice..

But I always use 2 spaces for indentation, as it works best on various setups .. (tabs looks huge on github/web).

...I will go through Pkg at some point and do a PR which is soley focused on reducing sub shells and using builtins as much as possible to speed it up.

An example of using Bash builtins only

There is a great project called fff (f***ing fast file manager) which is a great example of using Bash 3.2 builtins only. https://github.com/dylanaraps/fff

sc0ttj commented 4 years ago

Fixing indentation

I've never heard of it till just now, but there is apparently the expand command, which can be used to fix indentation. From https://stackoverflow.com/a/11094620/5479837 :

find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {}

Another example from the same page, this time using the sponge command from moreutils:

find ./ -iname '*.java' -type f -exec bash -c 'expand -t 4 "$0" | sponge "$0"' {} \;

Explanation:

./ is recursively searching from current directory -iname is a case insensitive match (for both .java and .JAVA likes) type -f finds only regular files (no directories, binaries or symlinks) -exec bash -c execute following commands in a subshell for each file name, {} expand -t 4 expands all TABs to 4 spaces sponge - soak up standard input (from expand) and write to a file (the same one)*.

Geany settings

Maybe we could update the default settings in Geany, so that all Pups have Geany setup to use two spaces by default, rather than to respect files existing indentation...?

Just a thought.. That is how I set it up on mine (for working on my own stuff).

Other editors

Sublime-Text 3

This is an excellent text editor, available in most repos. Can easily fix indentation using the editor, or various indentation plugins.

Micro

FYI, I use micro (https://github.com/zyedidia/micro/) a terminal based editor, with modern stuff like linters, formatters, multiple select, modern (VS code or sublime-like) key bindings, etc. I rarely use Geany these days.

Micro can be setup to use prettier and a bunch of other formatters that properly formats your code on save... Or you can manually set these things in ~/.config/micro/settings.json, on a per-filetype basis.

sc0ttj commented 4 years ago

If we do need sub-shells

Should we always be using $() instead of back ticks?

The major benefit is the ability to nest $() commands within commands, without losing your sanity trying to figure out if some form of escaping will work on the backticks.

$() is POSIX and supported by all modern Bourne shells, e.g. ksh, bash, ash, dash, zsh, busybox, you name it

Here's something I didn't know about the difference:

Also, a note about using $() and backticks in aliases. If you have alias foo=$(command) in your .bashrc then command will be executed when the alias command itself is run during .bashrc interpretation. With alias foo=command, command will be executed each time the alias is run. But if you escape the $ with the $() form (e.g. alias foo=\$(command)), it too will execute each time the alias is run, instead of during .bashrc interpretation. As far as I can tell by testing, anyway; I can't find anything in the bash docs which explain this behavior.

sc0ttj commented 4 years ago

Redirecting or suppressing output

What is preferred?

foo >/dev/null 2>&1

# or 

foo &>/dev/null

Because I personally like doing it this way, as I find it's the only syntax I ever remember:

 # supress all output
foo &>/dev/null

# suppress stdout
foo 1>/dev/null 

# suppress errors
foo 2>/dev/null

Any reasons not to use this simpler syntax? It works in Bash and Busybox Ash.

Preferred shells

Should we prefer ash to bash?

I would say yes.

Busybox ash has long had lots of "Bashisms" available, like ${foo/bar/baz}, etc.

I've used Bash arrays a lot inmdsh, and tbh, I'm not that keen...

Overall, I don't see much benefit to using Bash over Ash, except that the read builtin in Bash has more options (-i.. see ppa2pup script in Pkg.. IIRC).

Hashbangs

Should we define which shell we use explicitly at the top of our scripts?

#!/bin/bash

and

#!/bin/ash

...rather than

#!/bin/sh

?

jamesbond3142 commented 4 years ago

Suggestions, if I may.

  1. Should we prefer ash to bash?

See response to next question.

  1. Should we define which shell we use explicitly at the top of our scripts?

Use #!/bin/sh if you are absolutely sure that your script can run in any kind of bourne-compatible shell. This include bash, ash, dash, zsh, and a few others.

If your script uses bashim, then use #!/bin/bash explicitly.

In Fatdog we uses #!/bin/dash for most scripts (dash is statically compiled) to ensure that it is small and fast to launch. "dash" is the closest you can get to the original bourne shell, so if it runs in dash it would (almost) be sure to run in other shells too. However, it doesn't have a lot of features; we have to resort to calling "sed" or "awk" to do anything other than the simple things. This may defeat the purpose because calling an external program is always slower than executing shell builtins, so depending on what your script is doing, "bash" may be better than "dash" (and indeed, we have a few scripts which explicitly specify #!/bin/bash for this reason).

If you need something beyond what "dash" can do but does not need the entire "bash" support and still want to be reasonably fast, "ash" is a good choice. But beware. This is not the case in Puppy, but "busybox ash" can actually be compiled without support for bashism - so if you want to use "#!/bin/ash" that uses bashim, then make an explicit note about it so someone else who reads your script knows what to expect.

  1. Redirecting or suppressing output
foo >/dev/null 2>&1
foo 1>/dev/null
foo 2>/dev/null

is standard. The rest is bashism. Use anything you like, but see the note about hashbang above.

  1. Should we always be using $() instead of back ticks?

Major benefits of $() are: a. It's easier on the eyes and less likely to be mis-read b. It can be nested.

  1. Other editors.

No comment needed. Preferred editors are purely subjective taste.

  1. Geany settings.

No comment needed. You all just need to decide/agree.

  1. which/type

"type" is a shell built-in which is supported in almost all known shells (dash,ash,bash) and is definitely faster. If you need to find several programs the difference will stack up. In my earlier scripting I always used "which" but after I discovered "type" I abandoned it almost completely.

  1. Indentation.

I personally uses 4-space tab indentation. It makes code oh so much easier to read and follow. One-space indentation is the same as no indentation. I see many other (non-Puppy) codes that indents at two-spaces. I don't like it, but that's just me.

sc0ttj commented 4 years ago

Using Shellcheck to improve your shell scripts

# shellcheck

Usage: shellcheck [OPTIONS...] FILES...
  -e CODE1,CODE2..  --exclude=CODE1,CODE2..  exclude types of warnings
  -f FORMAT         --format=FORMAT          output format
  -C[WHEN]          --color[=WHEN]           Use color (auto, always, never)
  -s SHELLNAME      --shell=SHELLNAME        Specify dialect (sh,bash,dash,ksh)
  -x                --external-sources       Allow 'source' outside of FILES.
  -V                --version                Print version information

Examples:

- Check a shell script:

  shellcheck file.sh

- Override script's shebang:

  shellcheck --shell sh|bash|ksh file.sh

- Ignore certain errors:

  shellcheck --exclude SC1009 file.sh

- Ignore multiple errors:

  shellcheck --exclude SC1009,SC1073 file.sh

Example of shellcheck /usr/sbin/pkg:

In /usr/sbin/pkg line 27:
APPTITLE="$APPNAME $APPVER"
^-- SC2034: APPTITLE appears unused. Verify it or export it.

In /usr/sbin/pkg line 33:
SELF=$(basename $0)        # this script
                ^-- SC2086: Double quote to prevent globbing and word splitting.

In /usr/sbin/pkg line 59:
[ -z "$PKG_PPA2PUP_FN" ] && if [ ! -z "`which gawk`" ]; then PKG_PPA2PUP_FN=ppa2pup_gawk; else PKG_PPA2PUP_FN=ppa2pup; fi
                                       ^-- SC2006: Use $(..) instead of legacy `..`.

In /usr/sbin/pkg line 77:
  [ -f /tmp/pkg/error130 -a "$2" = '130' ] && exit "$2"
                         ^-- SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.

 ...

In /usr/sbin/pkg line 294:
  [ -f $TMPDIR/func_list ] && cat $TMPDIR/func_list | sort && exit 0
                                  ^-- SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.

...

In /usr/sbin/pkg line 368:
  pkg_version=`echo $PKGNAME | sed -e "s/^${PKGNAME_ONLY}_//g"`
               ^-- SC2001: See if you can use ${variable//search/replace} instead.

...

In /usr/sbin/pkg line 508:
  pkg_name_only="`echo "$(basename "${1}")" | sed -e 's/-[^-][^+][^a-zA-Z][0-9.]*/@/'`"
                  ^-- SC2001: See if you can use ${variable//search/replace} instead.
                       ^-- SC2005: Useless echo? Instead of 'echo $(cmd)', just use 'cmd'.

...

In /usr/sbin/pkg line 2045:
  [ "$1" ] && get_repo_info "$1" || . ${PKGRC}
           ^-- SC2015: Note that A && B || C is not if-then-else. C may run when A is true.

...

In /usr/sbin/pkg line 3199:
        cd -
        ^-- SC2164: Use cd ... || exit in case cd fails.
        ^-- SC2103: Use a ( subshell ) to avoid having to cd back.

...

In /usr/sbin/pkg line 3280:
      if [ "`find /tmp/* -iname $1*.t*`" != '' ];then
                                ^-- SC2061: Quote the parameter to -iname so the shell won't interpret it.

In /usr/sbin/pkg line 3351:
  build_number="-$(($build_number + 1))"
                    ^-- SC2004: $/${} is unnecessary on arithmetic variables.

In /usr/sbin/pkg line 3370:
  rm -rf "${CURDIR}/${PKGNAME}/" 2>/dev/null
         ^-- SC2115: Use "${var:?}" to ensure this never expands to / .

...

In /usr/sbin/pkg line 4793:
         [ -z $HOME/.packages/${PKGNAME}.files ];then
                   ^-- SC2157: Argument to -z is always false due to literal strings.

You can go through the script, excluding more and more SCnnnnn stuff, and tackle them bit by bit... Lots of work to do in Pkg :/

Checking for specific "code smells" or problems

Each individual ShellCheck issue has its own wiki page, like these: SC1000, SC1001, ..., SC2236.

GitHub Wiki does not support enumerating them on a page, so to find the one you're interested in, you can do any of the following:

  • Use the "> Pages" menu on the right (desktop) or below (mobile) and search for the error code, e.g. SC2086.
  • Visit an arbitrary issue and edit the URL in your browser.
  • Trigger the issue on shellcheck.net and click/tap the issue id.

Example pages:

Customising what shellcheck looks at

See https://github.com/koalaman/shellcheck/wiki/Directive

Ignoring one specific line in a file:

Use a directive to disable a certain line:

hexToAscii() {
  # shellcheck disable=SC2059
  printf "\x$1"
}
01micko commented 4 years ago

Some nice comments :smiley:

As for indentation, my main point was that if you edit a script in a commit keep the indentation the same as the original.

I personally dislike 1 space indentation for the same reason James stated above. In init in Puppy's initrd.gz 1 space is used and in large nested if statements it can be a challenge finding the end of one, especially on a small screen or in a console editor. (which reminds me I have to reinstate busybox vi at some point).

After this thread has been going for a while I'll close the issue and glean the best ideas for inclusion for a readme or even make a wiki entry.

Keep 'em coming!

sc0ttj commented 4 years ago

Maybe this readme could include stuff about using standardised or consistent directories and locations for scripts, config files, etc.

Managing $HOME

It might be nice to have in a readme (developer handbook?) info about managing $HOME properly:

System wide settings vs users settings

And also suggest putting config files in /etc/<program-name>/, until after first launch of program-name, so they get copied to $HOME, no matter where it is, or which user runs the program.

Managing shared assets

Maybe we can also encourage users to put icons, shared scripts, config files, in /usr/share/<program-name>/, rather than /usr/lib/.. Any better places?

Using /tmp and /var dirs properly

Other system directories

Maybe some info like:

Your programs should not normally write or change files in these directories:

dimkr commented 3 years ago

Commit messages in this project are horrible! Many are Update x or Create x (maybe because people upload and edit files straight through GitHub instead of using an IDE or git commit). They're not informative and sometimes a single commit changes multiple, unrelated things. And in other cases, identical or related changes to multiple files (like mass find/replace on .desktop files) are one commit per file, and that's messy too.

(And IMHO coding style, etc' should go to a CONTRIBUTING.md file or something that's part of the repo.)

01micko commented 3 years ago

(And IMHO coding style, etc' should go to a CONTRIBUTING.md file or something that's part of the repo.)

There's an idea.