koalaman / shellcheck

ShellCheck, a static analysis tool for shell scripts
https://www.shellcheck.net
GNU General Public License v3.0
36.51k stars 1.78k forks source link

Useless grep | awk #1113

Open tobbez opened 6 years ago

tobbez commented 6 years ago

For new checks and feature suggestions

Here's a snippet or screenshot that shows the problem:

#!/bin/sh
ethtool eth0 | grep 'Speed:' | awk '{print $2}'

Here's what shellcheck currently says:

Nothing.

Here's what I wanted or expected to see:

A suggestion similar to SC2126 (Consider using grep -c instead of grep|wc -l) or SC2002 (Useless cat. Consider ...). Something like "Consider performing matching in awk instead of grep | awk".

The pattern seems fairly common. In most cases it's cleaner to skip the grep invocation and do the matching in awk instead. For the example above, that would be:

#!/bin/sh
ethtool eth0 | awk '/Speed:/ {print $2}'
nihilus commented 6 years ago

This is a bad idea since I dont think busybox/POSIX awk handles extended regexps. Alas I might not 100% sure.

jonhiggs commented 6 years ago

I just checked inside a busybox (v1.28.0) docker container and awk does seem to support extended regex.

/ # echo "hh" | grep -c 'h{2}'
0
/ # echo "hh" | grep -E -c 'h{2}'
1
/ # echo "hh" | awk '/h{2}/ { print $0 }' | wc -l
1
/ #
arth1 commented 6 years ago

I like seeing warnings about unnecessary pipes, but not assumptions about what the solution is, because that can vary.

In this case, I wouldn't recommend using just awk, I would recommend using just sed. ethtool eth0 | sed -n 's/.*Speed: //p'

And in the last example right above this post which pipes to wc -l, there's no reason to. Change the {print $0} to {c++}END{print c}, and there's no need for the pipe.

tobbez commented 6 years ago

The proposed check doesn't really make an assumption about what the solution is. The solution already exists, and the suggestion given by the check is only simplifying that existing solution.

Your recommendation, on the other hand, clearly falls outside of that (in addition to being highly subjective).

The "example" from the post above yours is not an example of a case that would be covered by this check - it was only intended to show that busybox awk supports extended regular expressions.

On the topic of regular expressions (in awk), POSIX has the following to say:

The awk utility shall make use of the extended regular expression notation (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 9.4, Extended Regular Expressions) except that it shall allow the use of C-language conventions for escaping special characters within the EREs, as specified in the table in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ( '\', '\a', '\b', '\f', '\n', '\r', '\t' , '\v' ) and the following table; these escape sequences shall be recognized both inside and outside bracket expressions. Note that records need not be separated by s and string constants can contain s, so even the "\n" sequence is valid in awk EREs. Using a slash character within an ERE requires the escaping shown in the following table.

In conclusion, and to be clear: the check would only cover pipes where grep is immediately followed by awk.

Dmole commented 6 years ago

what are the alternatives to the awk example? examples of grep in sed/perl/etc. and do they scale?

The first of the following is better on big data;

echo -e "a\nb\nc" | grep b | perl -pe 's/b/d/g'

,

echo -e "a\nb\nc" | perl -p0e 's/[\w\W]*(b)[\w\W]*/$1/g;s/b/d/g'
nihilus commented 6 years ago

@Dmole as any sane person/embedded systems developer knows; PERL should be avoided at all costs.

Dmole commented 6 years ago

A) my point is to consider which tools it would appropriate to integrate grep into.

B) that sort of statement should be backed up with a rational. (like it's the most obfuscate able language ever)

nihilus commented 6 years ago

@Dmole B) the VM / binary everything but lightweight in size or performance on embedded systems. My personal point is that Larry Wall should be followed out in the backyard...

matthewpersico commented 6 years ago

Well I personally love Perl, and, I might add, make a very decent living off of it. I never understood the reason for such hatred. My email is matthew.persico@gmail.com if you care to continue he conversation; let’s not distract this thread.