shenwei356 / rush

A cross-platform command-line tool for executing jobs in parallel
https://github.com/shenwei356/rush
MIT License
866 stars 63 forks

Call new feature: Native Support for perl-like commands #6

Closed bioinformatist closed 6 years ago

bioinformatist commented 6 years ago

Hi Zhua Brother :smile: ,

As you know, one-liners in Perl, awk, and other languages that fit a whole program on a single command line are convenient for bioinformatics work. It takes only a little time to write one (from about 50 seconds to several minutes), and then we call rush with the command embedded for batch processing. But so far we need to escape the $ character again and again (sometimes more than 10 times in a single one-liner), for instance:

grep ">" ../databases/2clusters.fa | cut -c2- | rush "perl -lanE'(\$x0) = map{/X0:i:(\d+)/}\$_; (\$x1) = map{/X1:i:(\d+)/}\$_; say if \$F[2] eq qq[{}] && \$x0 == 1 && \$x1 == 0' mapped.sam > {}.sam"

Every $, whether it belongs to the special variable $_ or to an array element such as $F[2], needs to be escaped. Only then does --dry-run preview the right statements. Also see this one:

all=`wc -l < ../mapped.sam`; ls | grep sam | rush "echo \"{.}\t\$(printf '%0.2f' \$(echo \"scale=2; \`wc -l < {}\`*100/$all\" | bc))%\t(\`wc -l < {}\`/$all)\""
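The mix of escaped back-quotes and an unescaped $all in the command above can be illustrated with a small sketch. It assumes the same hypothetical `preview` stand-in (it only prints its argument) and uses `sh -c` in place of a tool that runs the string later. The values 50 and 200 are made up for illustration.

```shell
# preview: hypothetical stand-in for a dry run; prints the string it receives
preview() { printf '%s\n' "$1"; }

all=200   # set in the calling shell and meant to be expanded immediately

# Unescaped back-quotes: the calling shell runs printf before preview starts
preview "ratio: `printf 50`/$all"      # prints: ratio: 50/200

# Escaped back-quotes survive as text; $all is still expanded on purpose
preview "ratio: \`printf 50\`/$all"    # prints: ratio: `printf 50`/200

# The surviving back-quotes run only when the string is executed later
deferred="echo \`printf 50\`/$all"
sh -c "$deferred"                      # prints: 50/200
```

So a preview cannot prevent the first form from running: the substitution has already happened by the time the previewing tool sees its argument.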

Sometimes we need the output of a Linux command like wc. But the back-quotes sit inside a double-quoted string, so the calling shell runs them right away, even when rush is given the --dry-run switch. So we have to escape the back-quotes manually too, which makes the combined command hard to read.

Also, rush currently has no support for the Eskimo symbol (}{); indeed, most of the failures are caused by braces. The }{ symbol is important because it lets us step out of the implicit while loop that the -n switch wraps around a Perl one-liner. The closing brace } ends the while loop block, and the opening brace { starts a new bare block outside the loop. In this new block we can summarize a hash, or do other operations that should run only once. See this example:

echo -e 'Sample with Virus\tRead number'; cat fq_ref | rush -k "perl -F'\t' -lanE'\$fuck{\$F[0]}++; END{say qq{\$ARGV\t@{[~~keys %fuck]}}}' {1}_{2}.sam"

Outside rush, the Perl one-liner could be:

perl -F'\t' -lanE'$fuck{$F[0]}++; }{ say qq{$ARGV\t@{[~~keys %fuck]}}' {1}_{2}.sam

There the Eskimo symbol works well, and we don't need the heavy END{} any more. Braces can also delimit the quote-like operators (plain "" in one-liners may cause panics), as in qq{string in double quotes}, q{string in single quotes}, qr{regex} and so on. Although Perl's syntax is flexible enough to replace braces with other delimiters such as **, [], etc., braces are still the ones most people use.

So could you bring us a new feature that auto-escapes these symbols ($, the back-quote, and {}; placing a real back-quote in Markdown here would break the formatting) behind a new switch like --perl (of course, the name is up to you :100: )?

shenwei356 commented 6 years ago
bioinformatist commented 6 years ago

Yep, it is well supported in the latest version, thanks.