zjshi / gt-pro

streamline output naming, more detailed -h usage, fix mac os x build #28

Closed boris-dimitrov closed 4 years ago

boris-dimitrov commented 4 years ago
GTPro version 2.0
For copyright and licensing information, please see
https://github.com/zjshi/gt-pro2.0/blob/master/LICENSE

ARGUMENTS: 
  -d <sckmerdb_path; string>
  -t <n_threads; int; default: CPU_count>
  -o <out_prefix; string; default: cur_dir/%{in}__gtpro__%{db}>
  -l <number of index address bits; int 28..32; default: depends on machine RAM>
  -m <bloom filter address bits; int 30..36; default: depends on machine RAM>
  -h <display this usage info>
  -f <force overwrite of pre-existing outputs>
  -C <in_prefix; string; default: none>
  [input0, input1, ...]

WHERE

  input0, input1, ... are files in FASTQ format, optionally compressed,
  and optionally in the dir specified by -C, which may be an s3 bucket

  when no inputs are specified, gt_pro consumes FASTQ input from stdin
  until stdin reaches EOF, then emits all output to stdout at once

  in the optional -o output prefix, %{db} expands to the DB name,
  %{in} expands to the corresponding input base name, and %{n} expands
  to the corresponding input number 0, 1, 2, ..., if input != stdin

  -f causes any pre-existing output files to be overwritten

USAGE EXAMPLES

  The following two methods of running gt_pro produce equivalent results.

  Method 1:
    gt_pro -d /path/to/db1234 -C /path/to/input test576/r1.fastq.lz4 test576/r2.fq.bz2

  Method 2:
    lz4 -dc /path/to/input/test576/r1.fastq.lz4 | gt_pro -d /path/to/db1234 | lz4 -c > test576_r1__gtpro__db1234.tsv.lz4
    lbzip2 -dc /path/to/input/test576/r2.fq.bz2 | gt_pro -d /path/to/db1234 | lbzip2 -c > test576_r2__gtpro__db1234.tsv.bz2

  The primary difference is in performance and error handling. Method 1 creates an
  .err file for any input that fails, and makes better use of all available CPU cores.
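
When Method 2 is applied to many inputs, the per-file pipeline can be generated rather than typed by hand. The following is a minimal sketch, not part of gt_pro: `method2_pipeline` and the `CODECS` table are hypothetical helpers, the codec commands are copied from the Method 2 example, and the output file name is supplied by the caller rather than derived.

```python
import shlex
from pathlib import Path

# Decompress/compress command pairs, taken from the Method 2 example.
# Assumption: lz4 and lbzip2 are installed and on PATH.
CODECS = {
    ".lz4": ("lz4 -dc", "lz4 -c"),
    ".bz2": ("lbzip2 -dc", "lbzip2 -c"),
}


def method2_pipeline(input_path: str, db_path: str, out_path: str) -> str:
    """Return a shell pipeline that streams one compressed input through gt_pro."""
    suffix = Path(input_path).suffix  # last extension only, e.g. ".lz4"
    if suffix not in CODECS:
        raise ValueError(f"no codec known for {suffix!r}")
    decompress, compress = CODECS[suffix]
    # Quote every path so unusual characters cannot break the command line.
    return (
        f"{decompress} {shlex.quote(input_path)}"
        f" | gt_pro -d {shlex.quote(db_path)}"
        f" | {compress} > {shlex.quote(out_path)}"
    )


if __name__ == "__main__":
    print(method2_pipeline(
        "/path/to/input/test576/r1.fastq.lz4",
        "/path/to/db1234",
        "test576_r1__gtpro__db1234.tsv.lz4",
    ))
```

The function only builds the command string; running it (for example through a shell) is left to the caller, so a failed input still has to be detected separately, unlike in Method 1.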

  To obtain simple sequential output names like out.0.tsv, out.1.tsv, ...
  with forced overwriting of existing outputs, use arguments -f -o out.%{n}
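
The placeholder expansion described for -o can be illustrated with a short Python sketch. This is an illustration of the documented behavior, not gt_pro's actual implementation: `expand_prefix` is a hypothetical helper, and the input base name and DB name are passed in directly because how gt_pro derives them is not shown here.

```python
import re


def expand_prefix(template: str, in_name: str, db_name: str, n: int) -> str:
    """Expand %{in}, %{db} and %{n} in an output prefix template."""
    values = {"in": in_name, "db": db_name, "n": str(n)}
    # Substitute each %{key}; placeholders with unknown keys are left as-is.
    return re.sub(
        r"%\{(\w+)\}",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )


if __name__ == "__main__":
    # Sequential prefixes, as with: -o out.%{n}
    for n, name in enumerate(["r1", "r2"]):
        print(expand_prefix("out.%{n}", name, "db1234", n))
    # Default-style prefix: %{in}__gtpro__%{db}
    print(expand_prefix("%{in}__gtpro__%{db}", "r1", "db1234", 0))
```

The sketch yields only the prefix; a suffix such as `.tsv`, as in the `out.0.tsv` example, would be appended afterwards.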