opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data in HPC or Hadoop clusters.
Apache License 2.0

Improve 'variant convert' command line #44

Closed: jtarraga closed this issue 7 years ago.

jtarraga commented 8 years ago

Currently, the 'variant convert' command line has separate '--to-avro', '--to-parquet', '--to-json', ... parameters; these can be grouped into a single parameter, --to {avro | parquet | json}. In addition, the command line should offer some basic filters (e.g., region filter, quality filter, ...), and it lacks some important Parquet parameters that should be taken into account (e.g., page size, row group size). The proposed command line is:

./hpg-bigdata.sh variant convert -i .... -o .... --to {avro | parquet | json} [--from {vcf|avro}] --compression {deflate | snappy | ...} --data-model {opencb|ga4gh} [--page-size ...] [--row-group-size ...] [--include-formats ....] [--region ....] [--region-file ....] [--qual ....] [--filter ....] [--only-with-id]
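Grouping the output flags into one '--to' parameter amounts to parsing a single value into a fixed set of choices. The following is a minimal Java sketch of that idea, not the project's actual code: the class ConvertFormats, the enum OutputFormat and its parse method are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

public class ConvertFormats {

    // Hypothetical enum replacing the separate --to-avro, --to-parquet and --to-json flags.
    enum OutputFormat {
        AVRO, PARQUET, JSON;

        // Parses the value passed to --to, ignoring case and surrounding whitespace.
        static OutputFormat parse(String value) {
            try {
                return valueOf(value.trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                // Report the accepted values instead of Java's default enum message.
                throw new IllegalArgumentException(
                        "Invalid --to value '" + value + "', expected one of "
                        + Arrays.toString(values()).toLowerCase(Locale.ROOT));
            }
        }
    }

    public static void main(String[] args) {
        OutputFormat format = OutputFormat.parse(args.length > 0 ? args[0] : "parquet");
        System.out.println("Converting to " + format);
    }
}
```

With a single enum-valued option, supporting another output format means adding one constant rather than another boolean flag, and mutually exclusive combinations such as '--to-avro --to-json' can no longer be expressed.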
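The proposed '--region', '--qual' and '--only-with-id' options can be combined into one predicate applied to each variant before it is written. The Java sketch below (Java 16+ for records) illustrates one way to do this under stated assumptions: the region syntax "chr:start-end" (1-based, inclusive), '--qual' as a minimum quality threshold, and the simplified Variant record are all hypothetical and not taken from hpg-bigdata. Treating "." as a missing ID follows the VCF convention.

```java
import java.util.List;
import java.util.function.Predicate;

public class VariantFilters {

    // Hypothetical, simplified variant view; a real converter would read Avro/GA4GH records.
    record Variant(String chromosome, long position, double quality, String id) {}

    // Hypothetical region parsed from an assumed "chr:start-end" string (1-based, inclusive).
    record Region(String chromosome, long start, long end) {
        static Region parse(String text) {
            int colon = text.lastIndexOf(':');
            int dash = colon < 0 ? -1 : text.indexOf('-', colon + 1);
            if (colon < 0 || dash < 0) {
                throw new IllegalArgumentException("Expected chr:start-end, got " + text);
            }
            return new Region(text.substring(0, colon),
                    Long.parseLong(text.substring(colon + 1, dash)),
                    Long.parseLong(text.substring(dash + 1)));
        }

        boolean contains(Variant v) {
            return chromosome.equals(v.chromosome())
                    && v.position() >= start && v.position() <= end;
        }
    }

    // Builds one predicate from the optional filters; an absent filter accepts every variant.
    static Predicate<Variant> buildFilter(List<Region> regions, Double minQual, boolean onlyWithId) {
        Predicate<Variant> filter = v -> true;
        if (!regions.isEmpty()) {
            // Keep a variant if it falls inside any of the requested regions.
            filter = filter.and(v -> regions.stream().anyMatch(r -> r.contains(v)));
        }
        if (minQual != null) {
            filter = filter.and(v -> v.quality() >= minQual);
        }
        if (onlyWithId) {
            // "." marks a missing ID in VCF.
            filter = filter.and(v -> v.id() != null && !v.id().isEmpty() && !v.id().equals("."));
        }
        return filter;
    }

    public static void main(String[] args) {
        List<Variant> variants = List.of(
                new Variant("1", 150, 40.0, "rs1"),
                new Variant("1", 500, 10.0, "."),
                new Variant("2", 150, 99.0, "rs2"));
        Predicate<Variant> filter = buildFilter(List.of(Region.parse("1:100-200")), 30.0, true);
        variants.stream().filter(filter).forEach(System.out::println);
    }
}
```

Composing the filters as predicates keeps each command-line option independent, so a '--region-file' could simply contribute more Region entries to the same list.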