opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data on HPC or Hadoop clusters
Apache License 2.0

Parallel implementation for converting (VCF to Avro/Parquet) and annotating variants #131

jtarraga closed 7 years ago

jtarraga commented 7 years ago

This framework should provide a parallel implementation of variant conversion (VCF to Avro/Parquet) and variant annotation in order to speed up both processes. To parallelize them, we will use the Parallel Task Runner object implemented in the OpenCB commons-lib.
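
To make the idea concrete, here is a minimal sketch of batch-level parallelism using only `java.util.concurrent`. It is not the Parallel Task Runner API from commons-lib; the class `ParallelBatchSketch`, the method `runInBatches`, and the toy `convertBatch` step (which stands in for the real VCF to Avro/Parquet conversion) are hypothetical names chosen for illustration. Results are collected in submission order, so the output order matches the input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelBatchSketch {

    // Splits the input into batches, runs the task on each batch in a fixed
    // thread pool, and gathers the results in the order the batches were submitted.
    static <I, O> List<O> runInBatches(List<I> input, int batchSize, int numThreads,
                                       Function<List<I>, List<O>> task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<O>>> futures = new ArrayList<>();
            for (int start = 0; start < input.size(); start += batchSize) {
                int end = Math.min(start + batchSize, input.size());
                List<I> batch = input.subList(start, end);
                Callable<List<O>> job = () -> task.apply(batch);
                futures.add(pool.submit(job));
            }
            List<O> output = new ArrayList<>();
            for (Future<List<O>> future : futures) {
                output.addAll(future.get()); // blocks until that batch is done
            }
            return output;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for the real conversion: turns a tab-separated
    // VCF-like line (CHROM, POS, ID, REF, ALT) into "CHROM:POS:REF>ALT".
    static List<String> convertBatch(List<String> vcfLines) {
        List<String> converted = new ArrayList<>();
        for (String line : vcfLines) {
            String[] fields = line.split("\t");
            converted.add(fields[0] + ":" + fields[1] + ":" + fields[3] + ">" + fields[4]);
        }
        return converted;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList(
                "1\t100\t.\tA\tG",
                "1\t200\t.\tC\tT",
                "2\t300\t.\tG\tA");
        // Two records per batch, two worker threads.
        System.out.println(runInBatches(lines, 2, 2, ParallelBatchSketch::convertBatch));
    }
}
```

The same shape would apply to annotation: only the per-batch task changes, while reading and writing stay sequential around it.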

The variant command line should accept a new parameter (-t or --num-threads) that sets the number of threads to use.
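
As a rough illustration of how such an option could be read, the sketch below parses `-t` / `--num-threads` by hand in plain Java. It does not reflect how the hpg-bigdata command line is actually wired; the class `NumThreadsOptionSketch`, the method `parseNumThreads`, and the fallback default of one thread are all assumptions.

```java
public class NumThreadsOptionSketch {

    // Returns the value following -t or --num-threads, or defaultThreads
    // (an assumed fallback) when the option is not present.
    static int parseNumThreads(String[] args, int defaultThreads) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("-t") || args[i].equals("--num-threads")) {
                int numThreads = Integer.parseInt(args[i + 1]);
                if (numThreads < 1) {
                    throw new IllegalArgumentException("Number of threads must be at least 1");
                }
                return numThreads;
            }
        }
        return defaultThreads;
    }

    public static void main(String[] args) {
        int numThreads = parseNumThreads(args, 1);
        System.out.println("Using " + numThreads + " thread(s)");
    }
}
```

Rejecting values below 1 up front avoids handing an invalid pool size to whatever executes the tasks.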