opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data on HPC or Hadoop clusters.
Apache License 2.0

Support for PLINK, a third-party analysis tool for variants #128

Open jtarraga opened 7 years ago

jtarraga commented 7 years ago

The project should support third-party analysis tools.

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data).

Some PLINK runs can take a long time; running them on a Hadoop cluster with the Spark engine should reduce this time significantly.
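One common way to parallelize PLINK is to split a single-variant association run by chromosome and run the chunks concurrently. This works because each variant is tested independently. Genome-wide multiple-testing corrections such as PLINK's `--adjust` would, however, have to be recomputed after the per-chromosome results are merged. The sketch below is hypothetical and does not describe hpg-bigdata's design. It assumes PLINK 1.9 is on `PATH` as `plink` and that the input is a binary fileset (`prefix.bed/.bim/.fam`). A local thread pool stands in for Spark executors, and the helper names (`plink_assoc_command`, `run_by_chromosome`) are illustrative.

```python
"""Hypothetical sketch: run a PLINK association test per chromosome, in parallel.

Assumptions: PLINK 1.9 is installed as `plink`; the input is a binary fileset;
a local thread pool stands in for Spark executors on a Hadoop cluster.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence


def plink_assoc_command(bfile: str, chrom: str, out_prefix: str) -> List[str]:
    """Build a PLINK 1.9 command running a basic association test on one chromosome."""
    return ["plink", "--bfile", bfile, "--chr", chrom, "--assoc",
            "--out", f"{out_prefix}.chr{chrom}"]


def run_command(cmd: Sequence[str]) -> int:
    """Run one external PLINK process and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_by_chromosome(bfile: str, chromosomes: Iterable[str], out_prefix: str,
                      runner: Callable[[Sequence[str]], int] = run_command,
                      workers: int = 4) -> Dict[str, int]:
    """Run one PLINK job per chromosome concurrently; return {chromosome: exit code}.

    Threads are sufficient here because the heavy work happens in separate
    PLINK processes, not in the Python interpreter.
    """
    commands = {c: plink_assoc_command(bfile, c, out_prefix) for c in chromosomes}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {c: pool.submit(runner, cmd) for c, cmd in commands.items()}
        return {c: f.result() for c, f in futures.items()}


if __name__ == "__main__":
    # Dry run: print each command instead of executing PLINK.
    def dry_run(cmd: Sequence[str]) -> int:
        print(" ".join(cmd))
        return 0

    print(run_by_chromosome("study", ["1", "2", "X"], "results", runner=dry_run))
```

On a real cluster, the same per-chromosome commands could be distributed as Spark tasks instead of local threads. The resulting `.assoc` files would then be concatenated before any genome-wide correction.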