opencb / hpg-bigdata

This repository implements converters and tools for working with NGS data in HPC or Hadoop clusters.
Apache License 2.0

Implement variant association analysis (chi-square, linear/logistic regression) using Spark MLlib #126

Open · jtarraga opened this issue 7 years ago

jtarraga commented 7 years ago

The package should provide variant association analysis, such as chi-square tests, linear regression and logistic regression. Spark's machine learning library (MLlib) provides a rich API for implementing these tests in a big data environment.
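To make the chi-square case concrete, here is a minimal plain-Python sketch of the Pearson chi-square test of independence on a 2x2 allelic table (cases/controls by allele counts). It runs locally without a cluster and is only an illustration of the statistic, not the hpg-bigdata implementation. The function name `chi_square_2x2` and the row/column layout are hypothetical. To our understanding, Spark's `ChiSquareTest` (in `org.apache.spark.ml.stat`, also exposed as `pyspark.ml.stat.ChiSquareTest`) computes the same Pearson statistic per feature against a label column of a DataFrame, which is one way it could be scaled out.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 contingency table.

    Hypothetical layout: rows are cases / controls, columns are counts of
    allele A / allele B. Returns (statistic, p_value) with 1 degree of freedom.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")

    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("table has an empty row or column")

    # Sum (observed - expected)^2 / expected over the four cells,
    # where expected = row_total * col_total / n under independence.
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Illustrative counts only: 30/70 allele counts in cases, 10/90 in controls.
    stat, p = chi_square_2x2([[30, 70], [10, 90]])
    print(stat, p)  # stat == 12.5
```

Note that this sketch does not apply a continuity correction and assumes every expected count is large enough for the chi-square approximation to hold.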

Association tests:

Taking into account the following genetic models: