related-sciences / gwas-analysis

GWAS data analysis experiments
Apache License 2.0

Determine core operations necessary in general-purpose GWAS toolkits #16

Closed eric-czech closed 4 years ago

eric-czech commented 4 years ago

As a complement to https://github.com/related-sciences/gwas-analysis/issues/15, this spreadsheet shows the functionality present in several toolkits as well as properties of the data structures that back them.

This also shows which libraries do or do not have "core" features, meaning those I've seen most often in the workflows I've been studying. Particularly notable capabilities and deficiencies are also pointed out, which should make it easier to build a narrative about the landscape.

This issue should be closed out once we're convinced nothing important is missing.

eric-czech commented 4 years ago

Note to self: add GENESIS

hammer commented 4 years ago

I wonder if a comparison to the most popular pipeline of single-purpose tools is worth including as well? Something like https://github.com/kerimoff/qcnorm, for example.

Also, GCTA might be considered a toolkit by this point, given its growth in scope.

eric-czech commented 4 years ago

cf. https://discourse.smadstatgen.org/t/core-operations-in-human-gwas-workloads/41

Looking at qcnorm briefly:

For qcnorm and TOPMed, the general order of operations in the pipelines squares with everything else I've seen: it's basically the same steps accomplished in completely different ways. I think we're unlikely to be surprised by anything on this front until we get into processing WES data or normalizing phenotypes ourselves.