greenelab / model-free-data

Case-control genetics datasets evolved to be epistatic
https://doi.org/b44mk9
Creative Commons Zero v1.0 Universal

List of studies that have used this data #2

Open dhimmel opened 8 years ago

dhimmel commented 8 years ago

The goal of this issue is to compile a list of studies that have used our model-free data.

dhimmel commented 8 years ago

Learning Vector Quantization with Local Adaptive Weighting for Relevance Determination in Genome-Wide Association Studies
Flavia R B Araujo, Hansenclever F Bassani, Aluizio F R Araujo
Neural Networks (IJCNN) (2013) DOI: 10.1109/IJCNN.2013.6707040

To evaluate the scalability of DSEL-LVQ we considered the datasets described in [22] with interactions of three, four and five SNPs. However, these datasets included only the relevant SNPs, therefore, to produce datasets with 20, 50 and 100 SNPs, we selected randomly 800 and 1600 individuals from [22] and combined them with irrelevant SNPs (noisy data) randomly selected from [21]. This resulted in 100 data files for each combination of population sizes, numbers of interacting SNPs and number of irrelevant SNPs, all equally balanced in cases and to controls.
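
For anyone trying to reproduce this kind of setup, here is a rough sketch of the subsample-and-pad step described in the excerpt. It is not the authors' code. The function name `build_dataset`, the toy genotype values, and the way noise SNPs are attached (random columns taken from a randomly chosen row of a separate noise pool) are all assumptions made for illustration; the paper only says the irrelevant SNPs were "randomly selected".

```python
import random

def build_dataset(cases, controls, noise_pool, n_individuals, n_total_snps, seed=0):
    """Draw a balanced sample and pad each individual with irrelevant SNPs.

    Hypothetical helper: the pairing of individuals with noise rows is an
    assumption, not something stated in the quoted paper.
    """
    rng = random.Random(seed)
    half = n_individuals // 2
    # Equal numbers of cases (label 1) and controls (label 0).
    labelled = [(row, 1) for row in rng.sample(cases, half)]
    labelled += [(row, 0) for row in rng.sample(controls, half)]
    # Choose which noise columns to append so each row reaches n_total_snps.
    n_noise = n_total_snps - len(cases[0])
    noise_cols = rng.sample(range(len(noise_pool[0])), n_noise)
    dataset = []
    for row, label in labelled:
        donor = rng.choice(noise_pool)  # assumed: one random donor row per individual
        dataset.append((list(row) + [donor[c] for c in noise_cols], label))
    return dataset

# Toy stand-ins (made-up genotypes coded 0/1/2), not the real data files.
toy_rng = random.Random(1)
cases = [[toy_rng.randint(0, 2) for _ in range(3)] for _ in range(10)]
controls = [[toy_rng.randint(0, 2) for _ in range(3)] for _ in range(10)]
noise_pool = [[toy_rng.randint(0, 2) for _ in range(40)] for _ in range(15)]

dataset = build_dataset(cases, controls, noise_pool, n_individuals=8, n_total_snps=20)
```

With the real files, `n_individuals` would be 800 or 1600 and `n_total_snps` would be 20, 50 or 100, per the excerpt.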

dhimmel commented 8 years ago

Cuckoo search epistasis: a new method for exploring significant genetic interactions
M Aflakparast, H Salimi, A Gerami, M-P Dubé, S Visweswaran, A Masoudi-Nejad
Heredity (2014) DOI: 10.1038/hdy.2014.4

We also used Himmelstein data sets with three to five functional SNPs, which had been generated with no predefined genetic models, to evaluate methods in identifying higher order interactions. For any interaction order, the data folders consisted of 100 data sets each having 1500 cases and 1500 controls for a SNP number as high as the considered interaction order. Assuming Hardy-Weinberg equilibrium proportions and MAF of 0.5, we randomly generated additional SNP data to embed with the Himmenstein data using a multinomial distribution. After embedding Himmelstein data with our generated data sets, the resulting data sets for any interaction order contained 1000 SNPs for 3000 samples. These data sets are available online from http://discovery.dartmouth.edu/model_free_data/.
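
A minimal sketch of the embedding step described in this excerpt, assuming genotypes are coded 0/1/2 (copies of the minor allele) and that noise SNPs are simply appended after the functional ones. The names `hwe_probabilities` and `embed_with_noise` and the toy input are hypothetical; the paper does not say where the functional SNPs sit among the 1000 columns. Each noise genotype is a single draw from the three Hardy-Weinberg genotype classes, which is a multinomial draw with one trial.

```python
import random

def hwe_probabilities(maf):
    """Genotype probabilities under Hardy-Weinberg equilibrium.

    For minor allele frequency q and p = 1 - q:
    P(0 copies) = p^2, P(1 copy) = 2pq, P(2 copies) = q^2.
    """
    p = 1.0 - maf
    return [p * p, 2 * p * maf, maf * maf]

def embed_with_noise(functional_rows, total_snps, maf=0.5, seed=0):
    """Append random noise genotypes so every sample has total_snps SNPs."""
    rng = random.Random(seed)
    weights = hwe_probabilities(maf)
    n_noise = total_snps - len(functional_rows[0])
    return [
        list(row) + rng.choices([0, 1, 2], weights=weights, k=n_noise)
        for row in functional_rows
    ]

# Toy stand-in for one data set: 6 samples x 3 functional SNPs (made-up values).
functional = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0], [1, 2, 2], [2, 1, 1]]
embedded = embed_with_noise(functional, total_snps=10)
```

For the setup in the excerpt, `functional_rows` would hold 3000 samples and `total_snps` would be 1000.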

dhimmel commented 8 years ago

CINOEDV: a co-information based method for detecting and visualizing n-order epistatic interactions
Junliang Shang, Yingxia Sun, Jin-Xing Liu, Junfeng Xia, Junying Zhang, Chun-Hou Zheng
BMC Bioinformatics (2016) DOI: 10.1186/s12859-016-1076-8

For assessing the capability of CINOEDV in inferring higher order epistatic interactions from the epistasis hypergraph, four models are used that have been developed previously [49, 50], namely, Three − 1, Three − 2, Four and Five. Three − 1 is a model of 3-order epistatic interaction displaying both marginal effects and interaction effects. Three − 2 is a pure model of 3-order epistatic interaction, where the association to the phenotype is only observable when all 3 ground-truth SNPs are considered together, that is, no main effects and no pairwise epistatic interactions. Similarly, Four and Five are models of 4-order and 5-order epistatic interactions, each displaying no main effects and no 2-order interaction effects. For each corresponding data set also generated by epiSIM [44], 1500 cases and 1500 controls are included and genotyped by 1000 SNPs.