PLINK data model

Separated from #8

One limitation of PLINK's genotype data model that matters for association analysis is that there is no easy way to access slices of the data without creating separate files. For example, to analyze my data gene by gene, I would first have to create over 20K small file bundles (60K files for PLINK bim/fam/bed) and only then run my analysis.

In contrast, I can potentially slice data directly from an HDF5 file, or even create predefined groups of data in HDF5 so that I can later load chunks of data in R for analysis. This is in fact what I'm going to do now for my eQTL analysis: after basic QC and diagnostics in PLINK, I'll convert the PLINK file bundle to HDF5 with 20K tables, each containing one gene's expression data together with its cis-SNP genotypes. We'll also run into this issue later with the RVTESTS interface, which accepts VCF and PLINK PED format but not PLINK binary format.
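
To make the per-gene layout concrete, here is a minimal sketch in Python with `h5py` and NumPy. It is only one possible arrangement, not a settled design. The group layout (`expression`, `genotypes`, `snp_ids`), the function names, the `-1` missing-genotype code and the toy gene IDs are all assumptions made for illustration, and the step that reads genotypes out of the PLINK bundle is left out entirely.

```python
import os
import tempfile

import h5py
import numpy as np


def write_gene_groups(path, genes):
    """Write one HDF5 group per gene (hypothetical layout, for illustration)."""
    with h5py.File(path, "w") as f:
        for gene_id, data in genes.items():
            grp = f.create_group(gene_id)
            # One expression value per sample.
            grp.create_dataset("expression", data=data["expression"])
            # samples x cis-SNPs allele counts (0/1/2); int8 is enough.
            # Assumption: -1 marks a missing call.
            grp.create_dataset(
                "genotypes", data=data["genotypes"], dtype="i1", compression="gzip"
            )
            # Fixed-length byte strings keep the SNP IDs easy to read elsewhere.
            grp.create_dataset("snp_ids", data=np.array(data["snp_ids"], dtype="S"))


def read_gene(path, gene_id):
    """Load a single gene's data without touching the other groups."""
    with h5py.File(path, "r") as f:
        grp = f[gene_id]
        expression = grp["expression"][:]
        genotypes = grp["genotypes"][:]
        snp_ids = [s.decode() for s in grp["snp_ids"][:]]
    return expression, genotypes, snp_ids


if __name__ == "__main__":
    # Toy data standing in for what would come from the PLINK bundle.
    rng = np.random.default_rng(0)
    n_samples = 5
    toy = {
        "GENE_A": {
            "expression": rng.normal(size=n_samples),
            "genotypes": rng.integers(0, 3, size=(n_samples, 3)),
            "snp_ids": ["rs1", "rs2", "rs3"],
        },
        "GENE_B": {
            "expression": rng.normal(size=n_samples),
            "genotypes": rng.integers(0, 3, size=(n_samples, 2)),
            "snp_ids": ["rs4", "rs5"],
        },
    }
    path = os.path.join(tempfile.mkdtemp(), "eqtl_toy.h5")
    write_gene_groups(path, toy)
    expression, genotypes, snp_ids = read_gene(path, "GENE_B")
    print(genotypes.shape, snp_ids)
```

With this layout a gene-by-gene analysis only opens one file and reads one group at a time, instead of managing 20K separate bundles. The same groups should also be readable from R through an HDF5 package.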

vatlab / VarStore

PLINK data model #12