Open mortonjt opened 7 years ago
Compiled an inclusive list of state-of-the-art methods to narrow down. The list is based on the work/benchmarks for RNA-seq/GeneChip/image-processing matrix completion.
Global: correlation information is derived from the entire data matrix. These methods assume the existence of a global covariance structure among all genes or samples in the expression matrix. (Liew et al. 2011)
Local: these methods exploit only the local similarity structure in the data set for missing value imputation. Only a subset of genes that exhibit high correlation with the gene containing the missing values is used to compute the missing values in that gene. (Liew et al. 2011)
Hybrid: captures both global and local correlation information in the data. For example, in LinCmb the missing values are estimated by a convex combination of the estimates of five different imputation methods (both global and local). (Liew et al. 2011)
Knowledge assisted: domain knowledge or external information is integrated into the estimation process. (Liew et al. 2011)
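To make the local and hybrid ideas concrete, here is a minimal, dependency-free sketch. It is not any of the published methods: `knn_impute_entry` and `convex_combination` are hypothetical helper names, missing values are assumed to be encoded as `None`, and root-mean-square distance over shared columns stands in for the correlation-based similarity that the local methods actually use. The blend only illustrates the convex-combination step described for LinCmb, with fixed weights instead of learned ones and two estimates instead of five.

```python
from statistics import fmean


def knn_impute_entry(matrix, i, j, k=2):
    """Local-style estimate of matrix[i][j] from the k rows most similar to row i.

    Assumption: missing entries are None, and similarity is the RMS distance
    over columns observed in both rows (a stand-in for correlation).
    """
    target = matrix[i]
    candidates = []
    for r, row in enumerate(matrix):
        # Skip the row itself and rows that cannot supply a value for column j.
        if r == i or row[j] is None:
            continue
        shared = [(a, b) for a, b in zip(target, row)
                  if a is not None and b is not None]
        if not shared:
            continue
        dist = (sum((a - b) ** 2 for a, b in shared) / len(shared)) ** 0.5
        candidates.append((dist, row[j]))
    if not candidates:
        raise ValueError("no neighbour row has column j observed")
    candidates.sort(key=lambda pair: pair[0])
    return fmean([value for _, value in candidates[:k]])


def convex_combination(estimates, weights):
    """Hybrid-style blend: weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * e for w, e in zip(weights, estimates))


X = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [9.0, 9.0, 100.0],
]

local_estimate = knn_impute_entry(X, 0, 2, k=2)   # uses rows 1 and 2 only
blended = convex_combination([local_estimate, 2.0], [0.75, 0.25])
```

In the toy matrix, the outlier row is ignored because only the two closest rows contribute. The second estimate passed to `convex_combination` (2.0) is an arbitrary placeholder for whatever a global method would return.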
[ ] Unknown values to zero (current method)
[ ] Row average method (Base Benchmark)
[ ] PCP (L1) rank reduction + Aitchison matrix norm (novel compositional, will read up on the norm)
[ ] Nonparametric Imputation [Fernandez et al. 2003] (Compositional)
[ ] LRGeomCG Riemannian [Vandereycken 2013] (Global)
[ ] RMAMR Augmented Lagrangian [Ye et al. 2015] (Global)
[ ] Gene expression prediction method (modified PCP method) [Kapur et al. 2016] (Global)
[ ] bi-BPCA [Meng et al. 2014] (Local)
[ ] Bicluster-based least square [Cheng et al. 2012] (Local)
[ ] EMDI [Pan et al. 2011] (Hybrid)
[ ] POCS [Gan et al. 2006] (Knowledge assisted)
We'll want a handful of methods that seem appropriate to benchmark. Still need to finalize the list above.
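For reference, the two simplest entries in the checklist (unknown values to zero, and the row average baseline) could look roughly like the sketch below. This is an assumption-laden illustration, not the project's current code: `impute_zero` and `impute_row_mean` are hypothetical names, missing entries are assumed to be `None`, and a row with no observed values falls back to 0.0.

```python
from statistics import fmean


def impute_zero(matrix):
    """Replace every missing entry (None) with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]


def impute_row_mean(matrix):
    """Replace missing entries with the mean of the observed values in the same row."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a fully missing row has no mean, so fall back to 0.0.
        fill = fmean(observed) if observed else 0.0
        filled.append([fill if v is None else v for v in row])
    return filled


X = [
    [1.0, None, 3.0],
    [None, None, None],
    [2.0, 4.0, None],
]

zero_filled = impute_zero(X)
mean_filled = impute_row_mean(X)
```

Both functions return a new matrix and leave the input untouched, which makes it easy to run every candidate method on the same masked data when benchmarking.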