krumsieklab / maplet

R statistical toolbox for metabolomics
GNU General Public License v3.0

Refine/Create script for knn imputation of large datasets (>2000 samples) #109

Open KelseyChetnik opened 3 years ago

KelseyChetnik commented 3 years ago

In GitLab by @kelsey.chetnik on Jun 17, 2020, 12:07

The knn imputation function is reasonably fast for small sample sizes but very slow for large ones (e.g. TwinsUK), so a dedicated script for large datasets would be good to have. The current script saves a separate distance matrix to disk for each variable to be imputed, which takes up a lot of disk space and is inefficient. If implemented correctly, the new function could replace the regular knn imputation function entirely.

Contact: Parviz
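One possible direction, sketched here only as a starting point and not as maplet's actual implementation: process the samples that have missing values in blocks, compute the distances for one block at a time in memory, and discard them before moving to the next block, so nothing is written to disk. The function name `knn_impute_blockwise`, its parameters `k` and `block_size`, the samples-in-rows layout, and the distance (mean squared difference over the variables observed in both samples) are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of maplet: blockwise kNN imputation sketch.
# X: numeric matrix, samples in rows, variables in columns, NA = missing.
knn_impute_blockwise <- function(X, k = 10, block_size = 500) {
  stopifnot(is.matrix(X), is.numeric(X), k >= 1, block_size >= 1)
  obs     <- !is.na(X)   # TRUE where a value is observed
  obs_num <- obs * 1     # numeric 0/1 version of the mask
  X0      <- X
  X0[!obs] <- 0          # zero-fill so matrix products ignore missing entries
  X_out   <- X

  # Only samples with at least one missing value need neighbors.
  rows_na <- which(rowSums(!obs) > 0)
  blocks  <- split(rows_na, ceiling(seq_along(rows_na) / block_size))

  for (idx in blocks) {
    A  <- X0[idx, , drop = FALSE]
    Am <- obs_num[idx, , drop = FALSE]
    # Squared Euclidean distance restricted to variables observed in both
    # samples, expanded as sum(x_a^2) - 2 * sum(x_a * x_b) + sum(x_b^2).
    d2       <- (A^2) %*% t(obs_num) - 2 * A %*% t(X0) + Am %*% t(X0^2)
    n_shared <- Am %*% t(obs_num)
    d        <- d2 / n_shared              # mean squared difference
    d[n_shared == 0] <- Inf                # no shared variables: not comparable
    d[cbind(seq_along(idx), idx)] <- Inf   # a sample is not its own neighbor

    for (r in seq_along(idx)) {
      i   <- idx[r]
      ord <- order(d[r, ])
      ord <- ord[is.finite(d[r, ord])]
      for (j in which(!obs[i, ])) {
        donors <- ord[obs[ord, j]]         # nearest samples with variable j observed
        if (length(donors) > 0) {
          X_out[i, j] <- mean(X[head(donors, k), j])
        }
      }
    }
    # d goes out of scope here; only one block_size x n matrix is held at a time.
  }
  X_out
}
```

With this layout, peak memory is roughly `block_size` times the number of samples, so `block_size` can be tuned to the available RAM. Whether this matches the neighbor definition used by the existing knn function would need to be checked before using it as a replacement.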