perishky / ewaff

Artistic License 2.0
3 stars 0 forks source link

Faster outlier handling #4

Open perishky opened 1 year ago

perishky commented 1 year ago

Most computation for handling outliers happens in matrixStats::rowQuantiles(). This function is solidly written in R, but could be faster if implemented in C and/or if it made use of parallel processing.

WGCNA has a C implementation but would rather not take on all WGCNA dependencies!

e.g. iqr.trim<-function(methylation, iqr.limit=3) {

find outlying observations more extreme that the iqr.limit

quantiles <- matrix(NA, ncol=2, nrow=nrow(methylation))
quantiles[,1] <- WGCNA::rowQuantileC(methylation,0.25)
quantiles[,2] <- rowQuantileC(methylation,0.75)
##quantiles <- matrixStats::rowQuantiles(methylation, probs = c(0.25, 0.75), na.rm = T)
iqr <- quantiles[,2] - quantiles[,1]
low <- quantiles[,1]
upper <- quantiles[,2]

initial.missing <- rowSums(is.na(methylation))

methylation[methylation < low - iqr.limit * iqr] <- NA
outliers.lower <- rowSums(is.na(methylation)) - initial.missing

methylation[methylation > upper + iqr.limit * iqr] <- NA
outliers.upper <- rowSums(is.na(methylation)) - initial.missing - outliers.lower

n <- ncol(methylation) - initial.missing - outliers.upper - outliers.lower
log <- data.frame(outliers.lower, outliers.upper, n) 

return(list(methylation=methylation, log=log)) 

}