r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

Normalization causing degenerate dimensionality reduction #160

Open suragnair opened 4 years ago

suragnair commented 4 years ago

Since the linear model here is allowed to have a non-zero intercept, I encountered cases in which the fitted intercept was negative.

https://github.com/r3fang/SnapATAC/blob/c3ab177558f0fe9c47cbd68969df7b06de5b07d9/R/utilities.R#L137

As a result, the normalized Jaccard distances included negative values as well as some outliers. The code removes positive outliers before performing diffusion maps, but in this case the negative outliers caused the diffusion maps to output a degenerate solution. This can be fixed by using:

model <- lm(y ~ 0 + x + I(x^2), data) and changing the code that reads the coefficients to use only beta1 and beta2 (there is no intercept term anymore).
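
To illustrate what the formula change does, here is a minimal sketch on synthetic data. It is not SnapATAC's actual normalization code: the data frame, the variable names (`model_with_intercept`, `model_no_intercept`, `expected`) and the simulated values are all assumptions made for the example. It only shows that `0 +` in the formula drops the intercept, leaving the two coefficients for `x` and `I(x^2)`.

```r
# Synthetic stand-in data (hypothetical; not produced by SnapATAC).
set.seed(1)
x <- runif(200, min = 0.01, max = 0.5)
y <- 0.8 * x + 0.5 * x^2 + rnorm(200, sd = 0.01)
data <- data.frame(x = x, y = y)

# Fit with an intercept: three coefficients, "(Intercept)", "x", "I(x^2)".
model_with_intercept <- lm(y ~ x + I(x^2), data)
coef_names_with <- names(coef(model_with_intercept))

# Fit through the origin: `0 +` removes the intercept term.
model_no_intercept <- lm(y ~ 0 + x + I(x^2), data)
coef_names_without <- names(coef(model_no_intercept))

# Only beta1 (for x) and beta2 (for x^2) remain.
beta1 <- coef(model_no_intercept)[["x"]]
beta2 <- coef(model_no_intercept)[["I(x^2)"]]

# Expected value at each x, computed without any constant term.
expected <- beta1 * data$x + beta2 * data$x^2

print(coef_names_with)
print(coef_names_without)
```

Any downstream code that indexes the coefficient vector by position would need to shift accordingly, since the first element is now the `x` coefficient rather than the intercept. Note that dropping the intercept only removes the constant offset; whether the expected values stay positive still depends on the signs of the fitted beta1 and beta2.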