saeyslab / CytoNorm

R library to normalize cytometry data

Clustering fails after CytoNorm #11

vivek-verma202 closed this issue 4 years ago

vivek-verma202 commented 4 years ago

I was able to run FlowSOM successfully on the raw data, but clustering failed once the FCS files had been normalized with CytoNorm. I believe this error is caused by NAs introduced during normalization. How can I debug or circumvent this?

     > sce <- cluster(sce, features = "type",
     +                xdim = 10, ydim = 10, maxK = 10,
     +                verbose = T, seed = 1)
     o running FlowSOM clustering...
     Error in SOM(fsom$data[, colsToUse], silent = silent, ...) : 
       NA/NaN/Inf in foreign function call (arg 1)

By default,

     > options("na.action")
     $na.action
     [1] "na.omit"

I am using: R v4.0.0, FlowSOM v1.20.0, CytoNorm v0.0.5, CATALYST v1.12.1

Thanks a lot! Vivek

vivek-verma202 commented 4 years ago

The cause was a small number of infinite values in the normalized expression matrix:

     > sum(is.infinite(assay(sce, "exprs")))
     [1] 79
     > mean(colSums(is.infinite(assay(sce, "exprs"))) > 0)
     [1] 5.214521e-06
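To see where such values sit, the check above can be broken down per marker and per cell. The following is a minimal base-R sketch that uses a small toy matrix as a stand-in for `assay(sce, "exprs")` (markers in rows, cells in columns); the marker names and values are invented for illustration. It uses `!is.finite()` rather than `is.infinite()` so that NA and NaN are also caught, since the SOM error message mentions all three.

```r
# Toy stand-in for assay(sce, "exprs"): rows are markers, columns are cells.
# Marker names and values are made up for this example.
exprs <- matrix(c(0.1, 2.3, Inf,
                  1.5, 0.7, 0.2,
                  -Inf, 0.4, 1.1,
                  0.9, NaN, 0.3),
                nrow = 3,
                dimnames = list(c("CD3", "CD4", "CD8"),
                                paste0("cell", 1:4)))

# TRUE wherever a value is NA, NaN, Inf or -Inf
non_finite <- !is.finite(exprs)

# Number of problematic values per marker (row)
per_marker <- rowSums(non_finite)

# Indices (and names) of cells with at least one problematic value
bad_cells <- which(colSums(non_finite) > 0)

# Matrix restricted to cells whose values are all finite
clean <- exprs[, colSums(non_finite) == 0, drop = FALSE]

print(per_marker)
print(names(bad_cells))
```

If the non-finite values concentrate in one or two markers, that may hint at which channel's spline is extrapolating badly.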

I fixed it by dropping the cells that contain an infinite value:

     sce <- filterSCE(sce, colSums(is.infinite(assay(sce, "exprs"))) == 0)

Detailed troubleshooting is here

SofieVG commented 4 years ago

Dear Vivek,

Thank you for reporting this issue. While I'm happy you could resolve it, the infinities should probably not be generated in the first place. Would you be able to share a small reproducible example where this happens, and/or an RDS file with your normalisation model? I did not manage to reproduce it myself on the datasets I was working with. I remember we changed the type of spline at some point to lower the probability that the splines shoot up to infinity outside the training range, so if it is still occurring, maybe I should adapt the code so that these values are capped at a certain upper limit and a warning is reported to the user. However, it would be nice to have a look at an example model where this happens before making any adaptations.
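The capping idea described above could look roughly like the sketch below. This is not CytoNorm code: `clamp_with_warning` is a hypothetical helper, and the limits 0 and 8 are arbitrary values chosen only for the example. It replaces values outside a given range with the nearest limit, leaves NA and NaN untouched, and warns with the number of values it changed.

```r
# Hypothetical helper, not part of CytoNorm: cap values to [lower, upper]
# and warn the user when anything had to be changed.
clamp_with_warning <- function(x, lower, upper) {
  too_low  <- !is.na(x) & x < lower
  too_high <- !is.na(x) & x > upper
  n_changed <- sum(too_low) + sum(too_high)
  if (n_changed > 0) {
    warning(n_changed, " value(s) outside [", lower, ", ", upper,
            "] were clamped.", call. = FALSE)
  }
  x[too_low]  <- lower   # -Inf and other low values become `lower`
  x[too_high] <- upper   # Inf and other high values become `upper`
  x                      # NA / NaN entries are returned unchanged
}

# Example with arbitrary limits; emits a warning about 2 clamped values
clamped <- clamp_with_warning(c(-Inf, 0.5, Inf, NaN), lower = 0, upper = 8)
print(clamped)
```

Because the replacement is done by logical indexing, the same function also works on a matrix and keeps its dimensions. Where sensible limits should come from (for example the range seen in the training data) is a design question this sketch leaves open.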