hansenlab / minfi

Devel repository for minfi

preprocessQuantile getSex error: -Inf median CN #220

Open zxl124 opened 3 years ago

zxl124 commented 3 years ago

I was trying to run preprocessQuantile on signal intensity data (no IDAT files are available). I read the signal intensities into a MethylSet object, and when I ran preprocessQuantile on that object, I got this error:

> preprocessQuantile(mSet)
[preprocessQuantile] Mapping to genome.
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In preprocessQuantile(mSet) :
  preprocessQuantile has only been tested with 'preprocessRaw'

> traceback()
6: do_one(nmeth)
5: kmeans(dd, centers = c(min(dd), max(dd)))
4: .getSex(CN = CN, xIndex = xIndex, yIndex = yIndex, cutoff = cutoff)
3: getSex(object)
2: addSex(object)
1: preprocessQuantile(mSet)

So I read the code of preprocessQuantile and getSex and traced through it step by step. The problem is that the median CN of one of the samples is -Inf.

xMed <- colMedians(CN, rows = xIndex, na.rm = TRUE)
yMed <- colMedians(CN, rows = yIndex, na.rm = TRUE)
> xMed
[1] 13.23967 13.19329 12.52797
> yMed
[1]  1.584963      -Inf 12.704444
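
A minimal base-R sketch of how such a -Inf median arises and why kmeans then fails. The `cn_y` vector is made up for illustration, and `dd` is assumed to be the difference of the two medians (the traceback only shows that kmeans is called on `dd`); the median values are copied from the output above.

```r
# Hypothetical CN values for one sample's Y probes: 3 of 5 are -Inf
cn_y <- c(-Inf, -Inf, -Inf, 10.2, 11.5)
med <- median(cn_y, na.rm = TRUE)  # na.rm drops NA only, not -Inf
print(med)

# Medians copied from the output above
x_med <- c(13.23967, 13.19329, 12.52797)
y_med <- c(1.584963, -Inf, 12.704444)

# Assumption: dd is the difference of the medians; its second element is infinite
dd <- y_med - x_med

# Same kmeans call as in the traceback; capture the error instead of stopping
res <- tryCatch(
  kmeans(dd, centers = c(min(dd), max(dd))),
  error = function(e) e
)
print(inherits(res, "error"))
```

Any non-finite value in `dd` is enough to make the clustering step fail, so a single affected sample breaks the whole call.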

When more than half of the CN values are -Inf, the median is -Inf. I looked at how CN values are calculated.

function(object, ...) log2(getMeth(object) + getUnmeth(object))

For this specific data set, many X and Y chromosome probes have both methylated and unmethylated values equal to 0, which results in -Inf CN values. I think adding 1 to getMeth(object) + getUnmeth(object) before taking the log might solve this problem.
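
A rough sketch of the effect of such an offset, using plain matrices rather than a real MethylSet. The `meth` and `unmeth` matrices are invented for illustration, and `apply(..., median)` stands in for colMedians.

```r
# Hypothetical methylated / unmethylated intensities: 4 probes x 2 samples
meth   <- matrix(c(0, 120, 0, 300,
                   0,   0, 0, 250), nrow = 4)
unmeth <- matrix(c(0,  80, 0, 100,
                   0,   0, 0, 150), nrow = 4)

cn_raw    <- log2(meth + unmeth)      # probes with total intensity 0 become -Inf
cn_offset <- log2(meth + unmeth + 1)  # the same probes become 0 instead

# Per-sample medians (columns are samples)
med_raw    <- apply(cn_raw, 2, median)
med_offset <- apply(cn_offset, 2, median)
print(med_raw)
print(med_offset)
```

With the offset, the zero-intensity probes stay in the median as 0 rather than being dropped, so an affected sample's median is pulled toward 0. Whether that is acceptable for sex prediction is a separate question.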

This bug has been reported in the comments of #172.

Mike-L-V commented 2 years ago

I've come across the same issue, but I'm using IDAT files. Using the information provided above, I was able to confirm that it has the same cause: one sample has just over 50% missing data for the yIndex probes, which produces a -Inf for one of the yMed values.

Would a possible solution be to convert -Inf to NA in .getSex before calculating the median values?

CN[CN==-Inf] <- NA
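
A small sketch of what this conversion would do, on a made-up matrix rather than inside .getSex. The `cn` values are hypothetical, `apply(..., median)` stands in for colMedians, and `is.infinite()` is used in place of `CN == -Inf` so that +Inf is caught as well.

```r
# Hypothetical CN matrix: 5 Y probes x 2 samples
cn <- matrix(c(-Inf, -Inf, -Inf, 10.2, 11.5,
               -Inf, -Inf, -Inf, -Inf, -Inf), nrow = 5)

# Treat infinite CN values as missing
cn[is.infinite(cn)] <- NA

# Per-sample medians over the remaining finite values
y_med <- apply(cn, 2, median, na.rm = TRUE)
print(y_med)
```

The first sample now gets a finite median from its two remaining probes. The second sample, where every probe was -Inf, ends up with an NA median, so the kmeans step would presumably still need to handle that case.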