hansenlab / minfi

Devel repository for minfi

preprocessFunnorm: Error in kmeans (initial centers are not distinct) #239

Open mikldk opened 2 years ago

mikldk commented 2 years ago

I get an error in preprocessFunnorm(), see MWE below. The data is from a cell line without the Y chromosome, which may be what causes the error? (This is maybe related to #179.)

Any suggestions on how I can get preprocessFunnorm() to work? preprocessRaw() and preprocessIllumina() work fine. I am able to supply data (privately).

> library(minfi)
> packageVersion("minfi")
[1] ‘1.42.0’

> rgset <- minfi::read.metharray("data/20201002/203991460101/203991460101_R01C01")
Warnings:
1: In readChar(con, nchars = n) : truncating string with embedded nuls
2: In readChar(con, nchars = n) : truncating string with embedded nuls

> rgset
class: RGChannelSet 
dim: 1051815 1 
metadata(0):
assays(2): Green Red
rownames(1051815): 1600101 1600111 ... 99810990 99810992
rowData names(0):
colnames(1): 203991460101_R01C01
colData names(0):
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b4.hg19

> mset <- preprocessFunnorm(rgset)
[preprocessFunnorm] Background and dye bias correction with noob
Loading required package: IlluminaHumanMethylationEPICmanifest
Loading required package: IlluminaHumanMethylationEPICanno.ilm10b4.hg19
[preprocessFunnorm] Mapping to genome
[preprocessFunnorm] Quantile extraction
Error in kmeans(dd, centers = c(min(dd), max(dd))) : 
  initial centers are not distinct

kasperdanielhansen commented 2 years ago

In functional normalization we treat the sex chromosomes differently. To do so, we need to know the sex of the sample.

With default settings, the first step is therefore to predict the sex based on the data, and this step fails. We know it will fail if you only have one sex. To handle this, it is also possible to supply the sex of the sample, which overrides the prediction step.

In your case, you should be able to handle this by supplying the sex of the sample as female. I would guess this should work. However, what would be more difficult is to handle the situation where you want to normalize this cell line together with samples with 2 X chromosomes and therefore an inactivated X. Is that something you need to do?

Best, Kasper

In reply to Mikkel Meyer Andersen's message of Fri, Nov 11, 2022 at 3:16 PM.

mikldk commented 2 years ago

@kasperdanielhansen Thanks for the fast reply. If I instead run preprocessFunnorm(rgset, sex = "F"), I get this error:

> mset <- preprocessFunnorm(rgset, sex = "F")
[preprocessFunnorm] Background and dye bias correction with noob
[preprocessFunnorm] Mapping to genome
[preprocessFunnorm] Quantile extraction
[preprocessFunnorm] Normalization
Error in oobG[2, ] : subscript out of bounds

> traceback()
3: .buildControlMatrix450k(extractedData)
2: .normalizeFunnorm450k(object = gmSet, extractedData = extractedData, 
       sex = sex, nPCs = nPCs, verbose = subverbose)
1: preprocessFunnorm(rgset, sex = "F")

No, I only need this one sample, not mixed with others.

I get similar errors if I instead try quantile normalisation:

> mset <- preprocessQuantile(rgset)
[preprocessQuantile] Mapping to genome.
Error in kmeans(dd, centers = c(min(dd), max(dd))) : 
  initial centers are not distinct

> mset <- preprocessQuantile(rgset, sex = "F")
[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.
Error in if (ncol(mat) == 1) return(mat) : argument is of length zero

mikldk commented 2 years ago

@kasperdanielhansen Do you have any idea what could cause this? Again, I can send you the IDAT files (via a private channel) if that would help.

kasperdanielhansen commented 2 years ago

In most preprocessing methods we need to do something special for the sex chromosomes (for example due to X inactivation). To do this well, we need to know the sex of the samples. We have a standard way of estimating the sex of the samples using kmeans, and this step fails. It could fail for a number of reasons; the top contenders are (a) you only have 1 sex (the code assumes there are both males and females), or (b) you have cancer samples with big CN changes on the sex chromosomes.

In case (a) or (b) you can override the prediction by directly supplying a vector of sex.
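
The kmeans failure can be reproduced in base R without minfi. The sketch below assumes, based only on the call shown in the error message, that `dd` holds one number per sample; the values are invented for illustration and are not minfi output. With a single sample, `min(dd)` and `max(dd)` coincide, so the two starting centers are identical and `kmeans()` stops before clustering.

```r
# Toy reproduction of the failing call; values are made up, not minfi output.
# Assumption: dd holds one number per sample (inferred from the error message).
dd_one <- c(-0.4)

msg <- tryCatch(
  {
    kmeans(dd_one, centers = c(min(dd_one), max(dd_one)))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With two well-separated groups of values the same call succeeds.
dd_two_groups <- c(-3.1, -2.9, -3.0, 0.2, 0.1, 0.3)
fit <- kmeans(dd_two_groups,
              centers = c(min(dd_two_groups), max(dd_two_groups)))
print(fit$cluster)
```

This is consistent with the error appearing in both preprocessFunnorm() and preprocessQuantile() when only one sample is passed and no sex is supplied.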

mikldk commented 2 years ago

> In case (a) or (b) you can override the prediction by directly supplying a vector of sex.

@kasperdanielhansen I already tried with the sex = "F" argument (cf. above), and it still fails. Is there another way to supply the sex?

kasperdanielhansen commented 2 years ago

Ok, I am sorry, I see I basically wrote the same thing twice.

I also see you're trying to run 1 sample through functional normalization, right? That won't work. We essentially remove between-sample variation by regressing out certain confounders and that approach won't work for 1 sample processing.
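
A toy illustration of why a single sample leaves nothing to regress out. This is not minfi's code; it assumes a hypothetical matrix of per-sample control summaries with one row per sample and made-up values. Centering each column across samples returns all zeros when there is only one row, so there is no between-sample variation left to model.

```r
# Hypothetical control summaries: rows are samples, columns are control features.
controls_one <- matrix(c(0.8, 1.2, 0.5, 2.0), nrow = 1)
centered_one <- scale(controls_one, center = TRUE, scale = FALSE)
# With one row, every column equals its own mean, so centering gives zeros.
print(all(centered_one == 0))

# With several samples, centering leaves real between-sample differences.
controls_three <- rbind(c(0.8, 1.2, 0.5, 2.0),
                        c(1.1, 0.9, 0.7, 1.6),
                        c(0.6, 1.5, 0.4, 2.3))
centered_three <- scale(controls_three, center = TRUE, scale = FALSE)
print(any(centered_three != 0))
```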

Are you working in a prediction setting? If so, I would look into using Noob (in its single-sample mode, which is the default). Noob is "true" single sample normalization which means normalizing 1 sample is the same as normalizing many samples together.
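
A minimal sketch of the Noob route for one sample, assuming minfi is installed and `rgset` is the RGChannelSet from the original post. The helper name `normalize_single_sample` is hypothetical; `preprocessNoob()` and `getBeta()` are minfi functions, and `dyeMethod = "single"` spells out the single-sample mode mentioned above.

```r
# Hypothetical helper: single-sample Noob, then extract beta values.
# Requires the Bioconductor package minfi; `rgset` is an RGChannelSet
# such as the one returned by minfi::read.metharray().
normalize_single_sample <- function(rgset) {
  # dyeMethod = "single" selects the single-sample dye-bias correction.
  mset <- minfi::preprocessNoob(rgset, dyeMethod = "single")
  minfi::getBeta(mset)
}

# Usage, with the path from the original post:
# rgset <- minfi::read.metharray("data/20201002/203991460101/203991460101_R01C01")
# beta <- normalize_single_sample(rgset)
```

Because Noob processes each sample on its own, no sex prediction across samples is involved in this step.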

mikldk commented 2 years ago

> I also see you're trying to run 1 sample through functional normalization, right? That won't work. We essentially remove between-sample variation by regressing out certain confounders and that approach won't work for 1 sample processing.

Yes, I started with that. But I also tried preprocessQuantile - that should work with one sample, right?

> Are you working in a prediction setting? If so, I would look into using Noob (in its single-sample mode, which is the default). Noob is "true" single sample normalization which means normalizing 1 sample is the same as normalizing many samples together.

Thanks, I will try that, too.