wwylab / DeMixT

Optimum Kernel function does not always work #5

Closed proteinosome closed 4 years ago

proteinosome commented 5 years ago

Hi authors, I am a researcher working with many lung cancer samples. I recently saw your paper in iScience and am interested in applying DeMixT to my bulk RNA-seq data. I have a set of normal lung tissues (n=39) and many more tumor samples. I attempted to run DeMixT on CPM values normalized with the TMM method in limma (I "unlogged" the logCPM output of limma by raising 2 to the power of each value). However, I ran into this error when running DeMixT 0.2:

`Error in if (sum(obj == 0) > 1) { : missing value where TRUE/FALSE needed`

Upon investigation, this error happens because the C function called in OptimumKernel_C returned NaN in "rres[[22]]", which makes the sum of rres[[22]] NA. May I know whether this issue is known and, if so, why it happens? I've attached an example of the data matrix (tumor and normal) that throws this error.

normal_and_tumor_expression.zip

Note that I was able to run DeMixT successfully on another set of samples, and also on a subset of the cohort that gave the error.
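For readers following along, the back-transformation described above can be sketched in R as below. This is only an illustration on a made-up matrix (the names `log_cpm` and `cpm` are hypothetical, and rows are assumed to be genes with samples in columns); it is not taken from the attached data. The point is that undoing a log2 transform is an elementwise `2^x`, and that any negative logCPM value becomes a CPM below 1.

```r
# Toy logCPM matrix: rows are genes, columns are samples (made-up values).
log_cpm <- matrix(c(3.0, -1.5, 0.0, 5.2, 2.1, -0.3),
                  nrow = 3,
                  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Undo the log2 transform elementwise: cpm = 2^logCPM (not logCPM^2).
cpm <- 2^log_cpm

# Sanity checks before passing the matrix on: every entry should be finite.
stopifnot(all(is.finite(cpm)))

# Count entries below 1; these come from negative logCPM values.
n_small <- sum(cpm < 1)
print(n_small)
```

Counting the entries below 1 is a quick way to see how many very small values a matrix contains before running a deconvolution on it.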

Another question: I thought of checking whether the software is working properly by splitting my normal samples in half and deconvoluting one half of the normals with the other half as the reference. I expected that "pi", which is the reference component, would be almost 1 for all the normal samples I ran the deconvolution on, since they are normals. But quite a number of the normals give an extremely low normal component (~ 0), and some are in the middle. May I know whether I should be worried about this, or is it just because the software isn't meant to be used that way?
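The split described above could look roughly like the following sketch. The matrix `normal` is a simulated stand-in (5 genes by 39 samples, matching only the sample count mentioned earlier), and the variable names are hypothetical. It shows only how to form two disjoint halves; whether the resulting estimates are interpretable is a separate question, which is taken up in the replies.

```r
set.seed(1)  # make the random split reproducible

# Hypothetical stand-in for the normal expression matrix: 5 genes x 39 samples.
normal <- matrix(rexp(5 * 39, rate = 0.01), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("N", 1:39)))

# Randomly assign about half of the columns to the reference set.
ref_idx <- sample(ncol(normal), size = floor(ncol(normal) / 2))
reference_half <- normal[, ref_idx, drop = FALSE]
pseudo_mixed_half <- normal[, -ref_idx, drop = FALSE]

# The two halves are disjoint and together cover all samples.
stopifnot(
  ncol(reference_half) + ncol(pseudo_mixed_half) == ncol(normal),
  length(intersect(colnames(reference_half), colnames(pseudo_mixed_half))) == 0
)
```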

Thank you so much for this wonderful software!

wwylab commented 5 years ago

Thank you for your comments. Sorry we just saw this now. We have seen your issue before. If you still need help, can you please email me at wwang7@mdanderson.org? Thanks, Wenyi

AnnaPrivi commented 5 years ago

Hi Proteinosome,

Thank you for your comment. I ran into the same issue with bulk RNA-seq data:

`Step 2: Deconvolution of Expressions`

`Error in if (sum(obj == 0) > 1) { : missing value where TRUE/FALSE needed`

`Called from: Optimum_KernelC(inputdata, groupid, nhavepi = 1, givenpi = givenpi, givenpiT = rep(0, ncol(data.Y)), niter = 1, ninteg = nbin, tol = 1e-05, nthread = nthread)`

I found the part of the code with rres[[22]] in the Optimum_KernelC function. How did you solve it? Or do you have any suggestions for solving it?

Thank you.

Anna

ShaolongCao commented 5 years ago

Hi Anna,

The error "Error in if (sum(obj == 0) > 1)" is generally caused by NA values arising during the likelihood calculation. This can happen when the "inputdata" matrix contains negative values or very small values (<1). You should be able to solve the issue by applying a cut-off to filter out genes with very small values.
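A minimal sketch of both points in R, under stated assumptions: `filter_low_expression` is a hypothetical helper written for this illustration (it is not a DeMixT function), the matrix is assumed to have genes in rows and samples in columns, and the default cut-off of 1 simply mirrors the "<1" mentioned above rather than any recommended value. The first lines show why a NaN leads to this particular error: comparing NaN with 0 gives NA, so the sum is NA and `if` cannot evaluate it.

```r
# Why the error appears: NaN == 0 is NA, so the sum is NA, and `if (NA > 1)`
# fails with "missing value where TRUE/FALSE needed".
obj <- c(0, NaN, 2)
sum_is_na <- is.na(sum(obj == 0))
print(sum_is_na)

# Hypothetical helper (not part of DeMixT): keep genes whose expression is
# finite and at least `min_value` in every sample.
filter_low_expression <- function(expr, min_value = 1) {
  keep <- apply(expr, 1, function(x) all(is.finite(x)) && all(x >= min_value))
  expr[keep, , drop = FALSE]
}

# Made-up example: gene2 has a very small value, gene4 a negative one.
expr <- matrix(c(12.0, 0.2, 35.0, -1.0,
                 15.0, 4.0, 40.0,  8.0),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
filtered <- filter_low_expression(expr)
print(rownames(filtered))
```

The same filter would need to be applied to the normal and tumor matrices together, so that both keep an identical set of genes.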

Best, Shaolong

ShaolongCao commented 5 years ago

Hi proteinosome,

Here is my comment for your second question: "Another question is that I thought of checking if the software is working fine by splitting my normal samples into half and attempt to deconvolute one half of the normals with the other half of the normals as reference. I expect that "pi" which is the reference component would be almost 1 for all the normal samples I ran the deconvolution on, since they are normals. But quite a number of the normals are giving extremely low normal component (~ 0) and some are in the middle. May I know if I should be worried about this, or is it just because the software isn't meant to be used that way."

DeMixT is not designed to work on samples with constant purity, because the algorithm requires variation in the "T" component proportion across samples to infer the distribution of "T". If all samples have zero "T" component, the model assumption is violated and the estimated "T" component proportions will be noise only.
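A toy simulation may help make this concrete. It assumes a simple two-component linear mixture for one gene, Y = pi * N + (1 - pi) * T, with "pi" as the normal (reference) proportion; this form and all variable names are assumptions for illustration, not DeMixT's actual code. When every sample is pure normal, the observed values equal N exactly, so the data say nothing about T.

```r
set.seed(42)
n_samples <- 20

# Simulated expression of one gene in the normal (N) and tumor (T) components.
normal_expr <- rlnorm(n_samples, meanlog = 5, sdlog = 0.3)
tumor_expr  <- rlnorm(n_samples, meanlog = 7, sdlog = 0.3)

# Assumed mixing model: observed Y = pi * N + (1 - pi) * T.
mix <- function(pi_normal) pi_normal * normal_expr + (1 - pi_normal) * tumor_expr

# Case 1: the tumor proportion varies across samples, so Y depends on T.
y_varying <- mix(runif(n_samples, min = 0.2, max = 0.9))

# Case 2: every sample is pure normal (pi = 1), so Y equals N exactly and
# changing T has no effect on the observed data.
y_pure <- mix(rep(1, n_samples))
stopifnot(isTRUE(all.equal(y_pure, normal_expr)))
```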