Open aedanr opened 4 years ago
This is very strange. The error is caused by locfdr (we use the local FDR method to adjust for multiple testing). Apparently the only difference is how the normalization factors are computed; using TMM causes this problem.
I stepped through the code. The normalization factors from TMM and DSS are highly correlated (>0.999). The test statistics obtained with either set of factors are also highly correlated:
cor(stat.TMM, stat.default)
[1] 0.9999965
It's a mystery to me why such a small difference causes a problem in locfdr. To be specific, using the TMM normalization factors results in the following error in locfdr:
3: In locfdr(stat.TMM, plot = 0) : CM estimation failed, middle of histogram non-normal
At this moment, I don't know how to fix it because I will have to look into locfdr. In the meantime, I suggest you use the "default", since the results are essentially the same.
Hao
Hi Hao,
Thanks for looking into this so quickly. I can use the built-in normalisation in DSS instead of supplying TMM factors for now, but I'm also trying to compare different normalisation methods, e.g. using calcNormFactors(rawdata, method="RLE"), which also gives the same error but doesn't have an equivalent built in to DSS as far as I know, so it would be good to be able to supply these normalisation factors.
Thanks, Aedan
Aedan,
DSS does take other normalization factors, like what you did in newSeqCountSet. It usually works fine based on my previous tests. For whatever reason, it generates this mysterious error from this particular dataset. As I said, if you dive into the code you'll find that using TMM or DSS normalization gives very similar test statistics, cor= 0.9999965.
One possible solution on my side is to change the way FDR is computed. Currently DSS uses local FDR, which seems to be unstable sometimes. I can switch to traditional FDR if that fails. But this will take me a little time. I'm currently too busy with other responsibilities, including a grant deadline, PhD admissions, a faculty search, etc. I can probably work on this in two weeks. In the meantime, maybe you can try another dataset?
Hao
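For reference, the fallback to a conventional FDR mentioned above could look roughly like the sketch below. It assumes the Wald statistics are approximately standard normal under the null and uses base R's p.adjust with the Benjamini-Hochberg method. This is only one reading of "traditional FDR", not what DSS actually implements; the helper name bh_fdr_from_stats and the simulated statistics are made up for illustration.

```r
# Hypothetical helper (not part of DSS): convert Wald statistics to
# two-sided p-values and adjust them with Benjamini-Hochberg.
bh_fdr_from_stats <- function(stat) {
  # Assumption: statistics are approximately N(0, 1) under the null.
  pval <- 2 * pnorm(-abs(stat))
  fdr <- p.adjust(pval, method = "BH")
  data.frame(stat = stat, pval = pval, fdr = fdr)
}

# Simulated statistics for illustration only: 950 null, 50 shifted.
set.seed(1)
stat <- c(rnorm(950), rnorm(50, mean = 4))
res <- bh_fdr_from_stats(stat)

# Show the genes with the smallest adjusted p-values.
head(res[order(res$fdr), ])
```

Unlike locfdr, this does not fit a model to the histogram of statistics, so it cannot fail with the "CM estimation failed" message, but it relies on the normality assumption stated above.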
No problem, I can work on other datasets for now and come back to this one later.
Hi Hao,
An update on this issue - it's now happening with one of my simulated datasets even when using the normalisation built in to DSS:
https://github.com/aedanr/PhD-Project-2.3/blob/master/raw.counts.DEDD50.3.rds
https://github.com/aedanr/PhD-Project-2.3/blob/master/troubleshooting_DSS_issue.R
Yeah, this comes from the local FDR computation, which is something I cannot fix. As I said, I'll change the way FDR is computed. I will let you know once I have it. I'm currently very busy with many other responsibilities.
Hi,
I'm encountering an error when running waldTest() on some simulated RNA-seq data when I supply normalisation factors. The error doesn't occur when using normalisation factors computed using DSS even though the normalisation factors are very similar.
The error occurs in solve(G0), which is called via loccov2(), and seems to be caused by the matrix G0 being singular.
The error can be reproduced using the data and code here:
https://github.com/aedanr/PhD-Project-2.3/blob/master/raw.counts.DE50.1.rds
https://github.com/aedanr/PhD-Project-2.3/blob/master/troubleshooting_DSS_issue.R
Do you have any ideas on how to avoid this?
Thanks.
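To check whether a matrix like G0 is numerically singular before inverting it, something along these lines can help while debugging. It is a generic base R sketch: safe_solve is a hypothetical helper, not a function from locfdr or DSS, and the example matrices are made up.

```r
# Hypothetical helper: return the inverse of G, or NULL with a warning
# when G is numerically singular, instead of letting solve() stop.
safe_solve <- function(G, tol = .Machine$double.eps) {
  rc <- rcond(G)  # reciprocal condition number; near 0 means near-singular
  if (!is.finite(rc) || rc < tol) {
    warning("matrix is numerically singular (rcond = ", format(rc), ")")
    return(NULL)
  }
  solve(G)
}

G_ok <- matrix(c(2, 1, 1, 3), nrow = 2)
G_bad <- matrix(c(1, 2, 2, 4), nrow = 2)  # second column = 2 * first

inv_ok <- safe_solve(G_ok)                      # a 2 x 2 inverse
inv_bad <- suppressWarnings(safe_solve(G_bad))  # NULL
```

This only detects the problem; it does not explain why G0 becomes singular for a particular set of test statistics.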