pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0
305 stars 95 forks

Report a better error when design matrix is singular #73

Open pimentel opened 8 years ago

pimentel commented 8 years ago

Error message is pretty hopeless if you don't know what's happening under the hood:

Error in solve.default(t(X) %*% X) : 
  Lapack routine dgesv: system is exactly singular: U[5,5] = 0

We need to detect these situations before we even get to the fitting procedure and give a useful message.
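A minimal sketch of what such an up-front check could look like. The helper name `check_design_matrix` and the wording of the message are hypothetical and are not part of sleuth; the sketch assumes a QR-based rank test with base R's default tolerance is an acceptable way to detect the problem.

```r
# Hypothetical helper (not part of sleuth): fail early with a readable
# message when the design matrix is rank deficient.
check_design_matrix <- function(X) {
  X <- as.matrix(X)
  decomp <- qr(X)
  if (decomp$rank < ncol(X)) {
    # Columns that QR pivoted to the end are the ones it treats as redundant.
    redundant <- decomp$pivot[seq(decomp$rank + 1, ncol(X))]
    labels <- colnames(X)
    if (is.null(labels)) labels <- as.character(seq_len(ncol(X)))
    stop(
      "The design matrix is not full rank (rank ", decomp$rank, " with ",
      ncol(X), " columns). Columns that are linear combinations of others: ",
      paste(labels[redundant], collapse = ", "),
      ". Check for confounded covariates or empty factor levels.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Illustrative data: `batch` is perfectly confounded with `condition`.
X <- model.matrix(~ condition + batch, data = data.frame(
  condition = factor(c("a", "a", "b", "b")),
  batch     = factor(c("x", "x", "y", "y"))
))
result <- tryCatch(check_design_matrix(X), error = function(e) conditionMessage(e))
print(result)
```

Running a check like this before the fit means the user sees which columns are redundant instead of a LAPACK message.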

roryk commented 8 years ago

Hi Harold,

I fixed one case where this happens when a level exists in a factor but is not in the matrix here: https://github.com/pachterlab/sleuth/pull/71
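To illustrate that case, here is a small sketch using base R only. The data frame `s2c` and its values are made up for illustration, and `droplevels()` is shown as one way to handle it, not necessarily what the pull request does.

```r
# A factor that still carries a level ("c") that no sample uses,
# e.g. after subsetting the sample table.
s2c <- data.frame(
  sample    = c("s1", "s2", "s3", "s4"),
  condition = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
)

X_bad <- model.matrix(~ condition, data = s2c)
# The column for the empty level is all zeros, so X_bad is rank deficient.
print(colSums(X_bad != 0))

# Dropping unused levels before building the matrix removes that column.
s2c$condition <- droplevels(s2c$condition)
X_ok <- model.matrix(~ condition, data = s2c)
print(qr(X_ok)$rank == ncol(X_ok))
```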

apcamargo commented 7 years ago

I'm getting this error in the latest devel:

Error in solve.default(t(X) %*% X) : 
  Lapack routine dgesv: system is exactly singular: U[4,4] = 0

Kupac commented 6 years ago

I'm using the latest devel (725197c). I am testing on a reduced set of samples, so my design matrix is probably confounded. I still get this error, even though warrenmcg's patch has been merged.

Here's the matrix (first and last column have the same grouping):

     stiffness knockdown trigger plate
[1,]         1         2       3     2
[2,]         1         1       2     2
[3,]         2         1       3     1
[4,]         2         1       3     1
[5,]         2         1       1     1
[6,]         2         1       2     1
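The confounding in the covariate table above can be checked directly. This sketch makes two assumptions not stated in the thread: all four covariates are treated as factors, and the model is additive.

```r
# Covariates as posted in the thread, treated as categorical (an assumption).
covariates <- data.frame(
  stiffness = factor(c(1, 1, 2, 2, 2, 2)),
  knockdown = factor(c(2, 1, 1, 1, 1, 1)),
  trigger   = factor(c(3, 2, 3, 3, 1, 2)),
  plate     = factor(c(2, 2, 1, 1, 1, 1))
)

# Assumed additive model; the actual formula was not given in the thread.
X <- model.matrix(~ stiffness + knockdown + trigger + plate, data = covariates)

# stiffness and plate split the samples identically, so their columns are
# linearly dependent: plate2 = 1 - stiffness2.
print(table(covariates$stiffness, covariates$plate))
print(all(X[, "plate2"] == 1 - X[, "stiffness2"]))
print(c(rank = qr(X)$rank, columns = ncol(X)))
```

Under these assumptions the matrix has six columns but rank five, so either `stiffness` or `plate` would have to be left out of the model.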

warrenmcg commented 6 years ago

Thanks for pointing this out! Your design matrix is indeed singular, but the condition I put in fails, as numeric errors result in a non-zero determinant and a finite modulus of the determinant. A singular matrix should yield a zero determinant and an undefined modulus of the determinant.

Did the error say something like this?

Error in solve.default(t(X) %*% (X)) : 
  system is computationally singular: reciprocal condition number = 1.18831e-19

If yes, we will be checking the reciprocal condition number to see if it's below the standard tolerance in the lm.fit function (~2.2e-16). If it is, then it will also throw an error.
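A rough sketch of a check along those lines, using base R's `rcond()` on `t(X) %*% X`. The function name `is_computationally_singular` is hypothetical and this is not sleuth's actual code. The threshold used here is machine epsilon (`.Machine$double.eps`, about 2.2e-16), which is also the default `tol` of `solve()`.

```r
# Hypothetical check (not sleuth's actual code): flag X'X as singular when its
# reciprocal condition number falls below machine epsilon (~2.2e-16).
is_computationally_singular <- function(X, tol = .Machine$double.eps) {
  rcond(crossprod(X)) < tol
}

# Third column equals the intercept minus the second, so X'X is singular.
X_singular <- cbind(1, c(0, 0, 1, 1), c(1, 1, 0, 0))
# Dropping the redundant column gives a well-conditioned matrix.
X_full <- cbind(1, c(0, 0, 1, 1))

print(is_computationally_singular(X_singular))
print(is_computationally_singular(X_full))
```

Unlike a determinant test, the reciprocal condition number is scale-aware, so small rounding errors in a singular matrix still leave it far below the threshold.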

Kupac commented 6 years ago

I don't seem to be able to reproduce this error, so please ignore my previous comment. I am testing two versions, so maybe I got this error with the stable version after all. Sorry for the confusion.

warrenmcg commented 6 years ago

Hi @Kupac, to clarify: are you seeing the new error message, or is the matrix not causing a problem at all?

Kupac commented 6 years ago

With the devel version, I see a completely different error message, nothing to do with matrices. I think the design matrix one came up with the stable version, where it's not yet fixed.

warrenmcg commented 6 years ago

What is the error you see @Kupac? It would be helpful to know whether we need to do more work.

Kupac commented 6 years ago

OK, I managed to find the logs of my last sleuth devel run. I got the following error:

2017-12-01T12:41:05.642840877Z reading in kallisto results
2017-12-01T12:41:05.642884419Z dropping unused factor levels
2017-12-01T12:41:07.887909194Z ......
2017-12-01T12:41:08.789660485Z Error in [.data.table(counts_test, , total = .(total = sum(est_counts)), :
2017-12-01T12:41:08.789703502Z unused argument (total = .(total = sum(est_counts)))
2017-12-01T12:41:08.789709219Z Calls: sleuth_prep -> [
2017-12-01T12:41:08.789714219Z Execution halted

Hope it helps!

vthaker commented 6 years ago

Hi there,

Here is my design matrix:

 (Intercept) ns(day, df = 4)1 ns(day, df = 4)2 ns(day, df = 4)3 ns(day, df = 4)4
1           1        0.0000000        0.0000000        0.0000000       0.00000000
2           1        0.0000000        0.0000000        0.0000000       0.00000000
3           1        0.3703704       -0.1398224        0.4379858      -0.29199052
4           1        0.3703704       -0.1398224        0.4379858      -0.29199052
5           1        0.3703704        0.5244490        0.1488752      -0.04986741
6           1        0.3703704        0.5244490        0.1488752      -0.04986741
7           1        0.0000000       -0.1428571        0.4285714       0.71428571
8           1        0.0000000       -0.1428571        0.4285714       0.71428571

And this is the error:

so <- sleuth_fit(so)
Error in solve.default(t(X) %*% X) : 
  system is computationally singular: reciprocal condition number = 1.21925e-18

Any suggestions on how to fix it? Thanks, Vidhu

warrenmcg commented 6 years ago

Hi @vthaker,

I can confirm that I can reproduce the error here. How was this design matrix created? What was the original sample_to_covariates matrix?

The singular matrix error occurs when one column is a linear combination of other columns. I don't understand what the columns of your matrix mean, and how they might be related to each other, so I can't tell you exactly why this matrix fails. However, I do know that the second and last columns (ns#1 and ns#4) are highly correlated, and removing one or the other results in a matrix that works.

For more information about why matrices fail, I invite you to read this section of the DESeq2 manual: link. "Full rank" is synonymous with a design matrix that can be used for sleuth_fit.
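As a quick way to test for full rank, the posted matrix can be typed in and checked with base R. The values below are copied from the matrix earlier in the thread (rounded as shown there), and the short column names are placeholders. The sketch shows that the matrix has only four distinct rows, presumably because `day` takes only four distinct values. That is an inference from the posted matrix, not something confirmed in the thread.

```r
# Design matrix as posted in the thread (values rounded as shown there).
X <- rbind(
  c(1, 0.0000000,  0.0000000, 0.0000000,  0.00000000),
  c(1, 0.0000000,  0.0000000, 0.0000000,  0.00000000),
  c(1, 0.3703704, -0.1398224, 0.4379858, -0.29199052),
  c(1, 0.3703704, -0.1398224, 0.4379858, -0.29199052),
  c(1, 0.3703704,  0.5244490, 0.1488752, -0.04986741),
  c(1, 0.3703704,  0.5244490, 0.1488752, -0.04986741),
  c(1, 0.0000000, -0.1428571, 0.4285714,  0.71428571),
  c(1, 0.0000000, -0.1428571, 0.4285714,  0.71428571)
)
colnames(X) <- c("(Intercept)", paste0("ns", 1:4))  # placeholder names

# A matrix cannot have higher rank than its number of distinct rows.
print(nrow(unique(X)))
print(qr(X)$rank)
print(ncol(X))
```

With four distinct rows the rank cannot exceed four, while the matrix has five columns, so it cannot be full rank. If the inference about `day` holds, a spline basis with fewer degrees of freedom would be one thing to try.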