xuranw / MuSiC

Multi-subject Single Cell Deconvolution
https://github.com/xuranw/MuSiC
GNU General Public License v3.0

r.squared.full has negative values #57

Status: Open. Opened by thongnt2 4 years ago.

thongnt2 commented 4 years ago

@xuranw Hi Xuran,

I really appreciate that you developed this awesome R package. I have a question about R-squared. Could you briefly describe how R-squared is calculated? Why is R-squared negative for some samples when I run the example from the tutorial (see below)? Shouldn't it be between 0 and 1? Would you suggest using R-squared to assess the deconvolution model (i.e., to estimate the proportion of variance explained)?

> Est.prop.GSE50244$r.squared.full
        Sub1         Sub2         Sub3         Sub4         Sub5         Sub6         Sub7         Sub8         Sub9        Sub10        Sub11        Sub12 
 0.382325634  0.778335199  0.180122054 -0.539965113  0.368613520 -1.623337789 -0.365178972 -2.300568947 -1.011227579  0.369487519  0.421768652  0.117407819 
       Sub13        Sub14        Sub15        Sub16        Sub17        Sub18        Sub19        Sub20        Sub21        Sub22        Sub23        Sub24 
-1.719121323  0.593733629  0.706020313 -1.448018054 -1.227255341 -1.376910441 -0.060431481  0.706763391 -0.275102935  0.369840806  0.730167090 -0.938863455 
       Sub25        Sub26        Sub27        Sub28        Sub29        Sub30        Sub31        Sub32        Sub33        Sub34        Sub35        Sub36 
 0.489312202  0.547965471 -0.137949563  0.685361737  0.506819718  0.735944188  0.199976462  0.494540730  0.004796409  0.654118706 -0.111014839  0.692557381 
       Sub37        Sub38        Sub39        Sub40        Sub41        Sub42        Sub43        Sub44        Sub45        Sub46        Sub47        Sub48 
 0.643831210 -0.502567839 -1.508444863  0.578305712  0.200220300  0.599730743  0.720443464  0.322581303  0.645781904  0.616435727 -0.300878139  0.073092162 
       Sub49        Sub50        Sub51        Sub52        Sub53        Sub54        Sub55        Sub56        Sub57        Sub58        Sub59        Sub60 
 0.678855900  0.548176944  0.006052819  0.257927251  0.516426416  0.083790370  0.277365923  0.489141490  0.267986228 -1.845740071  0.148962600  0.556741070 
       Sub61        Sub62        Sub63        Sub64        Sub65        Sub66        Sub67        Sub68        Sub69        Sub70        Sub71        Sub72 
 0.523308859  0.729804936  0.742723176  0.392313046  0.025595988 -2.942245341 -0.006516737 -0.250432421  0.089437682  0.630592038  0.735800007 -1.213851721 
       Sub73        Sub74        Sub75        Sub76        Sub77        Sub78        Sub79        Sub80        Sub81        Sub82        Sub83        Sub84 
 0.785149075  0.614060366 -0.361168940  0.488962634  0.325066920  0.620401089  0.724041051  0.497192984  0.555156143  0.183873944  0.402530541 -0.334978446 
       Sub85        Sub86        Sub87        Sub88        Sub89 
 0.676367261  0.636469101  0.081696095  0.689618802  0.016697994 

Thanks so much,

Tom
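To see why a variance-ratio R-squared can drop below zero, here is a minimal sketch on made-up toy data (not MuSiC output), assuming only base R. For an ordinary least-squares fit that includes an intercept, 1 - var(residuals)/var(Y) always lies in [0, 1]. A fit with no intercept (or with non-negativity constraints on the coefficients) can leave residuals whose variance exceeds var(Y), and then the value is negative.

```r
# Toy data (made up, not MuSiC output): y rises while the single
# reference column x falls, so a no-intercept fit cannot track y well.
y <- c(1, 2, 3, 4, 5)
x <- c(5, 4, 3, 2, 1)

# Least-squares fit with no intercept. The slope is 7/11 > 0, so a
# non-negative least-squares fit would return the same value here.
b <- unname(coef(lm(y ~ x - 1)))

# Variance-ratio form, the same shape as 1 - var(Y - X %*% beta) / var(Y)
r2_var <- 1 - var(y - x * b) / var(y)  # exactly -203/121, about -1.68

# With an intercept, the same formula stays within [0, 1]
# (here y = 6 - x exactly, so it is 1 up to rounding)
fit_int <- lm(y ~ x)
r2_int <- 1 - var(resid(fit_int)) / var(y)

print(c(slope = b, r2_no_intercept = r2_var, r2_with_intercept = r2_int))
```

The residuals of a no-intercept fit are orthogonal to the columns of X but not to a constant, so their variance is not bounded by var(y).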

thongnt2 commented 4 years ago

After digging into the code, I found an issue with the way R.squared is calculated. It is on line 55 of analysis.R, in the music.basic function: R.squared = 1 - var(Y - X%*%as.matrix(lm.D.weight$x))/var(Y). I fixed it as shown below and got more reasonable R.squared values.

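      # Original (line 55 of analysis.R, inside music.basic):
      # R.squared = 1 - var(Y - X%*%as.matrix(lm.D.weight$x))/var(Y)
      RSS <- sum(resid(lm.D.weight)^2)          # residual sum of squares of the weighted fit lm.D.weight
      TSS <- sum((Y.weight - mean(Y.weight))^2) # total sum of squares of Y.weight about its mean
      R.squared <- 1 - RSS/TSS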
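As a sketch of how the two forms compare, the hypothetical helper below (r2_variants is not part of MuSiC) computes both for a weighted fit with no intercept. It assumes the gene weights enter as sqrt(w) row scaling, as the name Y.weight suggests, and it uses base R lm() with weights in place of whatever solver MuSiC actually calls. On the made-up data below (one column that falls while Y rises, unit weights) both forms come out negative, about -1.68 and -2.27. Without an intercept, 1 - RSS/TSS is not guaranteed to be non-negative either; the [0, 1] bound holds for an unconstrained least-squares fit that includes an intercept.

```r
# Hypothetical helper, not part of MuSiC: computes both R-squared forms
# discussed in this issue for a weighted fit with no intercept.
#   X    : design matrix (rows = genes, columns = cell types)
#   Y    : bulk expression vector, one entry per gene
#   w    : positive per-gene weights (assumed to enter as sqrt(w) row scaling)
#   beta : fitted coefficients
r2_variants <- function(X, Y, w, beta) {
  sw <- sqrt(w)
  Y.weight <- Y * sw
  X.weight <- X * sw  # recycling multiplies row i of X by sw[i]
  resid.weight <- Y.weight - drop(X.weight %*% beta)
  list(
    # form quoted from music.basic: unweighted residual variance over var(Y)
    var_ratio = 1 - var(drop(Y - X %*% beta)) / var(Y),
    # form proposed in the comment: weighted RSS over weighted TSS
    rss_tss = 1 - sum(resid.weight^2) / sum((Y.weight - mean(Y.weight))^2)
  )
}

# Toy data (made up): one column that falls while Y rises, unit weights
X <- cbind(c(5, 4, 3, 2, 1))
Y <- c(1, 2, 3, 4, 5)
w <- rep(1, 5)

# Weighted least squares without an intercept, via base R lm()
beta <- unname(coef(lm(Y ~ X - 1, weights = w)))
res <- r2_variants(X, Y, w, beta)
print(unlist(res))  # both values are negative for this toy fit
```

With unit weights the two forms differ only in how the residuals are centered: var() subtracts the residual mean, while RSS does not.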