zjdaye / MDSeq

MDSeq: Gene expression mean and variability analysis for RNA-seq counts
Other
6 stars 3 forks source link

About checked$outliers #1

Open Vanhesling opened 7 years ago

Vanhesling commented 7 years ago

Hi! I would like your help. I have read your paper titled "Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq" which is a excellent work. So I plan to use the MDSeq to detect the gene expression variability. However, I get error messages when I try to detect the outlier. It seems that all the genes were detected as outlier.

Why is that? Could you give me any hints and how to fixed this issue?

> dat.checked <- remove.outlier(dat.normalized[1:1000, ], X=covars, U=covars,
+                               contrast = groups, mc.cores = 4)
4 threads are using!
Total time elapsed:3.9 seconds
> head(dat.checked$outliers)
                  status num.outliers
ENSG00000227232.4      2           NA
ENSG00000238009.2      2           NA
ENSG00000237683.5      2           NA
ENSG00000239906.1      2           NA
ENSG00000241860.2      2           NA
ENSG00000228463.4      2           NA
> table(dat.checked$outliers$status)

   2 
1000 
> 
> covars
          GENDER AGE RACE   BMI SMRIN SMTSISCH            X1            X2            X3            X4
Normal1        1  63    3 29.53   5.9     1017  0.1630506541 -1.095766e-01 -0.0056186635  0.0540478146
Normal2        1  62    3 30.78   8.1      133 -0.0688636685 -1.749981e-02  0.0345506232  0.0869748239
COPD1          2  66    3 25.82   7.6      840  0.1037386933 -9.394710e-02 -0.0855297718  0.0384818582
Normal3        1  58    3 29.83   6.2      825  0.1425086228  6.758439e-02  0.0474234960 -0.0076811603
COPD2          1  58    3 27.02   6.4     1218  0.0328722772  5.871772e-02  0.0569128696 -0.0170814274
COPD3          1  62    3 29.54   5.8      991  0.1076800161  7.706005e-02  0.0719706122 -0.0013471444
COPD4          2  66    3 20.12   7.6      670 -0.0160420656  9.598426e-02 -0.0436259659  0.1806682021
Normal4        2  66    3 29.05   6.9      773  0.0994010357  3.385225e-02 -0.0921304267  0.0165312554
Normal5        2  51    3 29.35   7.2       80 -0.0787204805 -6.110502e-02 -0.0235813528  0.0084930880

Finally, my used sessionInfo()

Best wishes Kevin

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gtools_3.5.0               edgeR_3.18.1               limma_3.32.2              
 [4] MDSeq_1.0.5                BiocInstaller_1.26.0       devtools_1.13.2           
 [7] org.Hs.eg.db_3.4.0         AnnotationDbi_1.38.1       BiocParallel_1.10.1       
[10] WGCNA_1.51                 fastcluster_1.1.22         dynamicTreeCut_1.63-1     
[13] R.matlab_3.6.1             RColorBrewer_1.1-2         clusterProfiler_3.4.3     
[16] DOSE_3.2.0                 gplots_3.0.1               ggplot2_2.2.1             
[19] sva_3.24.0                 genefilter_1.58.1          mgcv_1.8-17               
[22] nlme_3.1-131               DESeq2_1.16.1              SummarizedExperiment_1.6.3
[25] DelayedArray_0.2.7         matrixStats_0.52.2         Biobase_2.36.2            
[28] GenomicRanges_1.28.3       GenomeInfoDb_1.12.2        IRanges_2.10.2            
[31] S4Vectors_0.14.3           BiocGenerics_0.22.0  
zjdaye commented 7 years ago

Hi Kevin,

Thanks for your interest!

It seems that an error is encountered in the process, so the detection stopped and no outlier detection was performed. (A non-zero status indicates an error, and the num.outliers is NA or not available.) Usually, this might be encountered on only a few unusually genes when the data cannot be properly estimated; for example, if all counts were zero. Please see Supplementary materials for more detailed discussions.

I suggest that you check your data first to see if there might be anything peculiar and compare with the vignette. You can also try outlier check on individual genes with check.outlier function (setting verbose=TRUE to get more info). Finally, you can try glm.MD function to see if the counts data were properly fitted.

Please let us know how it goes! If necessary for us to look in more details, please send us an example data and your codes/outputs. (Can send to my email address in the paper.)

Good luck!

Best, John.