statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
239 stars 28 forks source link

subscript out of bounds error #4

Closed lkmklsmn closed 4 years ago

lkmklsmn commented 5 years ago

Hi,

I am testing your tradseq R package. However, I am getting an error I cant understand. Here is my code:

load("tradseSeq_example.RData") lin <- getLineages(dm_tmp, clusterLabels = rep(0, nrow(dm_tmp))) crv <- getCurves(lin) plotGeneCount(rd = dm_tmp, curve = crv, counts = counts_tmp, gene = "Lcn2") gamList <- fitGAM(counts_tmp, pseudotime = slingPseudotime(crv), cellWeights = slingCurveWeights(crv))

Error in pseudotime[ii, which(as.logical(wSamp[ii, ]))] : subscript out of bounds

My example data can be downloaded from here: https://drive.google.com/file/d/1SZyeXXMQzXXANsRw2wiaw1l6cKh-MOyN/view?usp=sharing

I would greatly appreciate any help!

HectorRDB commented 5 years ago

Hi, Thanks for the warning. The issue is that you are feeding a single lineage to fitGam. In that case, the current implementation is expecting a vector of weights, here you are passing a matrix with only one column. It is not properly coded, and I appreciate you pointing that out.

While we come up with a fix, you can run this code instead, if you are indeed expecting to get only one lineage.

gamList <- fitGAM(counts_tmp, pseudotime = slingPseudotime(crv),
                  cellWeights = rep(1, nrow(dm_tmp)))

It runs in ~5mn for me. As a side note, your data analysis would benefit from filtering some genes. For example, some of them are zero in all samples. This will speed up your process, including running fitGam, and lower the number of tests you have to run.

lkmklsmn commented 5 years ago

Thx. Makes sense. Its running now.

I indeed only have one lineage. It wasnt clear to me how to run it that is why I did the clusterLabels = rep(0, nrow(dm_tmp)) in the getLineages() function.

The example data I sent you was just a small part of the entire matrix. Therefore the 'empty' genes. The real matrix is much bigger. From my understanding the gam fits are done one gene at a time, correct? It may benefit from multi-coring this process. Not sure what your thoughts about this are...

HectorRDB commented 5 years ago

Yes, implementing parallelization is at the top of our TODO list.

BrianLohman commented 4 years ago

Hi Hector,

I'm commenting here because I'm having a very similar issue.

The current development version (1.3.01) help docs for the figGAM function ask for the weights to be a 1 column matrix:

cellWeights A matrix of cell weights defining the probability that a cell belongs to a particular lineage. Each row represents a cell and each column represents a lineage. If only a single lineage, provide a matrix with one column containing all values of 1.

However, this produces the error:

w = matrix(nrow = 594, ncol = 1)
w[,1] = 1
> class(w)
[1] "matrix"
donor1_fibroblast_sce <- fitGAM(donor1_fibroblast_cds, verbose = TRUE, weights = w)
Error in weights[teller, ] : subscript out of bounds

If I change to a vector of 1s, as you suggest here, I get:

w = rep(1,594)
> class(w)
[1] "numeric"
donor1_fibroblast_sce <- fitGAM(donor1_fibroblast_cds, verbose = TRUE, weights = w)
Error in weights[teller, ] : incorrect number of dimensions

There are 594 cells in the lineage that I am interested in:

> length(donor1_fibroblast_cds@phenoData@data$Pseudotime)
[1] 594

Any help would be appreciated.

Cheers,

Brian

HectorRDB commented 4 years ago

Hi @BrianLohman There are two types of weights. cellWeights is a matrix for cells by lineages, it matches how cells are assigned to the lineages. weights are ZI weights (output from zinbwave or scVI for example) and should be a matrix of the same dimension as the count matrix. Based on what you describe, I think you are trying to pass the input of cellWeights using the weights argument.

BrianLohman commented 4 years ago

Hi @HectorRDB,

Ah, yes, you are correct. I was trying to set the cellWeights param.

One complication though: when fitGAM is passed a CellDataSet, cellWeights is not an available argument. Correct?

I am working a subset of cells that should all be one cell type (fibroblasts) so it made sense to me to try to fix the number of lineages at 1. Without fixing the number of lineages the resulting plotSmoothers shows 3 lineages, all on top of one another, like the top row of your cheat sheet table. It is very possible that this result is to be expected, but I wanted to explore fixing the number of lineage because of the restricted cell type input.

Thanks again for your continued assistance,

Brian

HectorRDB commented 4 years ago

Hi @BrianLohman Just to make sure I understand: you fit the gam models using all three lineages but you only want to plot one of those lineages?

BrianLohman commented 4 years ago

Hi @HectorRDB,

I did not fix the number of lineages when I ran fitGAM:

donor1_fibroblast_sce <- fitGAM(donor1_fibroblast_cds, verbose = TRUE)

So I assume that the function determined the number of lineages. However, given that I used only a handful of cells, all of the same known cell type, I did not expect more than one lineage to result. I was intrigued by this result (3 lineages when I expected 1, perhaps naively) so I wanted to explore different values of the cellWeights parameter.

Cheers,

Brian

HectorRDB commented 4 years ago

Hi @BrianLohman, Oh I see. fitGAM does not choose the number of lineages, it pull that information from the CellDataset object. So I would recommend either a) subsetting before running the lineage assignment function or b) manually choosing the CellWeights and pseudotime option. You might find the new extract_monocle_info function useful. Best

BrianLohman commented 4 years ago

Hi @HectorRDB,

I am getting an error trying to install the latest devel version from GitHub. It looks like it's related to the new extract_monocle_info function:

> devtools::install_github("statOmics/tradeSeq")
Downloading GitHub repo statOmics/tradeSeq@master
✓  checking for file ‘/tmp/Rtmp0q46Yo/remotesa06e7beb55a2/statOmics-tradeSeq-c7a6450/DESCRIPTION’ (635ms)
─  preparing ‘tradeSeq’:
✓  checking DESCRIPTION meta-information ...
   Warning: /tmp/RtmpPtsJPB/Rbuild82d81340275c/tradeSeq/man/extract_monocle_info.Rd:17: unknown macro '\lin'
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘tradeSeq_1.3.03.tar.gz’
   Warning: invalid uid value replaced by that for user 'nobody'
   Warning: invalid gid value replaced by that for user 'nobody'

Installing package into ‘/home/u0806040/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
* installing *source* package ‘tradeSeq’ ...
** using staged installation
** R
** data
** byte-compile and prepare package for lazy loading
** help
Error : (converted from warning) /tmp/Rtmpfse8ao/R.INSTALL82ea29d3ac56/tradeSeq/man/extract_monocle_info.Rd:17: unknown macro '\lin'
ERROR: installing Rd objects failed for package ‘tradeSeq’
* removing ‘/home/u0806040/R/x86_64-pc-linux-gnu-library/4.0/tradeSeq’
* restoring previous ‘/home/u0806040/R/x86_64-pc-linux-gnu-library/4.0/tradeSeq’
Error: Failed to install 'tradeSeq' from GitHub:
  (converted from warning) installation of package ‘/tmp/Rtmp0q46Yo/filea06e6e0a27c6/tradeSeq_1.3.03.tar.gz’ had non-zero exit status
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/R/4.0.0/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/4.0.0/lib64/R/lib/libRlapack.so

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] devtools_2.3.0      usethis_1.6.1       tradeSeq_1.3.03     monocle_2.16.0      DDRTree_0.1.5       irlba_2.3.3        
 [7] VGAM_1.1-3          ggplot2_3.3.0       Biobase_2.48.0      BiocGenerics_0.34.0 Matrix_1.2-18       patchwork_1.0.0    
[13] Seurat_3.1.5        hciRdata_1.100      readr_1.3.1         hciR_1.3            dplyr_0.8.5        

loaded via a namespace (and not attached):
  [1] reticulate_1.15             tidyselect_1.1.0            RSQLite_2.2.0               AnnotationDbi_1.50.0       
  [5] htmlwidgets_1.5.1           grid_4.0.0                  combinat_0.0-8              docopt_0.6.1               
  [9] BiocParallel_1.22.0         Rtsne_0.15                  RNeXML_2.4.4                munsell_0.5.0              
 [13] codetools_0.2-16            ica_1.0-2                   future_1.17.0               withr_2.2.0                
 [17] colorspace_1.4-1            fastICA_1.2-2               knitr_1.28                  uuid_0.1-4                 
 [21] zinbwave_1.10.0             rstudioapi_0.11             SingleCellExperiment_1.10.1 ROCR_1.0-11                
 [25] listenv_0.8.0               NMF_0.22.0                  labeling_0.3                slam_0.1-47                
 [29] GenomeInfoDbData_1.2.3      farver_2.0.3                bit64_0.9-7                 pheatmap_1.0.12            
 [33] rhdf5_2.32.0                rprojroot_1.3-2             vctrs_0.3.0                 xfun_0.13                  
 [37] R6_2.4.1                    doParallel_1.0.15           GenomeInfoDb_1.24.0         rsvd_1.0.3                 
 [41] locfit_1.5-9.4              bitops_1.0-6                DelayedArray_0.14.0         assertthat_0.2.1           
 [45] scales_1.1.1                gtable_0.3.0                phylobase_0.8.10            npsurv_0.4-0.1             
 [49] globals_0.12.5              processx_3.4.2              rlang_0.4.6                 genefilter_1.70.0          
 [53] lazyeval_0.2.2              princurve_2.1.4             yaml_2.2.1                  reshape2_1.4.4             
 [57] backports_1.1.7             tools_4.0.0                 gridBase_0.4-7              ellipsis_0.3.1             
 [61] RColorBrewer_1.1-2          sessioninfo_1.1.1           ggridges_0.5.2              Rcpp_1.0.4.6               
 [65] plyr_1.8.6                  base64enc_0.1-3             progress_1.2.2              zlibbioc_1.34.0            
 [69] purrr_0.3.4                 RCurl_1.98-1.2              densityClust_0.3            ps_1.3.3                   
 [73] prettyunits_1.1.1           pbapply_1.4-2               viridis_0.5.1               cowplot_1.0.0              
 [77] S4Vectors_0.26.0            zoo_1.8-8                   SummarizedExperiment_1.18.1 ggrepel_0.8.2              
 [81] cluster_2.1.0               fs_1.4.1                    magrittr_1.5                data.table_1.12.8          
 [85] RSpectra_0.16-0             lmtest_0.9-37               RANN_2.6.1                  fitdistrplus_1.0-14        
 [89] matrixStats_0.56.0          pkgload_1.0.2               evaluate_0.14               hms_0.5.3                  
 [93] lsei_1.2-0.1                xtable_1.8-4                XML_3.99-0.3                sparsesvd_0.2              
 [97] IRanges_2.22.1              gridExtra_2.3               testthat_2.3.2              HSMMSingleCell_1.8.0       
[101] compiler_4.0.0              tibble_3.0.1                KernSmooth_2.23-17          crayon_1.3.4               
[105] htmltools_0.4.0             mgcv_1.8-31                 tidyr_1.0.3                 geneplotter_1.66.0         
[109] howmany_0.3-1               DBI_1.1.0                   slingshot_1.6.0             MASS_7.3-51.6              
[113] rappdirs_0.3.1              ade4_1.7-15                 cli_2.0.2                   igraph_1.2.5               
[117] GenomicRanges_1.40.0        pkgconfig_2.0.3             rncl_0.8.4                  registry_0.5-1             
[121] locfdr_1.1-8                plotly_4.9.2.1              xml2_1.3.2                  foreach_1.5.0              
[125] annotate_1.66.0             rngtools_1.5                pkgmaker_0.31.1             XVector_0.28.0             
[129] bibtex_0.4.2.2              callr_3.4.3                 stringr_1.4.0               digest_0.6.25              
[133] sctransform_0.2.1           RcppAnnoy_0.0.16            tsne_0.1-3                  softImpute_1.4             
[137] rmarkdown_2.1               leiden_0.3.3                uwot_0.1.8                  edgeR_3.30.0               
[141] curl_4.3                    kernlab_0.9-29              lifecycle_0.2.0             nlme_3.1-147               
[145] jsonlite_1.6.1              Rhdf5lib_1.10.0             clusterExperiment_2.8.0     desc_1.2.0                 
[149] viridisLite_0.3.0           limma_3.44.1                fansi_0.4.1                 pillar_1.4.4               
[153] lattice_0.20-41             pkgbuild_1.0.8              httr_1.4.1                  survival_3.1-12            
[157] remotes_2.1.1               glue_1.4.1                  qlcMatrix_0.9.7             FNN_1.1.3                  
[161] png_0.1-7                   iterators_1.0.12            bit_1.1-15.2                stringi_1.4.6              
[165] HDF5Array_1.16.0            blob_1.2.1                  DESeq2_1.28.1               memoise_1.1.0              
[169] future.apply_1.5.0          ape_5.3
BrianLohman commented 4 years ago

Thanks Hector, install looks good to go now.