Closed lkmklsmn closed 4 years ago
Hi, Thanks for the warning. The issue is that you are feeding a single lineage to fitGam. In that case, the current implementation is expecting a vector of weights, here you are passing a matrix with only one column. It is not properly coded, and I appreciate you pointing that out.
While we come up with a fix, you can run this code instead, if you are indeed expecting to get only one lineage.
gamList <- fitGAM(counts_tmp, pseudotime = slingPseudotime(crv),
cellWeights = rep(1, nrow(dm_tmp)))
It runs in ~5mn for me. As a side note, your data analysis would benefit from filtering some genes. For example, some of them are zero in all samples. This will speed up your process, including running fitGam, and lower the number of tests you have to run.
Thx. Makes sense. Its running now.
I indeed only have one lineage. It wasnt clear to me how to run it that is why I did the clusterLabels = rep(0, nrow(dm_tmp))
in the getLineages() function.
The example data I sent you was just a small part of the entire matrix. Therefore the 'empty' genes. The real matrix is much bigger. From my understanding the gam fits are done one gene at a time, correct? It may benefit from multi-coring this process. Not sure what your thoughts about this are...
Yes, implementing parallelization is at the top of our TODO list.
Hi Hector,
I'm commenting here because I'm having a very similar issue.
The current development version (1.3.01) help docs for the figGAM function ask for the weights to be a 1 column matrix:
cellWeights A matrix of cell weights defining the probability that a cell belongs to a particular lineage. Each row represents a cell and each column represents a lineage. If only a single lineage, provide a matrix with one column containing all values of 1.
However, this produces the error:
w = matrix(nrow = 594, ncol = 1)
w[,1] = 1
> class(w)
[1] "matrix"
donor1_fibroblast_sce <- fitGAM(donor1_fibroblast_cds, verbose = TRUE, weights = w)
Error in weights[teller, ] : subscript out of bounds
If I change to a vector of 1s, as you suggest here, I get:
w = rep(1,594)
> class(w)
[1] "numeric"
donor1_fibroblast_sce <- fitGAM(donor1_fibroblast_cds, verbose = TRUE, weights = w)
Error in weights[teller, ] : incorrect number of dimensions
There are 594 cells in the lineage that I am interested in:
> length(donor1_fibroblast_cds@phenoData@data$Pseudotime)
[1] 594
Any help would be appreciated.
Cheers,
Brian
Hi @BrianLohman
There are two types of weights. cellWeights
is a matrix for cells by lineages, it matches how cells are assigned to the lineages. weights
are ZI weights (output from zinbwave
or scVI
for example) and should be a matrix of the same dimension as the count matrix.
Based on what you describe, I think you are trying to pass the input of cellWeights
using the weights
argument.
Hi @HectorRDB,
Ah, yes, you are correct. I was trying to set the cellWeights
param.
One complication though: when fitGAM
is passed a CellDataSet
, cellWeights
is not an available argument. Correct?
I am working a subset of cells that should all be one cell type (fibroblasts) so it made sense to me to try to fix the number of lineages at 1. Without fixing the number of lineages the resulting plotSmoothers
shows 3 lineages, all on top of one another, like the top row of your cheat sheet table. It is very possible that this result is to be expected, but I wanted to explore fixing the number of lineage because of the restricted cell type input.
Thanks again for your continued assistance,
Brian
Hi @BrianLohman Just to make sure I understand: you fit the gam models using all three lineages but you only want to plot one of those lineages?
Hi @HectorRDB,
I did not fix the number of lineages when I ran fitGAM:
donor1_fibroblast_sce <- fitGAM(donor1_fibroblast_cds, verbose = TRUE)
So I assume that the function determined the number of lineages. However, given that I used only a handful of cells, all of the same known cell type, I did not expect more than one lineage to result. I was intrigued by this result (3 lineages when I expected 1, perhaps naively) so I wanted to explore different values of the cellWeights
parameter.
Cheers,
Brian
Hi @BrianLohman,
Oh I see. fitGAM
does not choose the number of lineages, it pull that information from the CellDataset
object. So I would recommend either a) subsetting before running the lineage assignment function or b) manually choosing the CellWeights
and pseudotime
option. You might find the new extract_monocle_info
function useful.
Best
Hi @HectorRDB,
I am getting an error trying to install the latest devel version from GitHub. It looks like it's related to the new extract_monocle_info
function:
> devtools::install_github("statOmics/tradeSeq")
Downloading GitHub repo statOmics/tradeSeq@master
✓ checking for file ‘/tmp/Rtmp0q46Yo/remotesa06e7beb55a2/statOmics-tradeSeq-c7a6450/DESCRIPTION’ (635ms)
─ preparing ‘tradeSeq’:
✓ checking DESCRIPTION meta-information ...
Warning: /tmp/RtmpPtsJPB/Rbuild82d81340275c/tradeSeq/man/extract_monocle_info.Rd:17: unknown macro '\lin'
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘tradeSeq_1.3.03.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'
Warning: invalid gid value replaced by that for user 'nobody'
Installing package into ‘/home/u0806040/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
* installing *source* package ‘tradeSeq’ ...
** using staged installation
** R
** data
** byte-compile and prepare package for lazy loading
** help
Error : (converted from warning) /tmp/Rtmpfse8ao/R.INSTALL82ea29d3ac56/tradeSeq/man/extract_monocle_info.Rd:17: unknown macro '\lin'
ERROR: installing Rd objects failed for package ‘tradeSeq’
* removing ‘/home/u0806040/R/x86_64-pc-linux-gnu-library/4.0/tradeSeq’
* restoring previous ‘/home/u0806040/R/x86_64-pc-linux-gnu-library/4.0/tradeSeq’
Error: Failed to install 'tradeSeq' from GitHub:
(converted from warning) installation of package ‘/tmp/Rtmp0q46Yo/filea06e6e0a27c6/tradeSeq_1.3.03.tar.gz’ had non-zero exit status
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/local/R/4.0.0/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/4.0.0/lib64/R/lib/libRlapack.so
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_2.3.0 usethis_1.6.1 tradeSeq_1.3.03 monocle_2.16.0 DDRTree_0.1.5 irlba_2.3.3
[7] VGAM_1.1-3 ggplot2_3.3.0 Biobase_2.48.0 BiocGenerics_0.34.0 Matrix_1.2-18 patchwork_1.0.0
[13] Seurat_3.1.5 hciRdata_1.100 readr_1.3.1 hciR_1.3 dplyr_0.8.5
loaded via a namespace (and not attached):
[1] reticulate_1.15 tidyselect_1.1.0 RSQLite_2.2.0 AnnotationDbi_1.50.0
[5] htmlwidgets_1.5.1 grid_4.0.0 combinat_0.0-8 docopt_0.6.1
[9] BiocParallel_1.22.0 Rtsne_0.15 RNeXML_2.4.4 munsell_0.5.0
[13] codetools_0.2-16 ica_1.0-2 future_1.17.0 withr_2.2.0
[17] colorspace_1.4-1 fastICA_1.2-2 knitr_1.28 uuid_0.1-4
[21] zinbwave_1.10.0 rstudioapi_0.11 SingleCellExperiment_1.10.1 ROCR_1.0-11
[25] listenv_0.8.0 NMF_0.22.0 labeling_0.3 slam_0.1-47
[29] GenomeInfoDbData_1.2.3 farver_2.0.3 bit64_0.9-7 pheatmap_1.0.12
[33] rhdf5_2.32.0 rprojroot_1.3-2 vctrs_0.3.0 xfun_0.13
[37] R6_2.4.1 doParallel_1.0.15 GenomeInfoDb_1.24.0 rsvd_1.0.3
[41] locfit_1.5-9.4 bitops_1.0-6 DelayedArray_0.14.0 assertthat_0.2.1
[45] scales_1.1.1 gtable_0.3.0 phylobase_0.8.10 npsurv_0.4-0.1
[49] globals_0.12.5 processx_3.4.2 rlang_0.4.6 genefilter_1.70.0
[53] lazyeval_0.2.2 princurve_2.1.4 yaml_2.2.1 reshape2_1.4.4
[57] backports_1.1.7 tools_4.0.0 gridBase_0.4-7 ellipsis_0.3.1
[61] RColorBrewer_1.1-2 sessioninfo_1.1.1 ggridges_0.5.2 Rcpp_1.0.4.6
[65] plyr_1.8.6 base64enc_0.1-3 progress_1.2.2 zlibbioc_1.34.0
[69] purrr_0.3.4 RCurl_1.98-1.2 densityClust_0.3 ps_1.3.3
[73] prettyunits_1.1.1 pbapply_1.4-2 viridis_0.5.1 cowplot_1.0.0
[77] S4Vectors_0.26.0 zoo_1.8-8 SummarizedExperiment_1.18.1 ggrepel_0.8.2
[81] cluster_2.1.0 fs_1.4.1 magrittr_1.5 data.table_1.12.8
[85] RSpectra_0.16-0 lmtest_0.9-37 RANN_2.6.1 fitdistrplus_1.0-14
[89] matrixStats_0.56.0 pkgload_1.0.2 evaluate_0.14 hms_0.5.3
[93] lsei_1.2-0.1 xtable_1.8-4 XML_3.99-0.3 sparsesvd_0.2
[97] IRanges_2.22.1 gridExtra_2.3 testthat_2.3.2 HSMMSingleCell_1.8.0
[101] compiler_4.0.0 tibble_3.0.1 KernSmooth_2.23-17 crayon_1.3.4
[105] htmltools_0.4.0 mgcv_1.8-31 tidyr_1.0.3 geneplotter_1.66.0
[109] howmany_0.3-1 DBI_1.1.0 slingshot_1.6.0 MASS_7.3-51.6
[113] rappdirs_0.3.1 ade4_1.7-15 cli_2.0.2 igraph_1.2.5
[117] GenomicRanges_1.40.0 pkgconfig_2.0.3 rncl_0.8.4 registry_0.5-1
[121] locfdr_1.1-8 plotly_4.9.2.1 xml2_1.3.2 foreach_1.5.0
[125] annotate_1.66.0 rngtools_1.5 pkgmaker_0.31.1 XVector_0.28.0
[129] bibtex_0.4.2.2 callr_3.4.3 stringr_1.4.0 digest_0.6.25
[133] sctransform_0.2.1 RcppAnnoy_0.0.16 tsne_0.1-3 softImpute_1.4
[137] rmarkdown_2.1 leiden_0.3.3 uwot_0.1.8 edgeR_3.30.0
[141] curl_4.3 kernlab_0.9-29 lifecycle_0.2.0 nlme_3.1-147
[145] jsonlite_1.6.1 Rhdf5lib_1.10.0 clusterExperiment_2.8.0 desc_1.2.0
[149] viridisLite_0.3.0 limma_3.44.1 fansi_0.4.1 pillar_1.4.4
[153] lattice_0.20-41 pkgbuild_1.0.8 httr_1.4.1 survival_3.1-12
[157] remotes_2.1.1 glue_1.4.1 qlcMatrix_0.9.7 FNN_1.1.3
[161] png_0.1-7 iterators_1.0.12 bit_1.1-15.2 stringi_1.4.6
[165] HDF5Array_1.16.0 blob_1.2.1 DESeq2_1.28.1 memoise_1.1.0
[169] future.apply_1.5.0 ape_5.3
Thanks Hector, install looks good to go now.
Hi,
I am testing your tradseq R package. However, I am getting an error I cant understand. Here is my code:
load("tradseSeq_example.RData")
lin <- getLineages(dm_tmp, clusterLabels = rep(0, nrow(dm_tmp)))
crv <- getCurves(lin)
plotGeneCount(rd = dm_tmp, curve = crv, counts = counts_tmp, gene = "Lcn2")
gamList <- fitGAM(counts_tmp, pseudotime = slingPseudotime(crv), cellWeights = slingCurveWeights(crv))
My example data can be downloaded from here: https://drive.google.com/file/d/1SZyeXXMQzXXANsRw2wiaw1l6cKh-MOyN/view?usp=sharing
I would greatly appreciate any help!