hbc / bcbioRNASeq

R package for bcbio RNA-seq analysis.
https://bioinformatics.sph.harvard.edu/bcbioRNASeq
GNU Affero General Public License v3.0
58 stars 21 forks source link

Functional Analysis Template #192

Closed Adrian-Zet closed 1 year ago

Adrian-Zet commented 1 year ago

Dear bcbioRNASeq team,

I come back with an error gathered from trying to run the other template of the bcBioRNASeq package. The functional analysis one. This is the error:

Assert failure.
[3] isSubset(c("geneId", "entrezId"), colnames(rowData(dds))) is not TRUE.
Cause: `c("geneId", "entrezId")` has elements not in `colnames(rowData(dds))`:
entrezId

This is how I defined the object:

object <- bcbioRNASeq(
    uploadDir = file.path("final"),
    interestingGroups = c("group"),
    organism = "Homo sapiens",
    genomeBuild = "GRCh38"
)

The Differential Expression Template ran flawlessly and generated the output deseq.rds correctly.

I also tried creating the object using :

object <- bcbioRNASeq(
    uploadDir = file.path("final"),
    interestingGroups = c("group"),
    organism = "Homo sapiens",
    genomeBuild = "GRCh38",
    ensemblRelease = 100
)

But this approach gave me the following error:

→ Making <GenomicRanges> from <EnsDb>.
Organism: Homo sapiens
Genome build: GRCh38
Release: 100
Level: genes
→ Defining names by `geneId` column in `mcols()`.
── Metadata
! 239 unknown features detected: "ENSG00000116883", "ENSG00000130489", "ENSG00000154537", "ENSG00000163009", "ENSG00000170165".... Check that `ensemblRelease` or `gffFile` are correct. Define transgenes using `transgeneNames`.
Error: subscript is a NSBS object that is incompatible with the current
  subsetting operation

I will also add project YAML (changed to .txt for uploading here) for helping out with the debugging. I actually don't know how to find out what version of ensmblRelease was used by bcbio.

project-summary (2).txt

mjsteinbaugh commented 1 year ago

@Adrian-Zet You're seeing that error with ensemblRelease = 100 because of a genome build mismatch. Are you using the built-in GRCh38 / hg38 reference? I believe that is built against Ensembl 94 currently.

Here's where the transcriptome is defined: https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/transcripts.yaml

Adrian-Zet commented 1 year ago

Hello, I managed to solve it yesterday by running the genes missing from the given ensemblrelease using http://www.ensembl.org/Homo_sapiens/Tools/IDMapper/Results?tl=TqzBcwAxZ9mZ1BWa-8812266

It seems the correct database was version 95.

Should I update the human genome in bcbio, currently I'm using the built-in hg38/GRCh38. Am I supposed to use a different one?

Also, in the folder results/date-time/kegg-plots I have no figures. I attached the resulting HTML file for easily taking a look at the code and my current results.

functional.zip

Thank you, Adrian.

mjsteinbaugh commented 1 year ago

Hi @Adrian-Zet , no need to update the bcbio hg38 genome -- it's just important that you match the annotation versions in downstream analysis. I'll see if I can make this a little more seamless in bcbioRNASeq in a future update.

I'll take a look at the functional analysis output and get back to you. The functional analysis template runs pathview and you could be running into an issue with that package.

jxshi commented 1 year ago

Hi @mjsteinbaugh ,

I have encountered the same issue. The ensembl release should be 94, in the ~/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq folder, cat version.txt returned the following information:

Created from: /n/data1/cores/bcbio/rory_kirchner/genome_setups/hg38-build94/rnaseq/hg38.gtf
library(bcbioRNASeq);
bcb <- bcbioRNASeq(uploadDir="/data1/m6a/pmid33961823/gse167075/results/rnaseq/gse167075_rnaseq/final",
                   interestingGroups=c("phenotype"),level="gene",
                   organism="Homo sapiens",
                   genomeBuild = "GRCh38",
                   ensemblRelease=94L);
flat <- AcidBase::coerceToList(bcb);
saveData(bcb, flat, dir="data")

But when doing functional annotation for the differentially expressed genes, the following error message popped up. I have tried to change ensemblRelease to 90, 92, 94, 95, 96, 97, 98, 99, 100, 101, and neither one worked. Can you check for me, please? Thank you in advance!

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> rmarkdown::render('METTL3_KD_vs_Control_FuncAnno_BP.Rmd')

processing file: METTL3_KD_vs_Control_FuncAnno_BP.Rmd
  |....                                         |   9% (header)

processing file: ./_header.Rmd

  |........                                     |  19% (deseqdataset)          Quitting from lines 119-126 (METTL3_KD_vs_Control_FuncAnno_BP.Rmd)
Error in `stop()`:
! Assert failure.
[3] isSubset(c("geneId", "entrezId"), colnames(rowData(dds))) is not TRUE.
Cause: `c("geneId", "entrezId")` has elements not in `colnames(rowData(dds))`: entrezId
Backtrace:
    x
 1. \-goalie::assert(...)
 2.   \-AcidCLI (local) stop(...)
 3.     +-base::do.call(what = rlang::abort, args = args)
 4.     \-rlang (local) `<fn>`(message = "Assert failure.\n[3] isSubset(c(\"geneId\", \"entrezId\"), colnames(rowData(dds))) is not TRUE.\nCause: `c(\"geneId\", \"entrezId\")` has elements not in `colnames(rowData(dds))`: entrezId")

Execution halted
jxshi commented 1 year ago

I also tried mouse data, and it popped up the same error message.

library(bcbioRNASeq);
bcb <- bcbioRNASeq(uploadDir="/data1/rnaseq/results/MD5VDAD/final",
    interestingGroups=c("phenotype"),
    level="gene",
    organism="Mus musculus",
    genomeBuild="GRCm38",
    ensemblRelease=94);
flat <- AcidBase::coerceToList(bcb);
saveData(bcb, flat, dir="data")

The error message reads:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> rmarkdown::render('C_vs_M_FuncAnno_BP.Rmd')

processing file: C_vs_M_FuncAnno_BP.Rmd
  |....                                         |   9% (header)

processing file: ./_header.Rmd

  |........                                     |  19% (deseqdataset)          Quitting from lines 119-126 (C_vs_M_FuncAnno_BP.Rmd)
Error in `stop()`:
! Assert failure.
[3] isSubset(c("geneId", "entrezId"), colnames(rowData(dds))) is not TRUE.
Cause: `c("geneId", "entrezId")` has elements not in `colnames(rowData(dds))`: entrezId
Backtrace:
    x
 1. \-goalie::assert(...)
 2.   \-AcidCLI (local) stop(...)
 3.     +-base::do.call(what = rlang::abort, args = args)
 4.     \-rlang (local) `<fn>`(message = "Assert failure.\n[3] isSubset(c(\"geneId\", \"entrezId\"), colnames(rowData(dds))) is not TRUE.\nCause: `c(\"geneId\", \"entrezId\")` has elements not in `colnames(rowData(dds))`: entrezId")

And here is the output of sessionInfo():

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.10

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AcidGSEA_0.8.8              DESeqAnalysis_0.6.8         basejump_0.16.5             goalie_0.6.9               
 [5] pathview_1.38.0             enrichplot_1.18.4           clusterProfiler_4.7.1.003   DOSE_3.24.2                
 [9] DESeq2_1.38.3               withr_2.5.0                 stringi_1.7.12              ggnewscale_0.4.8           
[13] R.utils_2.12.2              R.oo_1.25.0                 R.methodsS3_1.8.2           ensembldb_2.22.0           
[17] AnnotationFilter_1.22.0     GenomicFeatures_1.50.4      AnnotationDbi_1.60.2        bcbioRNASeq_0.5.3          
[21] SummarizedExperiment_1.28.0 Biobase_2.58.0              GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
[25] IRanges_2.32.0              S4Vectors_0.36.2            BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
[29] matrixStats_0.63.0         

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                    tidyselect_1.2.0              RSQLite_2.3.1                 htmlwidgets_1.6.2            
  [5] grid_4.2.3                    BiocParallel_1.32.6           scatterpie_0.1.9              AcidPlots_0.5.5              
  [9] munsell_0.5.0                 codetools_0.2-19              DT_0.27                       AcidGenerics_0.6.7           
 [13] colorspace_2.1-0              GOSemSim_2.24.0               pipette_0.10.9                filelock_1.0.2               
 [17] knitr_1.42                    rstudioapi_0.14               SingleCellExperiment_1.20.1   KEGGgraph_1.58.3             
 [21] tximport_1.26.1               GenomeInfoDbData_1.2.9        polyclip_1.10-4               bit64_4.0.5                  
 [25] farver_2.1.1                  downloader_0.4                treeio_1.22.0                 vctrs_0.6.2                  
 [29] generics_0.1.3                gson_0.1.0                    xfun_0.39                     BiocFileCache_2.6.1          
 [33] R6_2.5.1                      graphlayouts_0.8.4            locfit_1.5-9.7                bitops_1.0-7                 
 [37] cachem_1.0.7                  fgsea_1.24.0                  gridGraphics_0.5-1            DelayedArray_0.24.0          
 [41] promises_1.2.0.1              BiocIO_1.8.0                  scales_1.2.1                  vroom_1.6.3                  
 [45] ggraph_2.1.0                  gtable_0.3.3                  tidygraph_1.2.3               AcidExperiment_0.4.7         
 [49] rlang_1.1.1                   AcidMarkdown_0.2.5            splines_4.2.3                 rtracklayer_1.58.0           
 [53] lazyeval_0.2.2                BiocManager_1.30.20           yaml_2.3.7                    reshape2_1.4.4               
 [57] httpuv_1.6.9                  qvalue_2.30.0                 syntactic_0.6.6               tools_4.2.3                  
 [61] ggplotify_0.1.0               ggplot2_3.4.2                 ellipsis_0.3.2                RColorBrewer_1.1-3           
 [65] MultiAssayExperiment_1.24.0   Rcpp_1.0.10                   plyr_1.8.8                    progress_1.2.2               
 [69] zlibbioc_1.44.0               purrr_1.0.1                   RCurl_1.98-1.12               prettyunits_1.1.1            
 [73] viridis_0.6.2                 cowplot_1.1.1                 ggrepel_0.9.3                 magrittr_2.0.3               
 [77] data.table_1.14.8             ProtGenerics_1.30.0           AcidCLI_0.2.7                 hms_1.1.3                    
 [81] patchwork_1.1.2               mime_0.12                     AcidBase_0.6.15               evaluate_0.20                
 [85] xtable_1.8-4                  HDO.db_0.99.1                 XML_3.99-0.14                 gridExtra_2.3                
 [89] compiler_4.2.3                biomaRt_2.54.1                tibble_3.2.1                  shadowtext_0.1.2             
 [93] crayon_1.5.2                  htmltools_0.5.5               ggfun_0.0.9                   later_1.3.0                  
 [97] tzdb_0.3.0                    tidyr_1.3.0                   geneplotter_1.76.0            aplot_0.1.10                 
[101] DBI_1.1.3                     tweenr_2.0.2                  dbplyr_2.3.2                  MASS_7.3-59                  
[105] rappdirs_0.3.3                Matrix_1.5-4                  readr_2.1.4                   cli_3.6.1                    
[109] parallel_4.2.3                AcidSingleCell_0.3.5          igraph_1.4.2                  pkgconfig_2.0.3              
[113] GenomicAlignments_1.34.1      xml2_1.3.4                    ggtree_3.6.2                  annotate_1.76.0              
[117] AcidGenomes_0.5.0             XVector_0.38.0                yulab.utils_0.0.6             stringr_1.5.0                
[121] digest_0.6.31                 graph_1.76.0                  Biostrings_2.66.0             rmarkdown_2.21               
[125] AcidPlyr_0.3.10               fastmatch_1.1-3               tidytree_0.4.2                edgeR_3.40.2                 
[129] restfulr_0.0.15               curl_5.0.0                    shiny_1.7.4                   Rsamtools_2.14.0             
[133] rjson_0.2.21                  lifecycle_1.0.3               nlme_3.1-162                  jsonlite_1.8.4               
[137] viridisLite_0.4.1             limma_3.54.2                  fansi_1.0.4                   pillar_1.9.0                 
[141] lattice_0.21-8                KEGGREST_1.38.0               fastmap_1.1.1                 httr_1.4.5                   
[145] GO.db_3.16.0                  interactiveDisplayBase_1.36.0 glue_1.6.2                    png_0.1-8                    
[149] Rgraphviz_2.42.0              BiocVersion_3.16.0            bit_4.0.5                     ggforce_0.4.1                
[153] bcbioBase_0.8.1               blob_1.2.4                    org.Hs.eg.db_3.16.0           AnnotationHub_3.6.0          
[157] memoise_2.0.1                 dplyr_1.1.2                   ape_5.7-1      
mjsteinbaugh commented 1 year ago

Hi I'm working on an update to bcbioRNASeq that will fix this issue. Sorry for the breaking change!

mjsteinbaugh commented 1 year ago

@Adrian-Zet @jxshi Can you let me know if this draft updated functional analysis template works for you? I'm working on pushing a bcbioRNASeq package update soon.

https://github.com/acidgenomics/r-bcbiornaseq/blob/develop/inst/rmarkdown/templates/03-functional-analysis/skeleton/skeleton.Rmd

Best, Mike

mjsteinbaugh commented 1 year ago

OK this should be resolved in the bcbioRNASeq 0.5.4 update, which is processing. If you continue to have issues with the functional analysis template, please reopen!

Best, Mike

jxshi commented 1 year ago

I use it on both human and mouse RNA-seq data, the new template works perfectly. Thank you so much! Have a nice day!

Best, Jianxiang

mjsteinbaugh commented 1 year ago

Great to hear!! Thanks for the follow up.