hbc / bcbioRNASeq

R package for bcbio RNA-seq analysis.
https://bioinformatics.sph.harvard.edu/bcbioRNASeq
GNU Affero General Public License v3.0
58 stars 21 forks source link

Error in loading data from ```final``` directory. #170

Closed jxshi closed 3 years ago

jxshi commented 3 years ago

Hi,

I have installed the latest bcbioRNASeq in another workstation to check if it can work. However, an error message popped up in the load data from the bcbio-nextgen final directory step. Here is the error message:

→ Getting `EnsDb` from AnnotationHub 2.22.0 (2020-10-27).
ℹ AH57757: Ensembl 90 EnsDb for Homo Sapiens.
→ Making `GRanges` from `EnsDb`.
    Organism: Homo sapiens
    Genome build: GRCh38
    Release: 90
    Level: genes
→ Defining names by `geneId` column in `mcols`.
🧪 ==> Metadata
Error in eval(expr, envir = list(`?` = function(...) stop()), enclos = envir) : 
  object '.val' not found
Calls: bcbioRNASeq ... glue_cmd -> glue -> glue_data -> <Anonymous> -> .transformer
Execution halted

R sessionInfo() returned the following message:

> library(bcbioRNASeq);sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /public/home/zzumpc08/miniconda3/envs/r-base/lib/libblas.so.3.8.0
LAPACK: /public/home/zzumpc08/miniconda3/envs/r-base/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bcbioRNASeq_0.3.40 basejump_0.14.14  

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0              ellipsis_0.3.1               
  [3] XVector_0.30.0                GenomicRanges_1.42.0         
  [5] AcidPlots_0.3.3               rstudioapi_0.13              
  [7] bit64_4.0.5                   interactiveDisplayBase_1.28.0
  [9] AnnotationDbi_1.52.0          fansi_0.4.2                  
 [11] xml2_1.3.2                    splines_4.0.3                
 [13] tximport_1.18.0               cachem_1.0.4                 
 [15] goalie_0.5.0                  geneplotter_1.68.0           
 [17] knitr_1.31                    AcidExperiment_0.1.8         
 [19] Rsamtools_2.6.0               annotate_1.68.0              
 [21] dbplyr_2.1.0                  shiny_1.6.0                  
 [23] BiocManager_1.30.10           compiler_4.0.3               
 [25] httr_1.4.2                    assertthat_0.2.1             
 [27] Matrix_1.3-2                  fastmap_1.1.0                
 [29] lazyeval_0.2.2                limma_3.46.0                 
 [31] cli_2.3.1                     later_1.1.0.1                
 [33] htmltools_0.5.1.1             prettyunits_1.1.1            
 [35] tools_4.0.3                   gtable_0.3.0                 
 [37] glue_1.4.2                    GenomeInfoDbData_1.2.4       
 [39] dplyr_1.0.4                   rappdirs_0.3.3               
 [41] Rcpp_1.0.6                    Biobase_2.50.0               
 [43] vctrs_0.3.6                   AcidGenomes_0.2.6            
 [45] Biostrings_2.58.0             AcidGenerics_0.5.16          
 [47] rtracklayer_1.50.0            xfun_0.21                    
 [49] stringr_1.4.0                 syntactic_0.4.4              
 [51] mime_0.10                     lifecycle_1.0.0              
 [53] ensembldb_2.14.0              XML_3.99-0.5                 
 [55] AcidCLI_0.0.6                 edgeR_3.32.1                 
 [57] AnnotationHub_2.22.0          zlibbioc_1.36.0              
 [59] scales_1.1.1                  vroom_1.4.0                  
 [61] hms_1.0.0                     promises_1.2.0.1             
 [63] MatrixGenerics_1.2.0          ProtGenerics_1.22.0          
 [65] parallel_4.0.3                SummarizedExperiment_1.20.0  
 [67] AnnotationFilter_1.14.0       RColorBrewer_1.1-2           
 [69] SingleCellExperiment_1.12.0   yaml_2.2.1                   
 [71] curl_4.3                      AcidSingleCell_0.1.5         
 [73] memoise_2.0.0                 ggplot2_3.3.3                
 [75] biomaRt_2.46.3                stringi_1.5.3                
 [77] RSQLite_2.2.3                 BiocVersion_3.12.0           
 [79] genefilter_1.72.1             S4Vectors_0.28.1             
 [81] GenomicFeatures_1.42.1        BiocGenerics_0.36.0          
 [83] BiocParallel_1.24.1           GenomeInfoDb_1.26.2          
 [85] rlang_0.4.10                  pkgconfig_2.0.3              
 [87] matrixStats_0.58.0            bitops_1.0-6                 
 [89] lattice_0.20-41               purrr_0.3.4                  
 [91] GenomicAlignments_1.26.0      cowplot_1.1.1                
 [93] bit_4.0.4                     tidyselect_1.1.0             
 [95] magrittr_2.0.1                AcidPlyr_0.1.14              
 [97] DESeq2_1.30.1                 R6_2.5.0                     
 [99] IRanges_2.24.1                generics_0.1.0               
[101] DelayedArray_0.16.0           DBI_1.1.1                    
[103] pillar_1.5.0                  withr_2.4.1                  
[105] survival_3.2-7                RCurl_1.98-1.2               
[107] bcbioBase_0.6.19              tibble_3.0.6                 
[109] pipette_0.5.12                crayon_1.4.1                 
[111] utf8_1.1.4                    BiocFileCache_1.14.0         
[113] progress_1.2.2                locfit_1.5-9.4               
[115] grid_4.0.3                    data.table_1.14.0            
[117] blob_1.2.1                    digest_0.6.27                
[119] xtable_1.8-4                  AcidBase_0.3.10              
[121] httpuv_1.5.5                  openssl_1.4.3                
[123] stats4_4.0.3                  munsell_0.5.0                
[125] AcidMarkdown_0.1.0            sessioninfo_1.1.1            
[127] askpass_1.1

Can you check for me please? Thank you in advance!

Best, Jianxiang

mjsteinbaugh commented 3 years ago

@jxshi thanks I’ll see if I can reproduce and will work on a fix

mjsteinbaugh commented 3 years ago

Can you try repeating with these enabled in the R session:

options(
    "error" = quote(rlang::entrace()),
    "rlang_backtrace_on_error" = "full",
    "rstudio.errors.suppressed" = FALSE
)

This is really helpful for debugging. Thanks!

jxshi commented 3 years ago

Can you try repeating with these enabled in the R session:

options(
    "error" = quote(rlang::entrace()),
    "rlang_backtrace_on_error" = "full",
    "rstudio.errors.suppressed" = FALSE
)

This is really helpful for debugging. Thanks!

Hi @mjsteinbaugh ,

Here is the output of the error message:

🧪 ==> Feature metadata
    bcbio GTF file: /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf
→ Making `GRanges` from GFF file (ref-transcripts.gtf).
→ Getting GFF metadata for ref-transcripts.gtf.
→ Importing ref-transcripts.gtf at /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq using rtracklayer::`import()`.
Error in .makeGRangesFromRtracklayer(file = file, level = level, ignoreVersion = ignoreVersion,  :
  Assert failure.
[3] isString(meta[["provider"]]) is not TRUE.
Cause: 'x' does not have a length of 1.
Backtrace:
    █
 1. └─bcbioRNASeq::bcbioRNASeq(...)
 2.   └─AcidGenomes::makeGRangesFromGFF(...)
 3.     └─AcidGenomes:::.makeGRangesFromRtracklayer(...)
 4.       └─goalie::assert(...)
mjsteinbaugh commented 3 years ago

Perfect I’ll fix that today. The issue is that I need to tweak the handling of the bcbio GTF file.

jxshi commented 3 years ago

Great! Thanks for your timely reply. I will try later to check if it can work out.

Best, Jianxiang

mjsteinbaugh commented 3 years ago

If you update the AcidGenomes dependency package to v0.2.7, does that fix your issue? https://github.com/acidgenomics/r-acidgenomes

jxshi commented 3 years ago

If you update the AcidGenomes dependency package to v0.2.7, does that fix your issue? https://github.com/acidgenomics/r-acidgenomes

I have completed the update and the load data step went through. Then it got stuck in the differential expression analysis step in line 305-323.

Quitting from lines 305-323 (treatment1_vs_control_DiffExpr.rmd)
Error in .local(object, ...) :
  unused argument (DESeqDataSet = new("DESeqDataSet", design = ~phenotype, dispersionFunction = function (q)
coefs[1] + coefs[2]/q, rowRanges = new("GRanges", seqnames = new("Rle", values = c(23, 20, 1, 6, 1, 23, 6, 3, 7, 12, 7, 11, 4, 23, 4, 7, 17, 7, 12, 23, 2, 7, 16, 2, 3, 8, 7, 17, 3, 1, 4, 12, 1, 3, 17, 12, 7, 19, 16, 7, 6, 3, 7, 23, 7, 17, 7, 23, 17, 16, 19, 7, 23, 4, 7, 17, 7, 17, 12, 16, 19, 9, 17, 7, 23, 16, 17, 7, 2, 7, 16, 6, 11, 13, 16, 17, 23, 17, 7, 16, 19, 17, 11, 16, 11, 17, 7, 16, 17, 16, 7, 14, 7, 11, 1, 7, 2, 11, 7, 19, 7, 19, 17, 7, 19, 7, 17, 7, 23, 17, 12, 5, 2, 16, 19, 4, 19, 17,
19, 17, 1, 23, 11, 16, 3, 16, 6, 1, 6, 23, 1, 6, 23, 6, 23, 1, 6, 3, 7, 17, 3, 7, 12, 3, 19, 12, 19, 8, 16, 22, 17, 8, 2, 3, 8, 1, 7, 6, 23, 1, 6, 1, 14, 6, 7, 6, 1, 12, 17, 3, 7, 12, 3, 12, 19, 3, 19, 12, 23, 9, 16, 12, 23, 6, 1, 6, 1, 17, 5, 12, 14, 19, 17, 3, 23, 19, 17, 7, 17, 19, 11, 3, 11, 19, 7, 19, 9, 12, 19, 2, 19, 16, 23, 17, 19, 3, 23, 3, 8, 12, 6, 10, 24, 12, 14, 2,
Calls: render ... eval -> eval -> mapply -> <Anonymous> -> <Anonymous>

And I have also tried to enable the following options in the R session, the same error message popped up.

options(
    "error" = quote(rlang::entrace()),
    "rlang_backtrace_on_error" = "full",
    "rstudio.errors.suppressed" = FALSE
)
mjsteinbaugh commented 3 years ago

OK great, I’ll see if I can fix that one today.

mjsteinbaugh commented 3 years ago

@jxshi Does this draft version of a revised differential expression template work for you? https://github.com/acidgenomics/r-bcbiornaseq/blob/master/inst/rmarkdown/templates/02-differential-expression/skeleton/skeleton.Rmd

jxshi commented 3 years ago

Hi @mjsteinbaugh ,

Thank you for your solution. I have successfully run the differential expression step with one contrast group. However, when put multiple contrasts in the ## Multiple contrasts are supported here section.

  ## Multiple contrasts are supported here.
  contrasts: !r list(
    c(
      factor = "phenotype",
      numerator = "Treatment1",
      denominator = "Control"
    ),
    c(
      factor = "phenotype",
      numerator = "Treatment2",
      denominator = "Control"
    )
  )

The following error popped up.

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: while scanning a simple key at line 28, column 3 could not find expected ':' at line 29, column 3
Backtrace:
    █
 1. └─(function (input, output_yaml = NULL) ...
 2.   └─rmarkdown:::output_format_from_yaml_front_matter(...)
 3.     └─rmarkdown:::parse_yaml_front_matter(input_lines)
 4.       └─rmarkdown:::yaml_load(front_matter)
 5.         └─yaml::yaml.load(..., eval.expr = TRUE)
mjsteinbaugh commented 3 years ago

OK thanks I’ll fix that

jxshi commented 3 years ago

Hi @mjsteinbaugh ,

Thank you for your solution. I have successfully run the differential expression step with one contrast group. However, when put multiple contrasts in the ## Multiple contrasts are supported here section.

  ## Multiple contrasts are supported here.
  contrasts: !r list(
    c(
      factor = "phenotype",
      numerator = "Treatment1",
      denominator = "Control"
    ),
    c(
      factor = "phenotype",
      numerator = "Treatment2",
      denominator = "Control"
    )
  )

The following error popped up.

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: while scanning a simple key at line 28, column 3 could not find expected ':' at line 29, column 3
Backtrace:
    █
 1. └─(function (input, output_yaml = NULL) ...
 2.   └─rmarkdown:::output_format_from_yaml_front_matter(...)
 3.     └─rmarkdown:::parse_yaml_front_matter(input_lines)
 4.       └─rmarkdown:::yaml_load(front_matter)
 5.         └─yaml::yaml.load(..., eval.expr = TRUE)

I solved the problem by editing the Multiple contrasts declaration section to the following format.

  ## Multiple contrasts are supported here.
  contrasts: !r list(
    c(
      factor = "phenotype",
      numerator = "Treatment1",
      denominator = "Control"
    ),
    c(
      factor = "phenotype",
      numerator = "Treatment2",
      denominator = "Control"
    ))
jxshi commented 3 years ago

Don't know if I should open a new issue. The Differential Expression step completed successfully, but the Functional Annotation step went into error in {r emapplot} step with the following error message.

Error in has_pairsim(x) : 
  Term similarity matrix not available. Please use pairwise_termsim function to deal with the results of enrichment analysis.
Backtrace:
    █
 1. ├─enrichplot::emapplot(enrich_go, showCategory = 25)
 2. └─enrichplot::emapplot(enrich_go, showCategory = 25)
 3.   └─enrichplot:::emapplot.enrichResult(...)
 4.     └─enrichplot:::has_pairsim(x)

Then I annotated the {r emapplot} to check if there are any other errors. The pipeline stopped at {r gse-go} step with the following error message:

preparing geneSet collections...
GSEA analysis...
There are ties in the preranked stats (0.01% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.leading edge analysis...
done...
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'species' for signature '"character"'
Backtrace:
    █
 1. └─clusterProfiler::gseGO(...)
 2.   └─AnnotationDbi::species(OrgDb)
 3.     └─(function (classes, fdef, mtable) ...

Can you check for me, please? Thank you in advance!

Best, Jianxiang

mjsteinbaugh commented 3 years ago

Thanks @jxshi yeah that appears to be a bug with the latest version of clusterProfiler. I'm looking to see if there's a fix for that plot issue.

mjsteinbaugh commented 3 years ago

For the OrgDb step, use get(org_db) to request the OrgDb package, rather than passing it in as a character string. This behavior was changed in the latest release of clusterProfiler, and breaks the current version of the template. I will fix that in the next package release.

jxshi commented 3 years ago

For the OrgDb step, use get(org_db) to request the OrgDb package, rather than passing it in as a character string. This behavior was changed in the latest release of clusterProfiler, and breaks the current version of the template. I will fix that in the next package release.

Thank you! I will try to install an earlier version of clusterProfiler package to see if I can solve this issue.

jxshi commented 3 years ago

Can you try repeating with these enabled in the R session:

options(
    "error" = quote(rlang::entrace()),
    "rlang_backtrace_on_error" = "full",
    "rstudio.errors.suppressed" = FALSE
)

This is really helpful for debugging. Thanks!

Hi @mjsteinbaugh ,

Here is the output of the error message:

🧪 ==> Feature metadata
    bcbio GTF file: /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf
→ Making `GRanges` from GFF file (ref-transcripts.gtf).
→ Getting GFF metadata for ref-transcripts.gtf.
→ Importing ref-transcripts.gtf at /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq using rtracklayer::`import()`.
Error in .makeGRangesFromRtracklayer(file = file, level = level, ignoreVersion = ignoreVersion,  :
  Assert failure.
[3] isString(meta[["provider"]]) is not TRUE.
Cause: 'x' does not have a length of 1.
Backtrace:
    █
 1. └─bcbioRNASeq::bcbioRNASeq(...)
 2.   └─AcidGenomes::makeGRangesFromGFF(...)
 3.     └─AcidGenomes:::.makeGRangesFromRtracklayer(...)
 4.       └─goalie::assert(...)

I have updated bcbioRNASeq and related packages from Acid Genomics to the latest version. The load data step popped up with the following error message:

→ Defining names by `geneId` column in `mcols`.
🧪 ==> Metadata
Error in eval(expr, envir = list(`?` = function(...) stop()), enclos = envir) :
  object '.val' not found
Calls: bcbioRNASeq ... glue_cmd -> glue -> glue_data -> <Anonymous> -> .transformer
Backtrace:
     █
  1. ├─bcbioRNASeq::bcbioRNASeq(...)
  2. │ └─AcidExperiment::makeSummarizedExperiment(...)
  3. │   └─AcidCLI::alertWarning(...)
  4. │     └─base::lapply(X = x, FUN = cli_alert_warning)
  5. │       └─cli:::FUN(X[[i]], ...)
  6. │         ├─cli:::cli__message(...)
  7. │         │ └─"id" %in% names(args)
  8. │         └─cli:::glue_cmd(text, .envir = .envir)
  9. │           └─glue::glue(...)
 10. │             └─glue::glue_data(...)
 11. └─(function (expr) ...
 12.   └─cli:::.transformer(expr, env)
Execution halted

sessionInfo() returned the following message:

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bcbioRNASeq_0.3.40 basejump_0.14.16  

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0              ellipsis_0.3.1                XVector_0.30.0                GenomicRanges_1.42.0         
  [5] AcidPlots_0.3.5               rstudioapi_0.13               bit64_4.0.5                   interactiveDisplayBase_1.28.0
  [9] AnnotationDbi_1.52.0          fansi_0.4.2                   xml2_1.3.2                    splines_4.0.4                
 [13] tximport_1.18.0               cachem_1.0.4                  goalie_0.5.1                  geneplotter_1.68.0           
 [17] knitr_1.31                    AcidExperiment_0.1.10         jsonlite_1.7.2                Rsamtools_2.6.0              
 [21] annotate_1.68.0               dbplyr_2.1.0                  shiny_1.6.0                   readr_1.4.0                  
 [25] BiocManager_1.30.10           compiler_4.0.4                httr_1.4.2                    assertthat_0.2.1             
 [29] Matrix_1.3-2                  fastmap_1.1.0                 lazyeval_0.2.2                limma_3.46.0                 
 [33] cli_2.3.1                     later_1.1.0.1                 htmltools_0.5.1.1             prettyunits_1.1.1            
 [37] tools_4.0.4                   gtable_0.3.0                  glue_1.4.2                    GenomeInfoDbData_1.2.4       
 [41] dplyr_1.0.5                   rappdirs_0.3.3                tinytex_0.30                  Rcpp_1.0.6                   
 [45] Biobase_2.50.0                vctrs_0.3.6                   AcidGenomes_0.2.9             Biostrings_2.58.0            
 [49] AcidGenerics_0.5.17           rtracklayer_1.50.0            xfun_0.21                     stringr_1.4.0                
 [53] syntactic_0.4.4               mime_0.10                     lifecycle_1.0.0               ensembldb_2.14.0             
 [57] XML_3.99-0.5                  AcidCLI_0.1.0                 edgeR_3.32.1                  AnnotationHub_2.22.0         
 [61] zlibbioc_1.36.0               scales_1.1.1                  vroom_1.4.0                   hms_1.0.0                    
 [65] promises_1.2.0.1              MatrixGenerics_1.2.1          ProtGenerics_1.22.0           parallel_4.0.4               
 [69] SummarizedExperiment_1.20.0   AnnotationFilter_1.14.0       RColorBrewer_1.1-2            SingleCellExperiment_1.12.0  
 [73] yaml_2.2.1                    curl_4.3                      AcidSingleCell_0.1.7          memoise_2.0.0                
 [77] ggplot2_3.3.3                 biomaRt_2.46.3                stringi_1.5.3                 RSQLite_2.2.3                
 [81] genefilter_1.72.1             BiocVersion_3.12.0            S4Vectors_0.28.1              GenomicFeatures_1.42.1       
 [85] BiocGenerics_0.36.0           BiocParallel_1.24.1           GenomeInfoDb_1.26.2           rlang_0.4.10                 
 [89] pkgconfig_2.0.3               matrixStats_0.58.0            bitops_1.0-6                  evaluate_0.14                
 [93] lattice_0.20-41               purrr_0.3.4                   GenomicAlignments_1.26.0      cowplot_1.1.1                
 [97] bit_4.0.4                     tidyselect_1.1.0              magrittr_2.0.1                AcidPlyr_0.1.17              
[101] DESeq2_1.30.1                 R6_2.5.0                      IRanges_2.24.1                generics_0.1.0               
[105] DelayedArray_0.16.2           DBI_1.1.1                     pillar_1.5.1                  withr_2.4.1                  
[109] survival_3.2-7                RCurl_1.98-1.2                bcbioBase_0.6.19              tibble_3.1.0                 
[113] pipette_0.5.13                crayon_1.4.1                  utf8_1.1.4                    BiocFileCache_1.14.0         
[117] rmarkdown_2.7                 progress_1.2.2                locfit_1.5-9.4                grid_4.0.4                   
[121] data.table_1.14.0             blob_1.2.1                    digest_0.6.27                 xtable_1.8-4                 
[125] AcidBase_0.3.13               httpuv_1.5.5                  openssl_1.4.3                 stats4_4.0.4                 
[129] munsell_0.5.0                 AcidMarkdown_0.1.1            sessioninfo_1.1.1             askpass_1.1                  
mjsteinbaugh commented 3 years ago

@jxshi Thanks, I'm having trouble reproducing this on my end. How big is the bcbio final output directory? Would it be possible to share a copy? That will make debugging a lot easier.

mjsteinbaugh commented 3 years ago

Also, can you post the bcbioRNASeq() function call you're attempting to use?

jxshi commented 3 years ago

The bcbio final output directory is 98GB. Here is the bcbioRNASeq() function call:

library(bcbioRNASeq)
uploadDir = "/home/dell/data/XXX/final"

bcb <- bcbioRNASeq(
  uploadDir = uploadDir,
  interestingGroups = c("phenotype"),
  genomeBuild = "GRCm38",
  organism = "Mus musculus",
  ensemblRelease = 99L
)
mjsteinbaugh commented 3 years ago

Thanks, and I figured that was the case with the bcbio output directory. Would it be possible to post a copy of the dated directory inside "/home/dell/data/XXX/final" (e.g. YYYY-MM-DD_XXX containing project-summary.yaml)? That is often really useful for debugging.

jxshi commented 3 years ago

Sure, you can download the file from this link. tree 2021-01-20_aki/ returned the following result:

2021-01-20_aki/
├── annotated_combined.counts
├── bcbio-nextgen-commands.log
├── bcbio-nextgen.log
├── combined.counts
├── counts
│   └── tximport-counts.csv
├── data_versions.csv
├── featureCounts
│   ├── annotated_combined.counts
│   └── combined.counts
├── metadata.csv
├── multiqc
│   ├── list_files_final.txt
│   ├── multiqc_config.yaml
│   ├── multiqc_data
│   │   ├── multiqc_bcbio_metrics.txt
│   │   ├── multiqc_data_final.json
│   │   ├── multiqc_data.json
│   │   ├── multiqc_fastqc.txt
│   │   ├── multiqc_general_stats.txt
│   │   ├── multiqc.log
│   │   ├── multiqc_samtools_idxstats.txt
│   │   ├── multiqc_samtools_stats.txt
│   │   ├── multiqc_sources.txt
│   │   ├── multiqc_star.txt
│   │   ├── seqbuster_isomirs.txt
│   │   └── seqbuster_mirs.txt
│   ├── multiqc_report.html
│   └── report
│       └── metrics
│           ├── A1_bcbio.txt
│           ├── A2_bcbio.txt
│           ├── A3_bcbio.txt
│           ├── C1_bcbio.txt
│           ├── C2_bcbio.txt
│           ├── C3_bcbio.txt
│           ├── N1_bcbio.txt
│           ├── N2_bcbio.txt
│           ├── N3_bcbio.txt
│           ├── P1_bcbio.txt
│           ├── P2_bcbio.txt
│           └── P3_bcbio.txt
├── programs.txt
├── project-summary.yaml
├── tpm
│   └── tximport-tpm.csv
├── transcriptome
│   ├── mm10.fa
│   ├── mm10-tx2gene.csv
│   └── tx2gene.csv
└── tx2gene.csv
mjsteinbaugh commented 3 years ago

@jxshi I've been digging into this issue and I'm having trouble reproducing the error. Do you also see this error with another dataset? I'm trying to see if there's something in the sample and/or run metadata that is causing the error.

jxshi commented 3 years ago

@mjsteinbaugh Sorry for the late reply and thank you for your follow up. I have recently analyzed RNAseq data from Homo Sapiens, and it went smoothly. I have updated bcbioRNASeq and related R packages and its working now. I will close the issue now and will let you know if I encounter any new issue. Thank you again for your help! Best, Jianxiang

jxshi commented 3 years ago

Hi @mjsteinbaugh ,

I have updated the bcbioRNASeq package to the latest version recently. I analyzed a new set of new RNAseq data and fed it to bcbioRNASeq. This issue returned. Here is the R script used to load data.

rm(list = ls())
setwd("/public/home/zzumag03/data/shiyang/gbcancer/result")

library(bcbioRNASeq)
uploadDir = "/public/home/zzumag03/data/shiyang/gbcancer/final"

bcb <- bcbioRNASeq(
  uploadDir = uploadDir,
  interestingGroups = c("phenotype"),
  genomeBuild = "GRCh38",
  organism = "Homo sapiens",
  ensemblRelease = 99
)

print(Sys.Date())

saveData(bcb, dir = paste0("./rds/",Sys.Date()))

And here is the error message:

🧪 ## Feature metadata
→ Making `GRanges` from Ensembl.
→ Getting `EnsDb` from AnnotationHub 2.22.1 (2020-10-27).
ℹ "AH78783": Ensembl 99 EnsDb for Homo sapiens.
→ Making `GRanges` from `EnsDb`.
Organism: Homo sapiens
Genome build: GRCh38
Release: 99
Level: genes
→ Defining names by `geneId` column in `mcols`.
🧪 ## Metadata
Error in eval(expr, envir = list(`?` = function(...) stop()), enclos = envir) :
  object '.val' not found
In addition: Warning message:
call dbDisconnect() when finished working with a connection
Backtrace:
     █
  1. ├─bcbioRNASeq::bcbioRNASeq(...)
  2. │ └─AcidExperiment::makeSummarizedExperiment(...)
  3. │   └─AcidCLI::alertWarning(...)
  4. │     └─base::lapply(X = x, FUN = cli_alert_warning)
  5. │       └─cli:::FUN(X[[i]], ...)
  6. │         ├─cli:::cli__message(...)
  7. │         │ └─"id" %in% names(args)
  8. │         └─cli:::glue_cmd(text, .envir = .envir)
  9. │           └─glue::glue(...)
 10. │             └─glue::glue_data(...)
 11. └─(function (expr) ...
 12.   └─cli:::.transformer(expr, env)

Can you fix this, please? Thank you for your help.

Best, Jianxiang

mjsteinbaugh commented 3 years ago

Hi @jxshi , sure thing I'll look into this tomorrow.

jxshi commented 3 years ago

Hi @mjsteinbaugh ,

Is there any progression? Thanks!

Best, Jianxiang

mjsteinbaugh commented 3 years ago

Hi @jxshi, apologies for the delay. I'll try to block out some time to work on this tomorrow.

jxshi commented 3 years ago

Hi @mjsteinbaugh ,

I just deleted the following option ensemblRelease = 99 and it worked. So I will close this issue now.

Thank you and cheers!

Best, Jianxiang

mjsteinbaugh commented 3 years ago

OK thanks, I'll see if I can track down why that error may be occurring specifically with that Ensembl release. I think I may know why.

mjsteinbaugh commented 3 years ago

@jxshi OK this issue should be fixed in by the pending AcidExperiment v0.2.0 release. The (now fixed) problematic code is here for reference. makeSummarizedExperiment, which is called internally by the main bcbioRNASeq function, looks for mismatched gene annotations defined by either the ensemblRelease or gffFile argument. In the event that genes are defined in the count matrix but not the gene annotations (e.g. `rowRanges), the package will attempt to inform the user of the mismatch. There was a bug in the formatting of the CLI message that was causing the error here.

Adrian-Zet commented 2 years ago

Dear bcbioRNASeq team,

So I ran bcbio on a dataset with breast cancer and asymptomatic controls. And decided to use the template for differential expression. All went well except the multiple contrasts. As you can see I only have 2 conditions (asymptomatic control vs breast cancer). Is it possible to just declare this one contrast?

When I run the code I get the following error:

Error in `checkContrast()`:
! 'contrast' vector should be either a character vector of length 3,
a list of length 2 containing character vectors,
or a numeric vector, see the argument description in ?results

(I also got a different error when I tried something like:

contrasts: !r list(
      "breastCancer_vs_asymptomaticControls" = c(
        "factor" = "group",
        "numerator" = "Asymptomatic.Controls",
        "denominator" = "Breast.Cancer"
      ),
      "asymptomaticControls_vs_breastCancer" = c(
        "factor" = "group",
        "numerator" = "Breast.Cancer",
        "denominator" = "Asymptomatic.Controls"
      )
    )

Also, when I tried to see the resultsNames(dds) - it only lists "Intercept" column.

The interestingGroups of the object are ("group"). I list the whole object import here:

object <- bcbioRNASeq( uploadDir = file.path("final"), interestingGroups = c("group"), organism = "Homo sapiens" )

Also here is the .yaml resulted from running bcbio (changed it to .txt since it doesn't support uploading .yaml) project-summary.txt

Please provide a feedback on how I should build the contrasts for this template, or maybe if there is a way to completely bypass the contrasts for this case.

Thank you, Adrian