Closed jxshi closed 3 years ago
@jxshi thanks I’ll see if I can reproduce and will work on a fix
Can you try repeating with these enabled in the R session:
options(
"error" = quote(rlang::entrace()),
"rlang_backtrace_on_error" = "full",
"rstudio.errors.suppressed" = FALSE
)
This is really helpful for debugging. Thanks!
Can you try repeating with these enabled in the R session:
options( "error" = quote(rlang::entrace()), "rlang_backtrace_on_error" = "full", "rstudio.errors.suppressed" = FALSE )
This is really helpful for debugging. Thanks!
Hi @mjsteinbaugh ,
Here is the output of the error message:
🧪 ==> Feature metadata
bcbio GTF file: /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf
→ Making `GRanges` from GFF file (ref-transcripts.gtf).
→ Getting GFF metadata for ref-transcripts.gtf.
→ Importing ref-transcripts.gtf at /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq using rtracklayer::`import()`.
Error in .makeGRangesFromRtracklayer(file = file, level = level, ignoreVersion = ignoreVersion, :
Assert failure.
[3] isString(meta[["provider"]]) is not TRUE.
Cause: 'x' does not have a length of 1.
Backtrace:
█
1. └─bcbioRNASeq::bcbioRNASeq(...)
2. └─AcidGenomes::makeGRangesFromGFF(...)
3. └─AcidGenomes:::.makeGRangesFromRtracklayer(...)
4. └─goalie::assert(...)
Perfect I’ll fix that today. The issue is that I need to tweak the handling of the bcbio GTF file.
Great! Thanks for your timely reply. I will try later to check if it can work out.
Best, Jianxiang
If you update the AcidGenomes dependency package to v0.2.7, does that fix your issue? https://github.com/acidgenomics/r-acidgenomes
If you update the AcidGenomes dependency package to v0.2.7, does that fix your issue? https://github.com/acidgenomics/r-acidgenomes
I have completed the update and the load data
step went through. Then it got stuck in the differential expression analysis
step in line 305-323.
Quitting from lines 305-323 (treatment1_vs_control_DiffExpr.rmd)
Error in .local(object, ...) :
unused argument (DESeqDataSet = new("DESeqDataSet", design = ~phenotype, dispersionFunction = function (q)
coefs[1] + coefs[2]/q, rowRanges = new("GRanges", seqnames = new("Rle", values = c(23, 20, 1, 6, 1, 23, 6, 3, 7, 12, 7, 11, 4, 23, 4, 7, 17, 7, 12, 23, 2, 7, 16, 2, 3, 8, 7, 17, 3, 1, 4, 12, 1, 3, 17, 12, 7, 19, 16, 7, 6, 3, 7, 23, 7, 17, 7, 23, 17, 16, 19, 7, 23, 4, 7, 17, 7, 17, 12, 16, 19, 9, 17, 7, 23, 16, 17, 7, 2, 7, 16, 6, 11, 13, 16, 17, 23, 17, 7, 16, 19, 17, 11, 16, 11, 17, 7, 16, 17, 16, 7, 14, 7, 11, 1, 7, 2, 11, 7, 19, 7, 19, 17, 7, 19, 7, 17, 7, 23, 17, 12, 5, 2, 16, 19, 4, 19, 17,
19, 17, 1, 23, 11, 16, 3, 16, 6, 1, 6, 23, 1, 6, 23, 6, 23, 1, 6, 3, 7, 17, 3, 7, 12, 3, 19, 12, 19, 8, 16, 22, 17, 8, 2, 3, 8, 1, 7, 6, 23, 1, 6, 1, 14, 6, 7, 6, 1, 12, 17, 3, 7, 12, 3, 12, 19, 3, 19, 12, 23, 9, 16, 12, 23, 6, 1, 6, 1, 17, 5, 12, 14, 19, 17, 3, 23, 19, 17, 7, 17, 19, 11, 3, 11, 19, 7, 19, 9, 12, 19, 2, 19, 16, 23, 17, 19, 3, 23, 3, 8, 12, 6, 10, 24, 12, 14, 2,
Calls: render ... eval -> eval -> mapply -> <Anonymous> -> <Anonymous>
And I have also tried to enable the following options in the R session, the same error message popped up.
options(
"error" = quote(rlang::entrace()),
"rlang_backtrace_on_error" = "full",
"rstudio.errors.suppressed" = FALSE
)
OK great, I’ll see if I can fix that one today.
@jxshi Does this draft version of a revised differential expression template work for you? https://github.com/acidgenomics/r-bcbiornaseq/blob/master/inst/rmarkdown/templates/02-differential-expression/skeleton/skeleton.Rmd
Hi @mjsteinbaugh ,
Thank you for your solution. I have successfully run the differential expression
step with one contrast group. However, when put multiple contrasts in the ## Multiple contrasts are supported here
section.
## Multiple contrasts are supported here.
contrasts: !r list(
c(
factor = "phenotype",
numerator = "Treatment1",
denominator = "Control"
),
c(
factor = "phenotype",
numerator = "Treatment2",
denominator = "Control"
)
)
The following error popped up.
Error in yaml::yaml.load(..., eval.expr = TRUE) :
Scanner error: while scanning a simple key at line 28, column 3 could not find expected ':' at line 29, column 3
Backtrace:
█
1. └─(function (input, output_yaml = NULL) ...
2. └─rmarkdown:::output_format_from_yaml_front_matter(...)
3. └─rmarkdown:::parse_yaml_front_matter(input_lines)
4. └─rmarkdown:::yaml_load(front_matter)
5. └─yaml::yaml.load(..., eval.expr = TRUE)
OK thanks I’ll fix that
Hi @mjsteinbaugh ,
Thank you for your solution. I have successfully run the
differential expression
step with one contrast group. However, when put multiple contrasts in the## Multiple contrasts are supported here
section.## Multiple contrasts are supported here. contrasts: !r list( c( factor = "phenotype", numerator = "Treatment1", denominator = "Control" ), c( factor = "phenotype", numerator = "Treatment2", denominator = "Control" ) )
The following error popped up.
Error in yaml::yaml.load(..., eval.expr = TRUE) : Scanner error: while scanning a simple key at line 28, column 3 could not find expected ':' at line 29, column 3 Backtrace: █ 1. └─(function (input, output_yaml = NULL) ... 2. └─rmarkdown:::output_format_from_yaml_front_matter(...) 3. └─rmarkdown:::parse_yaml_front_matter(input_lines) 4. └─rmarkdown:::yaml_load(front_matter) 5. └─yaml::yaml.load(..., eval.expr = TRUE)
I solved the problem by editing the Multiple contrasts
declaration section to the following format.
## Multiple contrasts are supported here.
contrasts: !r list(
c(
factor = "phenotype",
numerator = "Treatment1",
denominator = "Control"
),
c(
factor = "phenotype",
numerator = "Treatment2",
denominator = "Control"
))
Don't know if I should open a new issue. The Differential Expression
step completed successfully, but the Functional Annotation
step went into error in {r emapplot}
step with the following error message.
Error in has_pairsim(x) :
Term similarity matrix not available. Please use pairwise_termsim function to deal with the results of enrichment analysis.
Backtrace:
█
1. ├─enrichplot::emapplot(enrich_go, showCategory = 25)
2. └─enrichplot::emapplot(enrich_go, showCategory = 25)
3. └─enrichplot:::emapplot.enrichResult(...)
4. └─enrichplot:::has_pairsim(x)
Then I annotated the {r emapplot}
to check if there are any other errors. The pipeline stopped at {r gse-go}
step with the following error message:
preparing geneSet collections...
GSEA analysis...
There are ties in the preranked stats (0.01% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.leading edge analysis...
done...
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'species' for signature '"character"'
Backtrace:
█
1. └─clusterProfiler::gseGO(...)
2. └─AnnotationDbi::species(OrgDb)
3. └─(function (classes, fdef, mtable) ...
Can you check for me, please? Thank you in advance!
Best, Jianxiang
Thanks @jxshi yeah that appears to be a bug with the latest version of clusterProfiler. I'm looking to see if there's a fix for that plot issue.
For the OrgDb step, use get(org_db)
to request the OrgDb package, rather than passing it in as a character string. This behavior was changed in the latest release of clusterProfiler, and breaks the current version of the template. I will fix that in the next package release.
For the OrgDb step, use
get(org_db)
to request the OrgDb package, rather than passing it in as a character string. This behavior was changed in the latest release of clusterProfiler, and breaks the current version of the template. I will fix that in the next package release.
Thank you! I will try to install an earlier version of clusterProfiler
package to see if I can solve this issue.
Can you try repeating with these enabled in the R session:
options( "error" = quote(rlang::entrace()), "rlang_backtrace_on_error" = "full", "rstudio.errors.suppressed" = FALSE )
This is really helpful for debugging. Thanks!
Hi @mjsteinbaugh ,
Here is the output of the error message:
🧪 ==> Feature metadata bcbio GTF file: /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf → Making `GRanges` from GFF file (ref-transcripts.gtf). → Getting GFF metadata for ref-transcripts.gtf. → Importing ref-transcripts.gtf at /home/dell/local/share/bcbio/genomes/Hsapiens/hg38/rnaseq using rtracklayer::`import()`. Error in .makeGRangesFromRtracklayer(file = file, level = level, ignoreVersion = ignoreVersion, : Assert failure. [3] isString(meta[["provider"]]) is not TRUE. Cause: 'x' does not have a length of 1. Backtrace: █ 1. └─bcbioRNASeq::bcbioRNASeq(...) 2. └─AcidGenomes::makeGRangesFromGFF(...) 3. └─AcidGenomes:::.makeGRangesFromRtracklayer(...) 4. └─goalie::assert(...)
I have updated bcbioRNASeq
and related packages from Acid Genomics to the latest version. The load data step popped up with the following error message:
→ Defining names by `geneId` column in `mcols`.
🧪 ==> Metadata
Error in eval(expr, envir = list(`?` = function(...) stop()), enclos = envir) :
object '.val' not found
Calls: bcbioRNASeq ... glue_cmd -> glue -> glue_data -> <Anonymous> -> .transformer
Backtrace:
█
1. ├─bcbioRNASeq::bcbioRNASeq(...)
2. │ └─AcidExperiment::makeSummarizedExperiment(...)
3. │ └─AcidCLI::alertWarning(...)
4. │ └─base::lapply(X = x, FUN = cli_alert_warning)
5. │ └─cli:::FUN(X[[i]], ...)
6. │ ├─cli:::cli__message(...)
7. │ │ └─"id" %in% names(args)
8. │ └─cli:::glue_cmd(text, .envir = .envir)
9. │ └─glue::glue(...)
10. │ └─glue::glue_data(...)
11. └─(function (expr) ...
12. └─cli:::.transformer(expr, env)
Execution halted
sessionInfo()
returned the following message:
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bcbioRNASeq_0.3.40 basejump_0.14.16
loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ellipsis_0.3.1 XVector_0.30.0 GenomicRanges_1.42.0
[5] AcidPlots_0.3.5 rstudioapi_0.13 bit64_4.0.5 interactiveDisplayBase_1.28.0
[9] AnnotationDbi_1.52.0 fansi_0.4.2 xml2_1.3.2 splines_4.0.4
[13] tximport_1.18.0 cachem_1.0.4 goalie_0.5.1 geneplotter_1.68.0
[17] knitr_1.31 AcidExperiment_0.1.10 jsonlite_1.7.2 Rsamtools_2.6.0
[21] annotate_1.68.0 dbplyr_2.1.0 shiny_1.6.0 readr_1.4.0
[25] BiocManager_1.30.10 compiler_4.0.4 httr_1.4.2 assertthat_0.2.1
[29] Matrix_1.3-2 fastmap_1.1.0 lazyeval_0.2.2 limma_3.46.0
[33] cli_2.3.1 later_1.1.0.1 htmltools_0.5.1.1 prettyunits_1.1.1
[37] tools_4.0.4 gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4
[41] dplyr_1.0.5 rappdirs_0.3.3 tinytex_0.30 Rcpp_1.0.6
[45] Biobase_2.50.0 vctrs_0.3.6 AcidGenomes_0.2.9 Biostrings_2.58.0
[49] AcidGenerics_0.5.17 rtracklayer_1.50.0 xfun_0.21 stringr_1.4.0
[53] syntactic_0.4.4 mime_0.10 lifecycle_1.0.0 ensembldb_2.14.0
[57] XML_3.99-0.5 AcidCLI_0.1.0 edgeR_3.32.1 AnnotationHub_2.22.0
[61] zlibbioc_1.36.0 scales_1.1.1 vroom_1.4.0 hms_1.0.0
[65] promises_1.2.0.1 MatrixGenerics_1.2.1 ProtGenerics_1.22.0 parallel_4.0.4
[69] SummarizedExperiment_1.20.0 AnnotationFilter_1.14.0 RColorBrewer_1.1-2 SingleCellExperiment_1.12.0
[73] yaml_2.2.1 curl_4.3 AcidSingleCell_0.1.7 memoise_2.0.0
[77] ggplot2_3.3.3 biomaRt_2.46.3 stringi_1.5.3 RSQLite_2.2.3
[81] genefilter_1.72.1 BiocVersion_3.12.0 S4Vectors_0.28.1 GenomicFeatures_1.42.1
[85] BiocGenerics_0.36.0 BiocParallel_1.24.1 GenomeInfoDb_1.26.2 rlang_0.4.10
[89] pkgconfig_2.0.3 matrixStats_0.58.0 bitops_1.0-6 evaluate_0.14
[93] lattice_0.20-41 purrr_0.3.4 GenomicAlignments_1.26.0 cowplot_1.1.1
[97] bit_4.0.4 tidyselect_1.1.0 magrittr_2.0.1 AcidPlyr_0.1.17
[101] DESeq2_1.30.1 R6_2.5.0 IRanges_2.24.1 generics_0.1.0
[105] DelayedArray_0.16.2 DBI_1.1.1 pillar_1.5.1 withr_2.4.1
[109] survival_3.2-7 RCurl_1.98-1.2 bcbioBase_0.6.19 tibble_3.1.0
[113] pipette_0.5.13 crayon_1.4.1 utf8_1.1.4 BiocFileCache_1.14.0
[117] rmarkdown_2.7 progress_1.2.2 locfit_1.5-9.4 grid_4.0.4
[121] data.table_1.14.0 blob_1.2.1 digest_0.6.27 xtable_1.8-4
[125] AcidBase_0.3.13 httpuv_1.5.5 openssl_1.4.3 stats4_4.0.4
[129] munsell_0.5.0 AcidMarkdown_0.1.1 sessioninfo_1.1.1 askpass_1.1
@jxshi Thanks, I'm having trouble reproducing this on my end. How big is the bcbio final output directory? Would it be possible to share a copy? That will make debugging a lot easier.
Also, can you post the bcbioRNASeq()
function call you're attempting to use?
The bcbio final
output directory is 98GB.
Here is the bcbioRNASeq()
function call:
library(bcbioRNASeq)
uploadDir = "/home/dell/data/XXX/final"
bcb <- bcbioRNASeq(
uploadDir = uploadDir,
interestingGroups = c("phenotype"),
genomeBuild = "GRCm38",
organism = "Mus musculus",
ensemblRelease = 99L
)
Thanks, and I figured that was the case with the bcbio output directory. Would it be possible to post a copy of the dated directory inside "/home/dell/data/XXX/final"
(e.g. YYYY-MM-DD_XXX
containing project-summary.yaml
)? That is often really useful for debugging.
Sure, you can download the file from this link.
tree 2021-01-20_aki/
returned the following result:
2021-01-20_aki/
├── annotated_combined.counts
├── bcbio-nextgen-commands.log
├── bcbio-nextgen.log
├── combined.counts
├── counts
│ └── tximport-counts.csv
├── data_versions.csv
├── featureCounts
│ ├── annotated_combined.counts
│ └── combined.counts
├── metadata.csv
├── multiqc
│ ├── list_files_final.txt
│ ├── multiqc_config.yaml
│ ├── multiqc_data
│ │ ├── multiqc_bcbio_metrics.txt
│ │ ├── multiqc_data_final.json
│ │ ├── multiqc_data.json
│ │ ├── multiqc_fastqc.txt
│ │ ├── multiqc_general_stats.txt
│ │ ├── multiqc.log
│ │ ├── multiqc_samtools_idxstats.txt
│ │ ├── multiqc_samtools_stats.txt
│ │ ├── multiqc_sources.txt
│ │ ├── multiqc_star.txt
│ │ ├── seqbuster_isomirs.txt
│ │ └── seqbuster_mirs.txt
│ ├── multiqc_report.html
│ └── report
│ └── metrics
│ ├── A1_bcbio.txt
│ ├── A2_bcbio.txt
│ ├── A3_bcbio.txt
│ ├── C1_bcbio.txt
│ ├── C2_bcbio.txt
│ ├── C3_bcbio.txt
│ ├── N1_bcbio.txt
│ ├── N2_bcbio.txt
│ ├── N3_bcbio.txt
│ ├── P1_bcbio.txt
│ ├── P2_bcbio.txt
│ └── P3_bcbio.txt
├── programs.txt
├── project-summary.yaml
├── tpm
│ └── tximport-tpm.csv
├── transcriptome
│ ├── mm10.fa
│ ├── mm10-tx2gene.csv
│ └── tx2gene.csv
└── tx2gene.csv
@jxshi I've been digging into this issue and I'm having trouble reproducing the error. Do you also see this error with another dataset? I'm trying to see if there's something in the sample and/or run metadata that is causing the error.
@mjsteinbaugh Sorry for the late reply and thank you for your follow up. I have recently analyzed RNAseq data from Homo Sapiens, and it went smoothly. I have updated bcbioRNASeq
and related R packages and its working now.
I will close the issue now and will let you know if I encounter any new issue.
Thank you again for your help!
Best,
Jianxiang
Hi @mjsteinbaugh ,
I have updated the bcbioRNASeq
package to the latest version recently. I analyzed a new set of new RNAseq data and fed it to bcbioRNASeq
. This issue returned. Here is the R script used to load data.
rm(list = ls())
setwd("/public/home/zzumag03/data/shiyang/gbcancer/result")
library(bcbioRNASeq)
uploadDir = "/public/home/zzumag03/data/shiyang/gbcancer/final"
bcb <- bcbioRNASeq(
uploadDir = uploadDir,
interestingGroups = c("phenotype"),
genomeBuild = "GRCh38",
organism = "Homo sapiens",
ensemblRelease = 99
)
print(Sys.Date())
saveData(bcb, dir = paste0("./rds/",Sys.Date()))
And here is the error message:
🧪 ## Feature metadata
→ Making `GRanges` from Ensembl.
→ Getting `EnsDb` from AnnotationHub 2.22.1 (2020-10-27).
ℹ "AH78783": Ensembl 99 EnsDb for Homo sapiens.
→ Making `GRanges` from `EnsDb`.
Organism: Homo sapiens
Genome build: GRCh38
Release: 99
Level: genes
→ Defining names by `geneId` column in `mcols`.
🧪 ## Metadata
Error in eval(expr, envir = list(`?` = function(...) stop()), enclos = envir) :
object '.val' not found
In addition: Warning message:
call dbDisconnect() when finished working with a connection
Backtrace:
█
1. ├─bcbioRNASeq::bcbioRNASeq(...)
2. │ └─AcidExperiment::makeSummarizedExperiment(...)
3. │ └─AcidCLI::alertWarning(...)
4. │ └─base::lapply(X = x, FUN = cli_alert_warning)
5. │ └─cli:::FUN(X[[i]], ...)
6. │ ├─cli:::cli__message(...)
7. │ │ └─"id" %in% names(args)
8. │ └─cli:::glue_cmd(text, .envir = .envir)
9. │ └─glue::glue(...)
10. │ └─glue::glue_data(...)
11. └─(function (expr) ...
12. └─cli:::.transformer(expr, env)
Can you fix this, please? Thank you for your help.
Best, Jianxiang
Hi @jxshi , sure thing I'll look into this tomorrow.
Hi @mjsteinbaugh ,
Is there any progression? Thanks!
Best, Jianxiang
Hi @jxshi, apologies for the delay. I'll try to block out some time to work on this tomorrow.
Hi @mjsteinbaugh ,
I just deleted the following option ensemblRelease = 99
and it worked. So I will close this issue now.
Thank you and cheers!
Best, Jianxiang
OK thanks, I'll see if I can track down why that error may be occurring specifically with that Ensembl release. I think I may know why.
@jxshi OK this issue should be fixed in by the pending AcidExperiment v0.2.0 release. The (now fixed) problematic code is here for reference. makeSummarizedExperiment
, which is called internally by the main bcbioRNASeq
function, looks for mismatched gene annotations defined by either the ensemblRelease
or gffFile
argument. In the event that genes are defined in the count matrix but not the gene annotations (e.g. `rowRanges), the package will attempt to inform the user of the mismatch. There was a bug in the formatting of the CLI message that was causing the error here.
Dear bcbioRNASeq team,
So I ran bcbio on a dataset with breast cancer and asymptomatic controls. And decided to use the template for differential expression. All went well except the multiple contrasts. As you can see I only have 2 conditions (asymptomatic control vs breast cancer). Is it possible to just declare this one contrast?
When I run the code I get the following error:
Error in `checkContrast()`:
! 'contrast' vector should be either a character vector of length 3,
a list of length 2 containing character vectors,
or a numeric vector, see the argument description in ?results
(I also got a different error when I tried something like:
contrasts: !r list(
"breastCancer_vs_asymptomaticControls" = c(
"factor" = "group",
"numerator" = "Asymptomatic.Controls",
"denominator" = "Breast.Cancer"
),
"asymptomaticControls_vs_breastCancer" = c(
"factor" = "group",
"numerator" = "Breast.Cancer",
"denominator" = "Asymptomatic.Controls"
)
)
Also, when I tried to see the resultsNames(dds) - it only lists "Intercept" column.
The interestingGroups of the object are ("group"). I list the whole object import here:
object <- bcbioRNASeq( uploadDir = file.path("final"), interestingGroups = c("group"), organism = "Homo sapiens" )
Also here is the .yaml resulted from running bcbio (changed it to .txt since it doesn't support uploading .yaml) project-summary.txt
Please provide a feedback on how I should build the contrasts for this template, or maybe if there is a way to completely bypass the contrasts for this case.
Thank you, Adrian
Hi,
I have installed the latest
bcbioRNASeq
in another workstation to check if it can work. However, an error message popped up in theload data from the bcbio-nextgen final directory
step. Here is the error message:R sessionInfo() returned the following message:
Can you check for me please? Thank you in advance!
Best, Jianxiang