kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annoatate and Visialize Isoform Switches with Functional Consequences (from RNA-seq data)
101 stars 18 forks source link

Problem during isoformSwitchAnalysisPart1 #120

Closed Pinolinoo closed 2 years ago

Pinolinoo commented 2 years ago

Hi,

I tried running the quick analysis and ran into this Error during isoformSwitchAnalysisPart1.

Step 1 of 3 : Detecting isoform switches... Step 2 of 3 : Adding and predicting open reading frames Error in getSeqlevelsReplacementMode(value, seqlevels(x)) : the supplied 'seqlevels' must be a character vector with no NAs and no duplicates

I used Stringtie to assemble transcripts and used the merge function to generate a new GTF File.

Here's my code:

library(IsoformSwitchAnalyzeR) library(BSgenome.Hsapiens.UCSC.hg38)

Stringtie_quant <- importIsoformExpression(parentDir = "/fast/AG_Metzger/philipp/results/ballgown/HD/", readLength = 76) pheno_data <- read.csv("/fast/AG_Metzger/philipp/results/ballgown/HD/phenodata_HD.csv") row.names(pheno_data) <- pheno_data$Filename names(pheno_data)[2] <- "sampleID" names(pheno_data)[4] <- "condition" pheno_data <- pheno_data[2:7] pheno_data <- pheno_data[-4] pheno_data <- pheno_data[-4] pheno_data <- pheno_data[-4] pheno_data <- pheno_data[-2]#have only condition and Patient.ID and sampleIDs in design Matrix

aSwitchList <- importRdata( isoformCountMatrix = Stringtie_quant$counts, isoformRepExpression = Stringtie_quant$abundance, designMatrix = pheno_data, isoformExonAnnoation = "/fast/AG_Metzger/philipp/results/stringtie/HD/stringtie_merged_HD.gtf", fixStringTieAnnotationProblem = TRUE, showProgress = TRUE )

genome <- BSgenome.Hsapiens.UCSC.hg38

exampleSwitchList <- isoformSwitchAnalysisPart1( pathToGTF = "/fast/AG_Metzger/philipp/results/stringtie/HD/stringtie_merged_HD.gtf", genomeObject = genome, switchAnalyzeRlist = aSwitchList, outputSequences = FALSE, prepareForWebServers = FALSE
)

I excuse any simple mistakes, since I'm new to Bio Informatics!

Thanks for your help.

kvittingseerup commented 2 years ago

That sounds very strange. Could you try running the individual functions that isoformSwitchAnalysisPart1 () wraps and send me the output? These are the individual functions

### Filter
mySwitchList <- preFilter( mySwitchList )

### Test for isoform switches
mySwitchList <- isoformSwitchTestDEXSeq( mySwitchList ) 

### Add ORF
mySwitchList <- addORFfromGTF( mySwitchList )
mySwitchList <- analyzeNovelIsoformORF( mySwitchList )
Pinolinoo commented 2 years ago

HI Kristoffer, Thanks for your quick reply!

Here's the output:

aSwitchList <- preFilter(aSwitchList) The filtering removed 42174 ( 38.25% of ) transcripts. There is now 68090 isoforms left

> aSwitchList <- isoformSwitchTestDEXSeq(aSwitchList) Step 1 of 2: Testing each pairwise comparisons with DEXSeq (this might be a bit slow)... Estimated run time is: 56.3 min Step 2 of 2: Integrating result into switchAnalyzeRlist... Isoform switch analysis was performed for 9440 gene comparisons (100%). Total runtime: 54.57 min Done

then in the 3rd the error comes again:

> aSwitchList <- addORFfromGTF(aSwitchList, pathToGTF = "/fast/AG_Metzger/philipp/results/stringtie/HD/stringtie_merged_HD.gtf") Step 1 of 2: importing GTF (this may take a while)... Error in getSeqlevelsReplacementMode(value, seqlevels(x)) : the supplied 'seqlevels' must be a character vector with no NAs and no duplicates

kvittingseerup commented 2 years ago

That is really strange. But I'm afraid I cannot debug it without the data.

Would you be willing to send me the data so I can try it? I would naturally keep it confidential and delete it once I was done with the debugging.

If so you can save the aSwitchList R object with the save() function and email that, along with the (compressed) GTF file, to my email (which you can find here.

Pinolinoo commented 2 years ago

Thanks so much and sorry for the late response. I had to check wheter if its ok to send you the data. It is and I am currently uploading. Thank you in advance already and please keep the data confidential :)

kvittingseerup commented 2 years ago

Could you post the output of running sessionInfo()

Pinolinoo commented 2 years ago

`R version 4.1.2 (2021-11-01) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /gnu/store/bs9pl1f805ins80xaf4s3n35a0x2lyq3-openblas-0.3.9/lib/libopenblasp-r0.3.9.so

locale: [1] C

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] tximport_1.22.0 BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.62.0 rtracklayer_1.54.0
[5] Biostrings_2.62.0 XVector_0.34.0 dplyr_1.0.7 magrittr_2.0.1
[9] IsoformSwitchAnalyzeR_1.16.0 ggplot2_3.3.5 DEXSeq_1.40.0 RColorBrewer_1.1-2
[13] AnnotationDbi_1.56.2 DESeq2_1.34.0 SummarizedExperiment_1.24.0 GenomicRanges_1.46.1
[17] GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.3 MatrixGenerics_1.6.0
[21] matrixStats_0.61.0 Biobase_2.54.0 BiocGenerics_0.40.0 BiocParallel_1.28.1
[25] limma_3.50.0

loaded via a namespace (and not attached): [1] colorspace_2.0-2 rjson_0.2.20 hwriter_1.3.2 ellipsis_0.3.2 futile.logger_1.4.3
[6] bit64_4.0.5 interactiveDisplayBase_1.32.0 fansi_0.5.0 xml2_1.3.2 splines_4.1.2
[11] cachem_1.0.6 geneplotter_1.72.0 jsonlite_1.7.2 Rsamtools_2.10.0 annotate_1.72.0
[16] dbplyr_2.1.1 png_0.1-7 shiny_1.7.1 readr_2.1.0 BiocManager_1.30.16
[21] compiler_4.1.2 httr_1.4.2 lazyeval_0.2.2 assertthat_0.2.1 Matrix_1.3-4
[26] fastmap_1.1.0 cli_3.1.0 later_1.3.0 formatR_1.11 htmltools_0.5.2
[31] prettyunits_1.1.1 tools_4.1.2 gtable_0.3.0 glue_1.5.0 GenomeInfoDbData_1.2.0
[36] reshape2_1.4.4 rappdirs_0.3.3 Rcpp_1.0.7 DRIMSeq_1.22.0 vctrs_0.3.8
[41] stringr_1.4.0 mime_0.12 lifecycle_1.0.1 ensembldb_2.18.2 restfulr_0.0.13
[46] statmod_1.4.36 XML_3.99-0.8 edgeR_3.36.0 AnnotationHub_3.2.0 scales_1.1.1
[51] vroom_1.5.6 ProtGenerics_1.26.0 hms_1.1.1 promises_1.2.0.1 parallel_4.1.2
[56] AnnotationFilter_1.18.0 lambda.r_1.2.4 yaml_2.2.1 curl_4.3.2 memoise_2.0.0
[61] gridExtra_2.3 biomaRt_2.50.1 stringi_1.7.5 RSQLite_2.2.8 BiocVersion_3.14.0
[66] genefilter_1.76.0 BiocIO_1.4.0 GenomicFeatures_1.46.1 filelock_1.0.2 rlang_0.4.12
[71] pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45 purrr_0.3.4 GenomicAlignments_1.30.0
[76] bit_4.0.4 tidyselect_1.1.1 plyr_1.8.6 R6_2.5.1 generics_0.1.1
[81] DelayedArray_0.20.0 DBI_1.1.1 pillar_1.6.4 withr_2.4.2 survival_3.2-13
[86] KEGGREST_1.34.0 RCurl_1.95-0.1.2 tibble_3.1.6 tximeta_1.12.3 crayon_1.4.2
[91] futile.options_1.0.1 utf8_1.2.2 BiocFileCache_2.2.0 tzdb_0.2.0 progress_1.2.2
[96] locfit_1.5-9.4 grid_4.1.2 blob_1.2.2 digest_0.6.28 xtable_1.8-4
[101] VennDiagram_1.7.0 httpuv_1.6.3 munsell_0.5.0 `

kvittingseerup commented 2 years ago

Thanks! I think I found the problem.

You need the latest version from the development branch. I've just pushed 1.17.03 to BioC but it will probably be a day or two before it goes through. In the meantime you can download the updates from GitHub via this R command:

if (!requireNamespace("devtools", quietly = TRUE)){
    install.packages("devtools")
}
devtools::install_github("kvittingseerup/IsoformSwitchAnalyzeR", build_vignettes = TRUE)

After installation restart R and verify update with this function packageVersion('IsoformSwitchAnalyzeR') and try again.

Let me know if that works :-).

Cheers Kristoffer

Pinolinoo commented 2 years ago

I'm working on the cluster from our institute, so I will have to wait until the people responsible can update the package on there. However when trying to download it on my local RStudio I got:

Fehler: Failed to install 'IsoformSwitchAnalyzeR' from GitHub: System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines): E> with piece 1: E> isoformUpregulated isoformDownregulated comparison E> 1 TCONS_00000234 TCONS_00000235 1 E> --- failed re-building ‘IsoformSwitchAnalyzeR.Rmd’ E> E> SUMMARY: processing the following file failed: E> ‘IsoformSwitchAnalyzeR.Rmd’ E> E> Fehler: Vignette re-building failed.

kvittingseerup commented 2 years ago

Oh - I was to fast. Try updating to 1.17.04 (by using the same command).

Pinolinoo commented 2 years ago

ok the installation worked now but I got this error when running part1: Step 1 of 3 : Detecting isoform switches... Step 2 of 3 : Adding and predicting open reading frames Fehler in addORFfromGTF(switchAnalyzeRlist = switchAnalyzeRlist, pathToGTF = pathToGTF, : No ORFs could be added to the switchAnalyzeRlist. Please ensure GTF file have CDS info (and that isoform ids match).

kvittingseerup commented 2 years ago

Before answering can I ask what you get out of that error message? (Only asking because I'm continuously trying to improve them).

Pinolinoo commented 2 years ago

I am not quite sure but I suppose there might be something wrong with my GTF file? so maybe a setting I should have set while generating it with stringtie merge?

kvittingseerup commented 2 years ago

Thanks for the feedback. The problem is that you should use the official GTF that you downloaded from an official source such as Ensembl, Gencode, etc. The one you also used to guide stringtie.

Pinolinoo commented 2 years ago

thank you! now I am getting an error during step 3. I used the same aSwitchlist object which I also sent to you for debugging!

Step 1 of 3 : Detecting isoform switches... Step 2 of 3 : Adding and predicting open reading frames Step 3 of 3 : Extracting (and outputting) sequences Fehler in extractSequence(switchAnalyzeRlist = switchAnalyzeRlist, genomeObject = genomeObject, : The object supplied to 'switchAnalyzeRlist' must be a 'switchAnalyzeRlist' Zusätzlich: Warnmeldungen: 1: In DESeqDataSet(rse, design, ignoreRank = TRUE) : some variables in design formula are characters, converting to factors 2: In addORFfromGTF(switchAnalyzeRlist = switchAnalyzeRlist, pathToGTF = pathToGTF, : It seems ORF was only added to an unlikely small fraction of isoforms (less than 10%).If you are not sure this is on purpose something went wrong and the files are not matching.

kvittingseerup commented 2 years ago

Could you post this as a new issue to keep the errors separate?

Also when you do please inspect the "mySwitchList" you get from running the individual steps of the isoformSwitchAnalysisPart1():

### Filter
mySwitchList <- preFilter( mySwitchList )

### Test for isoform switches
mySwitchList <- isoformSwitchTestDEXSeq( mySwitchList ) 

### Add ORF
mySwitchList <- addORFfromGTF( mySwitchList )
mySwitchList <- analyzeNovelIsoformORF( mySwitchList )

### Add and write bio sequences
mySwitchList <- extractSequence()