jpromeror / EventPointer

R package for the identification and statistical analysis of alternative splicing events using junction arrays or RNASeq data

EventDetection running error "arguments imply differing number of rows: 2, 3" #18

Open huang-sh opened 1 year ago

huang-sh commented 1 year ago

Hi, first of all, thank you for developing this R package. I'm running into an error when calling EventDetection. I'm hoping that someone can provide some guidance on how to fix it or suggest some next steps. Thank you in advance for your help and expertise!

I downloaded the BAM files from ENCODE:

forebrain.rnaseq.e11.5.Rep2.bam ENCFF962IWY https://encode-public.s3.amazonaws.com/2021/03/15/58cc5140-bfbc-4f4a-bf8b-e484405c6651/ENCFF962IWY.bam
forebrain.rnaseq.e11.5.Rep1.bam ENCFF362XNO https://encode-public.s3.amazonaws.com/2021/03/15/c12c5935-f1ee-4f67-bb73-6159d25f5b5c/ENCFF362XNO.bam
forebrain.rnaseq.e16.5.Rep2.bam ENCFF897FSK https://encode-public.s3.amazonaws.com/2021/03/15/428961e2-ca05-4057-8078-dbd106926965/ENCFF897FSK.bam
forebrain.rnaseq.e16.5.Rep1.bam ENCFF982DHB https://encode-public.s3.amazonaws.com/2021/03/15/7b5173f9-d5b2-4bd7-8595-0d543d0c2c63/ENCFF982DHB.bam

And the GTF is from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/


> Samples <- c("fb.e11.5.Rep1.bam", "fb.e11.5.Rep2.bam", "fb.e16.5.Rep1.bam", "fb.e16.5.Rep2.bam")
> PathToSamples <- "forebrain.rnaseq"
> PathToGTF <-  "/home/public/ref/genome/mm/release_M25/gencode.vM25.annotation.gtf"

# Run PrepareBam function
> SG_RNASeq<-PrepareBam_EP(Samples=Samples,
+                          SamplePath=PathToSamples,
+                          Ref_Transc="GTF",
+                          fileTransc=PathToGTF,
+                          cores=40)
Preparing BAM files for EventPointer...
 Obtaining Bam Information
Done
 Obtaining Reference Transcriptome...Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Done
 Predicting Features from BAMs...Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2 out-of-bound ranges located on sequence ERCC-00004. Note that ranges located
  on a sequence whose length is unknown (NA) or on a circular sequence are not considered out-of-bound
  (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying
  sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.
3: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2 out-of-bound ranges located on sequence ERCC-00004. Note that ranges located
  on a sequence whose length is unknown (NA) or on a circular sequence are not considered out-of-bound
  (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying
  sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more
  information.

> TxtPath <- "eventpointer/"

> AllEvents_RNASeq<-EventDetection(SG_RNASeq,cores=80, Path=TxtPath)

  |                                                                                                                           |   0%Error in { : 
  task 15299 failed - "arguments imply differing number of rows: 2, 3"

This is my session info:

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 8

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EventPointer_3.6.0          Matrix_1.5-3                SGSeq_1.32.0                SummarizedExperiment_1.28.0 Biobase_2.58.0             
 [6] MatrixGenerics_1.10.0       matrixStats_0.62.0          Rsamtools_2.14.0            Biostrings_2.66.0           XVector_0.38.0             
[11] GenomicRanges_1.50.0        GenomeInfoDb_1.34.1         IRanges_2.32.0              S4Vectors_0.36.0            BiocGenerics_0.44.0        

loaded via a namespace (and not attached):
  [1] fgsea_1.24.0             colorspace_2.0-3         rjson_0.2.21             ellipsis_0.3.2           qvalue_2.30.0            rstudioapi_0.14         
  [7] listenv_0.9.0            MatrixModels_0.5-1       bit64_4.0.5              AnnotationDbi_1.60.0     prodlim_2019.11.13       fansi_1.0.3             
 [13] xml2_1.3.3               codetools_0.2-18         splines_4.2.1            tximport_1.26.0          doParallel_1.0.17        cachem_1.0.6            
 [19] speedglm_0.3-4           cobs_1.3-5               dbplyr_2.2.1             png_0.1-7                graph_1.76.0             compiler_4.2.1          
 [25] httr_1.4.4               assertthat_0.2.1         fastmap_1.1.0            limma_3.54.0             cli_3.4.1                quantreg_5.94           
 [31] prettyunits_1.1.1        tools_4.2.1              igraph_1.4.1             gtable_0.3.1             glue_1.6.2               GenomeInfoDbData_1.2.9  
 [37] reshape2_1.4.4           affxparser_1.70.0        dplyr_1.1.0              rappdirs_0.3.3           fastmatch_1.1-3          Rcpp_1.0.9              
 [43] vctrs_0.5.2              rhdf5filters_1.10.0      rtracklayer_1.58.0       iterators_1.0.14         stringr_1.4.1            globals_0.16.2          
 [49] lpSolve_5.6.18           lifecycle_1.0.3          restfulr_0.0.15          XML_3.99-0.12            future_1.31.0            zlibbioc_1.44.0         
 [55] MASS_7.3-57              scales_1.2.1             BSgenome_1.66.1          hms_1.1.2                parallel_4.2.1           RBGL_1.74.0             
 [61] rhdf5_2.42.0             SparseM_1.81             yaml_2.3.6               curl_4.3.3               memoise_2.0.1            ggplot2_3.4.0           
 [67] biomaRt_2.54.0           stringi_1.7.8            RSQLite_2.2.18           BiocIO_1.8.0             foreach_1.5.2            poibin_1.5              
 [73] GenomicFeatures_1.50.2   filelock_1.0.2           BiocParallel_1.32.0      lava_1.7.2.1             shape_1.4.6              rlang_1.0.6             
 [79] pkgconfig_2.0.3          bitops_1.0-7             lattice_0.20-45          Rhdf5lib_1.20.0          GenomicAlignments_1.34.0 cowplot_1.1.1           
 [85] bit_4.0.4                tidyselect_1.2.0         parallelly_1.34.0        plyr_1.8.7               magrittr_2.0.3           R6_2.5.1                
 [91] generics_0.1.3           nnls_1.4                 RUnit_0.4.32             DelayedArray_0.24.0      DBI_1.1.3                pillar_1.8.1            
 [97] survival_3.3-1           KEGGREST_1.38.0          abind_1.4-5              RCurl_1.98-1.9           tibble_3.1.8             future.apply_1.10.0     
[103] crayon_1.5.2             utf8_1.2.2               BiocFileCache_2.6.0      progress_1.2.2           grid_4.2.1               data.table_1.14.4       
[109] blob_1.2.3               digest_0.6.30            munsell_0.5.0            glmnet_4.1-6            

JFerrer-B commented 1 year ago

Hi Huang,

Just to be sure, these functions have some requirements: do you have the corresponding .bai file for every .bam file? Do the BAM files include the XS flag?

Juan

huang-sh commented 1 year ago

Hello, I'm sorry, the BAM files don't include the XS flag. I have now noticed this requirement in the "Overview of RNA-Seq" note. Thank you for your help!
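
For anyone hitting the same error, the two requirements mentioned above (a .bai index next to every .bam file, and the XS tag on the alignments) can be checked before running PrepareBam_EP. The following is only a rough sketch, not part of EventPointer: the helper names bam_has_index, xs_tag_fraction and check_bam_requirements are made up for illustration, and it assumes Rsamtools is installed (it appears in the session info above). It only inspects the first n_reads records of each file, and depending on aligner settings the XS tag may be present only on spliced reads, so a small non-zero fraction can still be fine.

```r
# Sketch only: these helper names are hypothetical, not EventPointer functions.

# TRUE for each BAM that has an index named either "x.bam.bai" or "x.bai".
bam_has_index <- function(bam_paths) {
  file.exists(paste0(bam_paths, ".bai")) |
    file.exists(sub("\\.bam$", ".bai", bam_paths))
}

# Fraction of records carrying an XS tag, given the tag values returned by
# scanBam (NULL or NA means the tag is absent).
xs_tag_fraction <- function(xs_values) {
  if (is.null(xs_values) || length(xs_values) == 0) return(0)
  mean(!is.na(xs_values))
}

# Read the first n_reads records of each BAM and summarise both checks.
check_bam_requirements <- function(bam_paths, n_reads = 100000L) {
  stopifnot(requireNamespace("Rsamtools", quietly = TRUE))
  xs_frac <- vapply(bam_paths, function(path) {
    bf <- Rsamtools::BamFile(path, yieldSize = n_reads)
    param <- Rsamtools::ScanBamParam(tag = "XS")  # fetch only the XS tag
    res <- Rsamtools::scanBam(bf, param = param)
    xs_tag_fraction(res[[1]]$tag$XS)
  }, numeric(1))
  data.frame(bam = bam_paths,
             has_index = bam_has_index(bam_paths),
             xs_fraction = xs_frac,
             row.names = NULL)
}

# Usage with the variables from the original post (runs only if the files exist).
Samples <- c("fb.e11.5.Rep1.bam", "fb.e11.5.Rep2.bam",
             "fb.e16.5.Rep1.bam", "fb.e16.5.Rep2.bam")
PathToSamples <- "forebrain.rnaseq"
bams <- file.path(PathToSamples, Samples)
if (requireNamespace("Rsamtools", quietly = TRUE) && all(file.exists(bams))) {
  print(check_bam_requirements(bams))
}
```

If has_index is FALSE for a file, or xs_fraction is 0, that file probably does not meet the requirements asked about above and would need to be indexed or re-aligned with strand information before running EventDetection.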