jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

how do I read my shifted bam ad an obj in splitGalignmentsByCut #61

Open sunta3iouxos opened 1 month ago

sunta3iouxos commented 1 month ago

Hi again, so I have created and indexed the shifted bam files as in the tutorial.

 gal <- ATACseqQC::readBamFile(file.path(rootPath,bamfile), tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
  shiftedBamfile <- file.path(outPath, paste0(gsub(".filtered.bam","",bamfile),"_shifted.bam"))
  gal1 <- ATACseqQC::shiftGAlignmentsList(gal, outbam = shiftedBamfile)

all bam files appear to have been properly generated and indexed. I would like now to read them to create the free/mono/di/trinucleosomes. In the manual it is stated that the obj in the splitGAlignmentsByCut function should be a "an object of GAlignments". So I am loading the bam with the GenomicAlignments::readGAlignments function. But I am getting the following error:

#loading nescessary stuff
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(GenomicScores)
gsco <- getGScores("phastCons60way.UCSC.mm10")
library(BSgenome.Mmusculus.UCSC.mm10)
genome <- Mmusculus
#main script
gal1 <- GenomicAlignments::readGAlignments(file.path(outPath,bamfile))
> objs <- ATACseqQC::splitGAlignmentsByCut(gal1, txs=txs, genome=genome,
+                               conservation=gsco,outPath= outPath)
Error in ATACseqQC::splitGAlignmentsByCut(gal1, txs = txs, genome = genome,  : 
  length(names(obj)) == length(obj) is not TRUE

Any idea how to create a proper GAlignment object from the shifted bam file?

THANK YOU

jianhong commented 1 month ago

The error message means there is no name for the gal1. Try to gave a name for the element in the list.

sunta3iouxos commented 1 month ago

Thank you, where and how I am setting this name:

> names(gal1) <- "gal1"
Error in validObject(x) : invalid class "GAlignments" object: 1: 
    'x@NAMES' is not parallel to 'x'
invalid class "GAlignments" object: 2: 
    'names(x)' must be NULL or have the length of 'x'

just to note that when I load an rds file of the gal1 it gives me this: GAlignments object with 74129664 alignments and 18 metadata columns: while when I do the readGAlignments aproach I get:

GAlignments object with 74129664 alignments and 0 metadata columns:
Error in .Call2("vector_OR_factor_extract_ranges", x, start, width, PACKAGE = "S4Vectors") : 
  'end' must be <= 'length(x)'

might this be the error?

jianhong commented 1 month ago

Could you please show me the header of the metadata of your gal1? The names maybe saved in the metadata.

sunta3iouxos commented 4 weeks ago

Could you please let me know what I should provide, do you need the ouput of a specific command/function?

jianhong commented 4 weeks ago

try

head(mcols(gal1))
sunta3iouxos commented 2 weeks ago

I am sorry for the delay: Here is the output:

> head(mcols(gal2))
DataFrame with 6 rows and 0 columns

So, there is nothing there please also note the sise differences of gal1 (following the pipeline) and gal2 (just adding the bam via readGAlignments image and here is what I get when I follow the pipeline:

wrapped output ### correct uotput ```r > head(mcols(gal1)) DataFrame with 6 rows and 18 columns qname flag mapq isize seq A00620:409:HHJ5GDSXC:2:1101:1000:10175 A00620:409:HHJ5GDSXC.. 163 14 506 GGACATCAGA...AACCAACTCT A00620:409:HHJ5GDSXC:2:1101:1000:10175 A00620:409:HHJ5GDSXC.. 83 14 -506 GCCTCTCTGC...CCTGCCTCCC A00620:409:HHJ5GDSXC:2:1101:1000:10551 A00620:409:HHJ5GDSXC.. 147 44 -95 CTTTGCAGCC...CGCATCTCCC A00620:409:HHJ5GDSXC:2:1101:1000:10551 A00620:409:HHJ5GDSXC.. 99 44 95 TTGCAGCCTG...CATCTCCCTA A00620:409:HHJ5GDSXC:2:1101:1000:11678 A00620:409:HHJ5GDSXC.. 163 44 112 AGCGGGCACC...CAGCCCTGAG A00620:409:HHJ5GDSXC:2:1101:1000:11678 A00620:409:HHJ5GDSXC.. 83 44 -112 ACGGCACAAC...CAGCGCTACT qual mrnm AS RG XG A00620:409:HHJ5GDSXC:2:1101:1000:10175 ,,F,FF,FFF...FF,FFFFFFF chr11 186 A006200409_226916_S8.. 0 A00620:409:HHJ5GDSXC:2:1101:1000:10175 F::,:,:F:,...::FF::FF:F chr11 200 A006200409_226916_S8.. 0 A00620:409:HHJ5GDSXC:2:1101:1000:10551 FFFFFFFFFF...FFFFF:F:FF chr19 202 A006200409_226916_S8.. 0 A00620:409:HHJ5GDSXC:2:1101:1000:10551 FFFFFFFFFF...F:FFFFFFFF chr19 200 A006200409_226916_S8.. 0 A00620:409:HHJ5GDSXC:2:1101:1000:11678 FFFFFFFFFF...FFFFFFFFFF chr12 202 A006200409_226916_S8.. 0 A00620:409:HHJ5GDSXC:2:1101:1000:11678 FFFFFFFFFF...FFFFFFFFFF chr12 193 A006200409_226916_S8.. 0 NM XM XN XO XS YS YT A00620:409:HHJ5GDSXC:2:1101:1000:10175 3 3 0 0 NA 200 CP A00620:409:HHJ5GDSXC:2:1101:1000:10175 0 0 0 0 145 186 CP A00620:409:HHJ5GDSXC:2:1101:1000:10551 0 0 0 0 NA 200 CP A00620:409:HHJ5GDSXC:2:1101:1000:10551 0 0 0 0 NA 202 CP A00620:409:HHJ5GDSXC:2:1101:1000:11678 0 0 0 0 NA 193 CP A00620:409:HHJ5GDSXC:2:1101:1000:11678 1 1 0 0 NA 202 CP mpos A00620:409:HHJ5GDSXC:2:1101:1000:10175 60515936 A00620:409:HHJ5GDSXC:2:1101:1000:10175 60515526 A00620:409:HHJ5GDSXC:2:1101:1000:10551 12501830 A00620:409:HHJ5GDSXC:2:1101:1000:10551 12501828 A00620:409:HHJ5GDSXC:2:1101:1000:11678 100725067 A00620:409:HHJ5GDSXC:2:1101:1000:11678 100725051 ```
jianhong commented 2 weeks ago

Let's try to remove the conservation score first and rerun the script to see if you can repeat the error. If yes, please share me a subset of your gal.