jianhong / ATACseqQC

ATAC-seq Quality Control
https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html
23 stars 12 forks source link

readBamFile with bigFile=TRUE still returns 0 GAlignmentsList #60

Open sunta3iouxos opened 1 month ago

sunta3iouxos commented 1 month ago

Hi there, Thank youf or this tool, but I am still facing the same issues as a lot of time ago. My ATACseqQC version is 1.22.0 I am on windows 11 R=4.2.2 bioconductor=3.16.0

Am I missing something?

When I am using the bigFile = ALSE option the pipeline works:

 gal <- readBamFile(file.path("C:/bam/Bowtie2/",bamfiles[i]), tag=tags, which=which,
+                    asMates=TRUE, bigFile=F)
This is a big BAM file. Do you want to set bigFile=TRUE to save memory? (Y/n)? n
> 
> gal
GAlignmentsList object of length 4349701:
[[1]]
GAlignments object with 2 alignments and 18 metadata columns:
      seqnames strand       cigar    qwidth     start       end     width     njunc |                  qname      flag      mapq
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer> |            <character> <integer> <integer>
  [1]     chr1      +  1S48M2D52M       101 186863384 186863485       102         0 | A00620:409:HHJ5GDSXC..        99        44
  [2]     chr1      -        101M       101 186863467 186863567       101         0 | A00620:409:HHJ5GDSXC..       147        44
          isize                     seq                    qual     mrnm        AS          MD                    RG        XG
      <integer>          <DNAStringSet>          <PhredQuality> <factor> <integer> <character>           <character> <integer>
  [1]       185 NTGTGCATAC...AGTCTGTATG #FF::FFFFF...FFFFFFFFFF     chr1       182  48^TG12A39 A006200409_226916_S85         2
  [2]      -185 CTTTGTGTGA...TATGCATGTG :FFFFFFFFF...FFFFFFFFFF     chr1       202         101 A006200409_226916_S85         0
             NM        XM        XN        XO        XS        YS          YT
      <integer> <integer> <integer> <integer> <integer> <integer> <character>
  [1]         3         1         0         1        59       202          CP
  [2]         0         0         0         0        60       182          CP
  -------
  seqinfo: 66 sequences from an unspecified genome

[[2]]
GAlignments object with 2 alignments and 18 metadata columns:
      seqnames strand       cigar    qwidth     start       end     width     njunc |                  qname      flag      mapq
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer> |            <character> <integer> <integer>
  [1]     chr1      +        101M       101  24612984  24613084       101         0 | A00620:409:HHJ5GDSXC..      1123         1
  [2]     chr1      -        101M       101  24613014  24613114       101         0 | A00620:409:HHJ5GDSXC..      1171         1
          isize                     seq                    qual     mrnm        AS          MD                    RG        XG
      <integer>          <DNAStringSet>          <PhredQuality> <factor> <integer> <character>           <character> <integer>
  [1]       131 AGCTTGTAGG...TAGGGATAAT FFFFFFFFFF...FF::FFFF:F     chr1       202         101 A006200409_226916_S85         0
  [2]      -131 GATTTGCTTT...AGTGTACAGG :FFFFF:F:F...FFFFFFFFFF     chr1       202         101 A006200409_226916_S85         0
             NM        XM        XN        XO        XS        YS          YT
      <integer> <integer> <integer> <integer> <integer> <integer> <character>
  [1]         0         0         0         0       202       202          CP
  [2]         0         0         0         0       202       202          CP
  -------
  seqinfo: 66 sequences from an unspecified genome

[[3]]
GAlignments object with 2 alignments and 18 metadata columns:
      seqnames strand       cigar    qwidth     start       end     width     njunc |                  qname      flag      mapq
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer> |            <character> <integer> <integer>
  [1]     chr1      +        101M       101  72245863  72245963       101         0 | A00620:409:HHJ5GDSXC..        99        44
  [2]     chr1      -        101M       101  72245878  72245978       101         0 | A00620:409:HHJ5GDSXC..       147        44
          isize                     seq                    qual     mrnm        AS          MD                    RG        XG
      <integer>          <DNAStringSet>          <PhredQuality> <factor> <integer> <character>           <character> <integer>
  [1]       116 ATATGCAAGC...TCCCAACCCG FFFFFFFFFF...FFFFFFFFFF     chr1       202         101 A006200409_226916_S85         0
  [2]      -116 ACTCATACAC...TCCCTAGATC FFFFFFFFFF...FFFFFFFFFF     chr1       202         101 A006200409_226916_S85         0
             NM        XM        XN        XO        XS        YS          YT
      <integer> <integer> <integer> <integer> <integer> <integer> <character>
  [1]         0         0         0         0      <NA>       202          CP
  [2]         0         0         0         0      <NA>       202          CP
  -------
  seqinfo: 66 sequences from an unspecified genome

...
<4349698 more elements>

When I am using the bigFile=TRUE to speed the process and use less memory I get the following:

> gal <- readBamFile(file.path("C:/bam/Bowtie2/",bamfiles[i]), tag=tags, which=which,
+                    asMates=TRUE, bigFile=T)
> gal
GAlignmentsList object of length 0:
<0 elements>
jianhong commented 1 month ago

Yes, it is not read into memory until it is required.

sunta3iouxos commented 1 month ago

OK, I get the point. It is a very slow process though. Is there a way to activate R parallelisation?