google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Strand of Fastq output #41

Closed. gevro closed this issue 2 years ago

gevro commented 2 years ago

Hi, DeepConsensus takes a CCS BAM file as one of its inputs.

When the CCS BAM file is already aligned to the genome, does DeepConsensus reverse-complement the sequence of reads that align to the reverse strand, so that the output FASTQ record for that molecule is in the original, unaligned orientation?

From what I can see, I think this is not happening, which is OK, but I just wanted to confirm. If so, an unaligned CCS BAM and an aligned CCS BAM will produce different FASTQ output: reads that aligned to the reverse strand will appear in the DeepConsensus FASTQ output reverse-complemented relative to their original orientation.
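For illustration, here is a minimal Python sketch of what restoring the original orientation of an aligned record would involve. It relies only on the SAM convention that flag bit 0x10 marks a read whose SEQ is stored reverse-complemented. The helper names `reverse_complement` and `original_orientation` are hypothetical and are not part of DeepConsensus.

```python
# SAM flag bit 0x10: the read mapped to the reverse strand, so SEQ is stored
# reverse-complemented and QUAL is stored reversed.
REVERSE_FLAG = 0x10

# Translation table for the complement of each base (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def original_orientation(seq: str, qual: str, flag: int) -> tuple[str, str]:
    """Return (seq, qual) as they were before alignment.

    Hypothetical helper: reverse-strand records are flipped back,
    forward-strand and unaligned records are returned unchanged.
    """
    if flag & REVERSE_FLAG:
        return reverse_complement(seq), qual[::-1]
    return seq, qual
```

A record with flag 16 and SEQ `AACG` would be emitted as `CGTT` with its quality string reversed, while a record with flag 0 passes through untouched.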

danielecook commented 2 years ago

The CCS BAM file is expected to be unaligned. I don't think we have looked at using aligned CCS BAMs before. Presumably, the subread_to_ccs.bam would then also be derived from the aligned ccs.bam, in which case it might work and reverse-complement the sequences where appropriate. Still, I would advise aligning to a reference only after DeepConsensus processing has taken place.

gevro commented 2 years ago

OK, thanks. I am now using Picard RevertSam to convert the aligned CCS BAM into an unaligned CCS BAM.