google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Merging aligned fastq output with original ccs BAM #39

Closed (gevro closed 2 years ago)

gevro commented 2 years ago

Is there a recommended approach or best practice for merging aligned DeepConsensus FASTQ output with the original ccs BAM file, so that the aligned DeepConsensus reads are re-associated with the tags from the original ccs BAM (e.g., the ec, np, sn, and zm tags)?

(This issue could also be fixed by having DeepConsensus output BAM instead of FASTQ.)

danielecook commented 2 years ago

One simple approach here would be to iterate through the FASTQ and the original BAM using pysam.

You can substitute the sequence from the FASTQ, but the PW and IPD tags will not necessarily line up with the new bases, because DeepConsensus can produce sequences whose length differs from that of the original PW/IPD arrays.
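A minimal sketch of that approach, not an official DeepConsensus tool. It assumes the FASTQ read names match the ccs BAM read names exactly, that the FASTQ may contain only a subset of the ccs reads, and that the FASTQ fits in memory. The function names, file paths, and the `PER_BASE_TAGS` list are hypothetical; adjust the tag list to whatever per-base tags your BAM actually carries.

```python
from typing import Dict, List, Sequence

# Assumption: per-base kinetics tags that are only valid when their length
# matches the read length. Edit this list to match your BAM.
PER_BASE_TAGS = ("ip", "pw")


def stale_per_base_tags(tag_lengths: Dict[str, int], new_len: int,
                        per_base: Sequence[str] = PER_BASE_TAGS) -> List[str]:
    """Return the per-base tags whose array length no longer matches new_len."""
    return [tag for tag, n in tag_lengths.items()
            if tag in per_base and n != new_len]


def merge_fastq_into_bam(ccs_bam: str, dc_fastq: str, out_bam: str) -> None:
    """Write ccs reads with DeepConsensus sequence/qualities and original tags."""
    import pysam  # third-party; imported here so the helper above needs no install

    # Index the polished reads by name (memory-hungry for large runs).
    with pysam.FastxFile(dc_fastq) as fq:
        polished = {rec.name: (rec.sequence, rec.quality) for rec in fq}

    # check_sq=False because an unaligned ccs BAM has no @SQ header lines.
    with pysam.AlignmentFile(ccs_bam, "rb", check_sq=False) as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        for read in src:
            hit = polished.get(read.query_name)
            if hit is None:
                continue  # read absent from the FASTQ: skip it
            seq, qual = hit
            lengths = {t: len(read.get_tag(t))
                       for t in PER_BASE_TAGS if read.has_tag(t)}
            for tag in stale_per_base_tags(lengths, len(seq)):
                read.set_tag(tag, None)  # a value of None deletes the tag
            # Set the sequence first: pysam clears qualities when it changes.
            read.query_sequence = seq
            read.query_qualities = pysam.qualitystring_to_array(qual)
            dst.write(read)
```

Calling `merge_fastq_into_bam("ccs.bam", "deepconsensus.fastq", "merged.bam")` (placeholder paths) would give an unaligned BAM that keeps tags such as ec, np, sn, and zm and can then be aligned with a BAM-aware aligner. Dropping mismatched per-base tags is one design choice; you could instead keep them and treat them as unreliable.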

gevro commented 2 years ago

Ok thanks. I may try GATK MergeBamAlignment too.