google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Workflow questions #5

Closed (gevro closed this issue 3 years ago)

gevro commented 3 years ago

Hi, Thanks for a great tool. A few basic workflow questions:

1) Why do you use pbmm2 to align subreads to ccs, instead of just using the subreads BAM together with the CCS BAM files made by pbccs, and matching them using read names?

2) Can deepconsensus output BAM files instead of FASTQ?

AndrewCarroll commented 3 years ago

Hi @gevro

Thank you for your questions.

For #1 (pbmm2 for CCS mapping), we need each subread to be aligned relative to the CCS molecule. Matching by read name only tells us which subreads belong to which CCS read; the alignment is what accounts for insertions and deletions in the subread relative to the final consensus. Without it, those indels accumulate, and by the end of the molecule the same base could sit at quite different positions in different subreads. We do expect there to be more efficient ways of performing this task, and improving this is one of the major areas of current investigation.
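To illustrate the point about relative positions, here is a minimal sketch (not DeepConsensus code). It assumes the usual PacBio read-name layout `movie/zmw/start_end` for subreads and `movie/zmw/ccs` for CCS reads, and a standard SAM CIGAR string for the subread-to-CCS alignment. The function names `zmw_key` and `subread_to_ccs_positions` and the example read names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def zmw_key(read_name):
    """Return (movie, zmw) from a name like 'movie/42/0_1500' or 'movie/42/ccs'.

    Name matching alone pairs a subread with its CCS read, nothing more.
    """
    movie, zmw, _ = read_name.split("/", 2)
    return movie, int(zmw)


def subread_to_ccs_positions(cigar, ccs_start=0):
    """Map every subread base to a CCS coordinate using the alignment CIGAR.

    Bases that are insertions (I) or soft clips (S) have no CCS coordinate
    and are reported as None. Deletions (D, N) advance the CCS coordinate
    without consuming a subread base.
    """
    positions = []
    ccs_pos = ccs_start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both subread and CCS
            positions.extend(range(ccs_pos, ccs_pos + n))
            ccs_pos += n
        elif op in "IS":     # consumes subread only
            positions.extend([None] * n)
        elif op in "DN":     # consumes CCS only
            ccs_pos += n
        # H and P consume neither sequence
    return positions


# Name matching: both reads come from the same ZMW.
same_zmw = zmw_key("m64011_190830/42/0_1500") == zmw_key("m64011_190830/42/ccs")

# Alignment: one insertion and one deletion already shift the coordinates.
coords = subread_to_ccs_positions("3M1I2M1D2M")
# coords == [0, 1, 2, None, 3, 4, 6, 7]
```

In this toy alignment, subread base 4 lands on CCS position 3 and subread base 6 lands on CCS position 6, so a plain index-by-index comparison would misplace bases after the first indel. Over a full-length molecule with many indels, that drift grows.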

For #2, do you want BAM files in unmapped BAM format (not relative to any reference), or do you want BAM files mapped to a supplied reference? If you want the former, we can consider expanding the output formats for DeepConsensus. If it's the latter, that would require more thought on our part.

For these early issues, I want to point out that DeepConsensus in its current form isn't very scalable outside of Google infrastructure. It may be useful for targeted regions, but full SMRT cells will not be tractable. We plan to improve scalability to make it more generally useful in subsequent releases.

Thank you, Andrew

gevro commented 3 years ago

Thanks. For #2, I meant the former. Most other PacBio tools work with unmapped BAM rather than FASTQ, so it would be good to stay consistent with that.

Also, is there a way to contact you/deepconsensus team via direct message with some other ideas?

AndrewCarroll commented 3 years ago

Hi @gevro,

Writing unmapped BAM is a very reasonable request. I'll see how much work it will be to accommodate that.
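For context on what "unmapped" means at the record level, here is a minimal sketch of turning one FASTQ record into an unmapped SAM text line. This is not how DeepConsensus writes output; the function name `fastq_to_unmapped_sam` and the example read are hypothetical. It assumes the standard SAM conventions for an unmapped read (FLAG 4, `*` for reference name and CIGAR, 0 for position), and it omits the extra tags that PacBio BAM files normally carry.

```python
def fastq_to_unmapped_sam(name, seq, qual):
    """Format one FASTQ record as an unmapped SAM line (11 mandatory fields)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    fields = [
        name,   # QNAME: read name without the leading '@' of FASTQ
        "4",    # FLAG: 0x4 = segment unmapped
        "*",    # RNAME: no reference
        "0",    # POS: no position
        "0",    # MAPQ
        "*",    # CIGAR: no alignment
        "*",    # RNEXT
        "0",    # PNEXT
        "0",    # TLEN
        seq,    # SEQ
        qual,   # QUAL: same Phred+33 string as in the FASTQ
    ]
    return "\t".join(fields)


# Minimal header plus one record; the read name is made up.
header = "@HD\tVN:1.6\tSO:unknown"
record = fastq_to_unmapped_sam("m64011_190830/42/ccs", "ACGT", "IIII")
sam_text = header + "\n" + record + "\n"
```

SAM text like this can be converted to BAM with `samtools view -b`.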

We would welcome thoughts on DeepConsensus. For now, you can email awcarroll@google.com, and I'll confer with the broader team on how to include them in the conversation.