Closed maximilianpress closed 2 years ago
Thank you for this feature request.
DeepConsensus is still at a proof-of-concept stage, but we are working on making it more scalable and easy to use outside of Google's internal infrastructure. I'll log this as a feature request in the meantime. Thanks!
Hi @maximilianpress, we made a release in January that cleaned up the code quite a bit, and we no longer use pbmm2. I'm going to close this issue now, but feel free to open another one if you encounter any issues with the latest release.
Behavior I expected
Accept FOFN input for subreads (--input_subreads_unaligned=subreads.fofn).
Very frequently PacBio subread data is spread across directories and files according to flow cell and run architecture. This is the standard format in which PacBio reads are delivered to customers by service providers. The solution to this is the FOFN format (see for example the PBMM2 documentation).
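For readers unfamiliar with the format, a FOFN is just a plain-text file listing one input path per line. The sketch below shows how such a file could be expanded into a list of BAM paths. The function name `read_fofn` is hypothetical, and resolving relative entries against the FOFN's own directory is an assumption on my part, not something confirmed for deepconsensus or PBMM2.

```python
from pathlib import Path


def read_fofn(fofn_path):
    """Return the paths listed in a FOFN, one per non-blank line.

    Hypothetical helper for illustration only.
    """
    fofn = Path(fofn_path)
    paths = []
    for line in fofn.read_text().splitlines():
        entry = line.strip()
        if not entry:
            # Skip blank lines between entries.
            continue
        path = Path(entry)
        # Assumption: relative entries are relative to the FOFN's directory.
        if not path.is_absolute():
            path = fofn.parent / path
        paths.append(path)
    return paths
```

A tool that accepted FOFN input could run something like this up front and then handle each listed BAM as it would a single input file.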
Behavior I observed
Command-line arguments are not documented. The expected input of --input_subreads_unaligned is unclear; when I passed a FOFN, I received an error that the FOFN file did not have a SAM header. The example workflow does show BAM input but does not otherwise describe inputs.
Reprocessing, merging, and housekeeping related to transformations on very large BAM files are a notable overhead and make deepconsensus less useful.
Background
I am working with a rather large dataset (several TB) that involves combining across multiple PB BAM files from different flow cells. Therefore I have used the commonly used FOFN (file of file names) format as input to the PBMM2 step. Accepting FOFN is standard for PB tools.
I got to the deepconsensus step itself, however, before observing that BAM only appears to be supported for the unaligned input subreads.
I am currently using a workaround of pbmerge from the PB toolkit to prepare a single unmapped BAM file from my subreads. This single BAM can then presumably be passed to deepconsensus.
What would help
I suggest some options for addressing this issue, at various levels of effort:
Update quickstart.md to reflect the requirement for a single subread input BAM, including a pbmerge step for multiple BAM files.
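As a rough illustration of what such a documented merge step might look like, here is a minimal sketch that builds the pbmerge command line for a set of per-flow-cell BAM files. The helper name `pbmerge_command` and the file names are hypothetical. The sketch only constructs and prints the command and does not run it.

```python
import shlex


def pbmerge_command(bam_paths, merged_bam):
    """Build a pbmerge argument list that merges bam_paths into merged_bam.

    Hypothetical helper for illustration only.
    """
    if not bam_paths:
        raise ValueError("at least one input BAM is required")
    # pbmerge takes the output file via -o, followed by the input BAMs.
    return ["pbmerge", "-o", str(merged_bam), *(str(p) for p in bam_paths)]


# Placeholder file names; substitute the real per-flow-cell subread BAMs.
cmd = pbmerge_command(
    ["flowcell1/m1.subreads.bam", "flowcell2/m2.subreads.bam"],
    "merged.subreads.bam",
)
print(shlex.join(cmd))
```

The merged file would then presumably be passed as --input_subreads_unaligned=merged.subreads.bam, as described in the workaround above.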