google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

deepconsensus-0.2.0 complains that subreads_to_ccs.bam is not indexed #16

Closed by jelber2 2 years ago

jelber2 commented 2 years ago

Following the tutorial (https://github.com/google/deepconsensus/blob/f1413ee0802dd09fb5a4507983314935e32ab482/docs/quick_start.md?plain=1#L95), deepconsensus-0.2.0 complains that it cannot find the index for subreads_to_ccs.bam. Sorting the file with samtools sort and then indexing it with samtools index makes the warning go away, but is that necessary?

MariaNattestad commented 2 years ago

Sorry we didn't answer this issue sooner. No, you do not need to sort or index any BAM files, and in fact I would not recommend it: DeepConsensus assumes that the inputs are in the original order that ccs and actc output, so changing that order by sorting might cause some data to be skipped during processing. We'll take a look and see if it's possible to turn off that warning, which I'm guessing comes from pysam, the library we use for parsing the BAM files.
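For anyone who has already run samtools sort and wants to check which file is which, here is a minimal sketch of a header check. It assumes you have the BAM header as text (for example from `samtools view -H subreads_to_ccs.bam`) and relies on `samtools sort` recording `SO:coordinate` on the `@HD` line by default. The function names are made up for this example and are not part of DeepConsensus, and what actc itself writes in its header has not been verified here.

```python
def sort_order_from_header(header_text: str) -> str:
    """Return the SO value from the @HD line of a SAM/BAM header.

    Returns "unknown" if there is no @HD line or no SO field.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[len("SO:"):]
    return "unknown"


def is_coordinate_sorted(header_text: str) -> bool:
    """True if the header declares coordinate sorting (hypothetical helper)."""
    return sort_order_from_header(header_text) == "coordinate"


if __name__ == "__main__":
    # Made-up header for illustration only.
    example = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:example/1/ccs\tLN:100\n"
    if is_coordinate_sorted(example):
        print("Header says coordinate-sorted; use the original actc output instead.")
```

If the check reports a coordinate-sorted file, the safer choice per the advice above is to go back to the unsorted file that actc originally produced.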

jelber2 commented 2 years ago

OK, thank you! I have not noticed any data being skipped, but maybe I was not looking carefully enough.