Closed vagan21 closed 2 years ago
Hello,
Thank you for reaching out about the question. The format_reads
step is necessary before running duplicate_remover
. The pupose of this step is to merge entries for paired-end reads and also put all reads in a T-rich orientation.
When abismal produces the output, the paired-end reads are still in two separate entries, one for each read. The SAM fields rnext
and pnext
are used to indicate which reads are concordant pairs. The format_reads
program will merge them into a single entry, also removing fragments of the read that may overlap. The usual pipeline prior to downstream analysis is
format reads
-> samtools sort
-> duplicate-remover
after which the resulting SAM can be passed to other programs (bsrate
, methcounts
, etc)
that said, the more likely reason why the duplicate files are empty is because the reads may not have been sorted prior. duplicate-remover
assumes sorted reads so that duplicates will be adjacent in the SAM file.
I hope that helps!
Closing due to inactivity but if the problem persists feel free to re-open!
Hello, I created mapped reads in a sam file format using abismal with paired-end reads as input. Do we need to run the format_reads function on these sam files before proceeding to duplicate_remover? I'm a little confused about what is happening at the format_reads step, as it sounds like the merging of paired-end entries into single-end entries happens with abismal with paired-end reads as input. I didn't run format_reads on these sam files, proceeded to duplicate_remover, and my de-duplicated files are empty. Thanks and I hope my question makes sense. Verda