Open nh13 opened 7 years ago
I forgot to mention the context. I want to re-assemble a set of reads that I know originate from the same haploid copy of the genome, and the region is a tandem repeat. All the reads should start and end around the same place, so it's a bit easier than general assembly.
These 100 reads will be collapsed to one read. You will get a singleton contig, which will be ignored unless you tune parameters.
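To illustrate why 100 copies of one read carry no more information than a single copy, here is a minimal Python sketch of collapsing exact duplicates. It is not fermi-lite's implementation; `collapse_reads` is a hypothetical helper, and the only assumption is that identical sequences are merged and counted.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse exact duplicate reads into unique sequences with counts.

    Hypothetical helper for illustration only; it is not part of fermi-lite.
    Case is normalized so "acgt" and "ACGT" count as the same sequence.
    """
    return dict(Counter(r.upper() for r in reads))


if __name__ == "__main__":
    read = "ACGTTGCAACGTACGATCGA"
    collapsed = collapse_reads([read] * 100)
    # One unique sequence remains, supported by 100 copies.
    for seq, count in collapsed.items():
        print(seq, count)
```

With only one distinct sequence left there is nothing to overlap it with, which is consistent with the result being a singleton contig.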
For cfDNA-like data, assembly may not work well.
That is actually what I want: a single contig at the end of the day. Think haploid variant calling across repeat regions with indel and mismatch errors. All reads would come from the same DNA molecule.
I am considering using this instead of consensus calling for duplex sequencing. In this case we have stutter due to PCR slippage across STRs.
Also, the introduction in the readme implied it would be suitable for re-assembly of short reads, even in runs of LOH. Would you mind sharing the tuning parameters needed to output the single contig?
Your example is violating the basic assumption of assembly and won't happen in practice. You need to test on real data.
@lh3 challenge accepted, I'll send you a real world dataset where this can happen!
@lh3 I was wondering if you received the dataset I mentioned. I believe it would be a novel application of fermi-lite, where we aren't assembling a genome, but rather reconstructing a source molecule. Applications such as re-assembling reads from the same long molecule (e.g. 10x) or from novel sequencing preparations (e.g. Duplex Sequencing) could benefit from proper assembly of reads from a single molecule.
@lh3 I was playing around with this tool but I couldn't get it to work on a "simple" case. I duplicated a read 100 times and would expect it to output the duplicated read. Any thoughts?