ibest / ARC

Assembly by Reduced Complexity (ARC)
Apache License 2.0

Improve repeat detection #37

Open samhunter opened 10 years ago

samhunter commented 10 years ago

In the MultiMite test, 9 target × sample combinations were flagged as hitting a repeat, and further assembly was stopped at iteration 2. These were false detections: the flag was triggered because only a small number of reads were recruited on the first iteration, followed by a large number on the second. In 8 of the 9 cases, fewer contigs were produced on iteration 2 than on iteration 1, and in the 9th case the number was equal.

Based on these results: set up a new criterion for repeat detection that also takes the number of contigs into account. For example:

if NumReads > lastNumReads * multiplier AND NumContigs > lastNumContigs: isRepeat = True

This should guard against most cases of false repeat detection.
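
A minimal Python sketch of the proposed check, to make the condition concrete. The function name `is_repeat`, its parameter names, and the numbers in the usage lines are all hypothetical and chosen for illustration; they are not taken from the ARC code base or from the MultiMite results, and the `multiplier` value of 2.0 is an assumption rather than a recommended setting.

```python
def is_repeat(num_reads, last_num_reads, num_contigs, last_num_contigs, multiplier):
    """Flag a repeat only when BOTH reads and contigs increase.

    Hypothetical helper: names and signature are illustrative, not ARC's API.
    """
    # Read count jumped by more than the allowed factor since the last iteration.
    reads_jumped = num_reads > last_num_reads * multiplier
    # Contig count strictly increased since the last iteration.
    contigs_grew = num_contigs > last_num_contigs
    return reads_jumped and contigs_grew


# Illustrative numbers only (not MultiMite data), with an assumed multiplier of 2.0.
# Reads jump but contigs shrink: not a repeat under the new criterion.
fewer_contigs = is_repeat(5000, 200, 3, 5, 2.0)
# Reads jump and contigs stay equal: still not a repeat.
equal_contigs = is_repeat(5000, 200, 5, 5, 2.0)
# Reads jump and contigs grow: flagged as a repeat.
more_contigs = is_repeat(5000, 200, 12, 5, 2.0)
```

Under this sketch, the pattern described above (a large jump in reads with an equal or smaller number of contigs) would no longer stop assembly, while a jump in both reads and contigs still would.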