lh3 / fermi-lite

Standalone C library for assembling Illumina short reads in small regions
MIT License

Cannot assemble a simple example #9

Open standage opened 6 years ago

standage commented 6 years ago

Consider the following 8 reads.

>seq1
ATCCTGAGAATCAATCTGTGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTC
>seq2
GGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATCTCACAGAATCGCAAAGGAAGAAAATCAGGGCCTA
>seq3
TTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAATCGCAAAGGAAGAAAATCAGGGCCTACCTATCTAAATTTAAAATT
>seq4
GAAATTTTAAATTTAGATATGTAGGCCCTGATTTTCTTCCTTTGCGATTCTGTGATATTCAAGACCTGCTTCTAGATAGCTAAGAGTTCCAGCTTTTCTA
>seq5
TGAGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAAT
>seq6
TGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAATCG
>seq7
TTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATATCACAGAATCGCAAAGGAAGAAAATCAGGGCCTACATATCTAAATTTAAAA
>seq8
ATAGCTAAGAGTTCCAGCTTTTCTAAAAATTTTTGGTTTCCTTCCCCTCCTCCCAAGACATAATTTTCACAGATTGATTCTCAGGATTGGCAATCATGCA

A quick multiple sequence alignment shows very good consensus among these 8 reads across most of the alignment. seq4 and seq8 align in the reverse-complement orientation and appear below as _R_seq4 and _R_seq8.

seq1            -------------atcctgagaatcaatctgtgaaaattatgtcttgggaggaggggaag
_R_seq8         tgcatgattgccaatcctgagaatcaatctgtgaaaattatgtcttgggaggaggggaag
seq5            -----------------------------tgagaaaattatgtcttgggaggaggggaag
seq6            -------------------------------tgaaaattatgtcttgggaggaggggaag
seq2            ------------------------------------------------------gggaag
seq3            ------------------------------------------------------------
_R_seq4         ------------------------------------------------------------
seq7            ------------------------------------------------------------

seq1            gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtc-------
_R_seq8         gaaaccaaaaatttttagaaaagctggaactcttagctat--------------------
seq5            gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtcttgaata
seq6            gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtcttgaata
seq2            gaaaccaaaaatttttagaaaagctggaactcttagctatctagaagcaggtcttgaatc
seq3            -------------tttagaaaagctggaactcttagctatctagaagcaggtcttgaata
_R_seq4         ---------------tagaaaagctggaactcttagctatctagaagcaggtcttgaata
seq7            -----------tttttagaaaagctggaactcttagctatctagaagcaggtcttgaata
                               *************************                    

seq1            -------------------------------------------------------
_R_seq8         -------------------------------------------------------
seq5            tcacagaat----------------------------------------------
seq6            tcacagaatcg--------------------------------------------
seq2            tcacagaatcgcaaaggaagaaaatcagggccta---------------------
seq3            tcacagaatcgcaaaggaagaaaatcagggcctacctatctaaatttaaaatt--
_R_seq4         tcacagaatcgcaaaggaagaaaatcagggcctacatatctaaatttaaaatttc
seq7            tcacagaatcgcaaaggaagaaaatcagggcctacatatctaaatttaaaa----
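
As a rough cross-check of the alignment, exact suffix-prefix overlaps between pairs of reads can be computed directly. The sketch below is not part of fermi-lite and makes no claim about how fml-asm finds overlaps. The helper names `reverse_complement` and `longest_overlap` are made up for illustration, and only exact (mismatch-free) overlaps are considered.

```python
# Hypothetical helpers for illustration only; not fermi-lite code.

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T, either case)."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def longest_overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


# Three of the reads from the post, copied verbatim.
seq1 = "ATCCTGAGAATCAATCTGTGAAAATTATGTCTTGGGAGGAGGGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTC"
seq2 = "GGGAAGGAAACCAAAAATTTTTAGAAAAGCTGGAACTCTTAGCTATCTAGAAGCAGGTCTTGAATCTCACAGAATCGCAAAGGAAGAAAATCAGGGCCTA"
seq8 = "ATAGCTAAGAGTTCCAGCTTTTCTAAAAATTTTTGGTTTCCTTCCCCTCCTCCCAAGACATAATTTTCACAGATTGATTCTCAGGATTGGCAATCATGCA"

if __name__ == "__main__":
    # seq1 followed by seq2, both on the forward strand.
    print("seq1 -> seq2:", longest_overlap(seq1, seq2))
    # seq8 is on the opposite strand, so flip it before comparing with seq1.
    print("rc(seq8) -> seq1:", longest_overlap(reverse_complement(seq8), seq1))
```

Running the same check over all pairs of reads, in both orientations, gives a quick picture of which reads share long exact overlaps and which pairs are interrupted by a mismatch.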

However, I cannot get fml-asm to produce any assembly from these reads. I've tried relaxing parameters in various ways but with no success. Are there any parameter settings that will assemble these reads, or is this a particularly challenging case that can't easily be solved?

Thanks!