mikolmogorov / Ragout

Chromosome-level scaffolding using multiple references

"N"s in the output files #75

Closed by EllistonV 1 year ago

EllistonV commented 3 years ago

Hello,

I was trying to use Ragout on some files that are SPAdes outputs. It worked, but I noticed that the scaffolds file has some regions filled with "N". What does this mean? Does it mean that my target sequence has something there that the reference sequences don't, or that Ragout did not have enough consensus among the reference sequences to know how to fill that region? An example of the problem:

AAAGGCAGAGGACTTGCGGACGCGGGTTCGATTCCCGCCGCCTCCACC
AATTCATTATCCGATACAGTCCAATACCGGGTCTTTCCCAAATACCTGAATCTTCTACAC
ATCTTGTTTATTCCAAACAAACATGATCAAATCACCTCTTTTTGAGGTATGTATGGACTT
AGCAGTTGAAGATACTACAGCATGGTCGGAAGCTATNNNNN...NNNCATAACTCGATTTCTTTCGAACCTTTTTGCTTGAACAA
TGAGAACAGCGTGGTAAACGGTTATAGGTTAAATCC

Note that the "N" region is 103,000 bp long. I am using 18 reference sequences.

Thanks!

mikolmogorov commented 3 years ago

Hi,

Stretches of Ns represent unknown sequence. Ragout does not attempt to fill these gaps from reference sequences. It only orders/orients the input target sequences into scaffolds, but does not guess the missing sequence.

Hope this helps, Mikhail
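
For anyone who wants to see where these N stretches fall in a scaffolds file and how long each one is, here is a minimal sketch in plain Python. It is not part of Ragout. The function names (`read_fasta`, `find_gaps`, `gap_report`) and the `min_len` parameter are made up for illustration, and it assumes the scaffolds are in ordinary FASTA format.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-separated token of the header as the name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) for each run of N/n, 0-based, end-exclusive."""
    gaps = []
    for match in re.finditer(r"[Nn]+", seq):
        length = match.end() - match.start()
        if length >= min_len:
            gaps.append((match.start(), match.end(), length))
    return gaps


def gap_report(lines: Iterable[str], min_len: int = 1) -> List[Tuple[str, int, int, int]]:
    """Return (scaffold_name, start, end, length) for every gap in the input."""
    report = []
    for name, seq in read_fasta(lines):
        for start, end, length in find_gaps(seq, min_len):
            report.append((name, start, end, length))
    return report


# Example usage (the path below is a placeholder, not a real Ragout output name):
# with open("path/to/scaffolds.fasta") as handle:
#     for name, start, end, length in gap_report(handle, min_len=1000):
#         print(name, start, end, length)
```

A report like this only shows where the unknown sequence sits between the placed contigs. It does not say what the missing sequence is.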