mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

Link creation fails on minimal example? #74

Closed: winni2k closed this issue 5 years ago

winni2k commented 5 years ago

I have created a gist with the relevant input and log files.

When I build a graph with mccortex at kmer size 3 from an input FASTA containing two reads of length 4, and then thread the same reads through the resulting graph, the links file contains no links. Is this a bug, or am I doing something wrong?

noporpoise commented 5 years ago

For a repeat of length L you need a read of length L+2 to pair input and output edges. The shortest repeat in a k=3 graph is 3bp, so you need a read of 5bp covering it to generate links across it. Adding a base at the beginning should generate four links:

>0
GCGTT
>1
CCGTA
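
To make the L+2 rule concrete, here is a rough Python sketch (not mccortex code, and not how mccortex itself threads reads). It builds a plain k=3 de Bruijn graph, ignoring reverse complements, and reports which reads pass through a kmer that has more than one incoming and more than one outgoing edge, with at least one kmer on either side. The function names are made up for illustration, and the 4bp reads are assumed to be the 5bp reads above with the first base dropped, since the original input is only in the gist.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_edges(reads, k):
    """Successor and predecessor sets, using only edges seen in the reads."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def spanning_reads(reads, k):
    """(read, kmer) pairs where the read enters and leaves a kmer that has
    more than one predecessor and more than one successor."""
    succ, pred = build_edges(reads, k)
    hits = []
    for read in reads:
        ks = kmers(read, k)
        # Only interior kmers have both an incoming and an outgoing edge in this read.
        for i in range(1, len(ks) - 1):
            node = ks[i]
            if len(pred[node]) > 1 and len(succ[node]) > 1:
                hits.append((read, node))
    return hits


long_reads = ["GCGTT", "CCGTA"]
short_reads = [r[1:] for r in long_reads]  # assumed 4bp input: CGTT, CGTA

print(spanning_reads(short_reads, 3))  # []
print(spanning_reads(long_reads, 3))   # [('GCGTT', 'CGT'), ('CCGTA', 'CGT')]
```

With the 4bp reads, CGT only ever has outgoing branches, so there is no incoming edge to pair with an outgoing one. With the 5bp reads, both reads cross CGT with a kmer on each side. If each read contributes a link in each direction, that would be consistent with the four links mentioned above, but that last part is my reading rather than something this sketch checks.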

winni2k commented 5 years ago

Thanks! That works.

A second question, since we're on this topic: does mccortex annotate links through bubbles, or only through cycles? If it doesn't annotate bubbles, why not?

noporpoise commented 5 years ago

Links are only added to tangles -- parts of the graph that collapse down then split out again. A simple bubble can be traversed without links if you try walking down all the options one at a time. I'll try to add some better documentation on links - sorry I don't have time right now. The paper on Linked de Bruijn graphs (or my thesis when I upload it) might be the best place to start.
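
A toy Python sketch of the bubble versus tangle distinction, in case it helps later readers. It is only an illustration under simplifying assumptions: nodes are single letters rather than kmers, the graphs are acyclic, and the helper names (`as_edges`, `walks`, `spurious`) are invented here and have nothing to do with the mccortex code base. It walks every branch from a start node and lists the walks that no input path supports.

```python
def as_edges(paths):
    """Successor lists built from the edges observed in the input paths."""
    succ = {}
    for path in paths:
        for a, b in zip(path, path[1:]):
            succ.setdefault(a, [])
            if b not in succ[a]:
                succ[a].append(b)
    return succ


def walks(succ, start):
    """Every walk from start to a dead end, trying each branch in turn.
    Assumes the graph is acyclic."""
    nxt = succ.get(start, [])
    if not nxt:
        return [[start]]
    return [[start] + rest for n in nxt for rest in walks(succ, n)]


def spurious(paths, start):
    """Walks from start that do not match any input path."""
    return [w for w in walks(as_edges(paths), start) if w not in paths]


# Bubble: S forks into A and B, which rejoin at T.
bubble_paths = [["S", "A", "T"], ["S", "B", "T"]]
# Tangle: two paths collapse into X and then split out again.
tangle_paths = [["A", "X", "B"], ["C", "X", "D"]]

print(spurious(bubble_paths, "S"))  # []
print(spurious(tangle_paths, "A"))  # [['A', 'X', 'D']]
```

In the bubble, trying each option one at a time only ever recovers real paths. In the tangle, the graph alone cannot tell that entering X from A means leaving towards B, which is the information a link would carry.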

winni2k commented 5 years ago

That's a really succinct description! Thank you so much.

On Tue, Oct 16, 2018 at 6:18 PM Isaac Turner notifications@github.com wrote:

Links are only added to tangles -- parts of the graph that collapse down then split out again. A simple bubble can be traversed without links if you try walking down all the options one at a time. I'll try to add some better documentation on links - sorry I don't have time right now. The paper on Linked de Bruijn graphs https://academic.oup.com/bioinformatics/article/34/15/2556/4938484 (or my thesis when I upload it) might be the best place to start.