mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

mccortex subgraph behaviour #87

Status: open. izaak-coleman opened this issue 5 years ago.

izaak-coleman commented 5 years ago

I have a few questions about the behaviour of mccortex31 subgraph.

Firstly, does --dist d specify the radius or the diameter? That is, will subgraph extend a distance of d to the left and right of the input sequence (--dist specifies the radius), or a distance of d/2 to the left and right (--dist specifies the diameter)?

Secondly, suppose the sequence passed to --seq is a single kmer (of length 31 in this case), the distance passed to --dist is d, and --dist specifies the radius. Assuming each node in the de Bruijn graph has four edges, is the maximum number of kmers that could be pulled from the original graph 4^d + 4^d + 1 (one 4^d term each for the left and right extensions, plus 1 for the original seed kmer)?
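A worked count may help frame the second question. Assume, as the question does, that every kmer has four neighbours on each side, and additionally (an extra assumption) that no two walks ever reach the same kmer. Then exactly 4^i new kmers are reached at step i on each side, so a radius-d walk collects every layer from step 1 to step d, not only the outermost 4^d:

$$N_{\max}(d) = 1 + 2\sum_{i=1}^{d} 4^{i} = 1 + \frac{2\left(4^{d+1} - 4\right)}{3}$$

For d = 1 this gives 9, and for d = 2 it gives 41. This is only a ceiling under those assumptions (it is also capped by the 4^31 possible 31-mers); real graphs are much sparser and branches reconverge, and the count says nothing about how mccortex itself measures distance.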

Finally, if the sequence passed to --seq contains multiple non-contiguous kmers, will each kmer be extended a distance of d?
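To make the radius reading in the first and third questions concrete, here is a minimal Python sketch. It is a hypothetical illustration, not mccortex's implementation: the graph is just a Python set of kmers, edges are implied by a (k-1)-base overlap, reverse complements are ignored (a real de Bruijn graph tool would have to handle them), and the helper names (successors, predecessors, seed_kmers, extend, subgraph) are invented for this example. Every kmer of the --seq sequence that is present in the graph becomes a seed, and each side is walked breadth-first for at most dist steps.

```python
BASES = "ACGT"


def successors(kmer, graph):
    """Kmers in `graph` whose first k-1 bases equal the last k-1 bases of `kmer`."""
    return {kmer[1:] + b for b in BASES if kmer[1:] + b in graph}


def predecessors(kmer, graph):
    """Kmers in `graph` whose last k-1 bases equal the first k-1 bases of `kmer`."""
    return {b + kmer[:-1] for b in BASES if b + kmer[:-1] in graph}


def seed_kmers(seq, k, graph):
    """Every k-length window of `seq` present in `graph` (windows containing N never match)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)} & graph


def extend(seeds, dist, step):
    """Breadth-first walk from `seeds`, following `step`, for at most `dist` steps."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(dist):
        # Next layer: neighbours of the current layer that have not been visited yet.
        frontier = {nb for kmer in frontier for nb in step(kmer)} - seen
        seen |= frontier
    return seen


def subgraph(seq, k, dist, graph):
    """Hypothetical radius reading of --dist: walk `dist` steps right and left of every seed."""
    seeds = seed_kmers(seq, k, graph)
    right = extend(seeds, dist, lambda km: successors(km, graph))
    left = extend(seeds, dist, lambda km: predecessors(km, graph))
    return right | left


if __name__ == "__main__":
    graph = {"ACG", "CGT", "GTT", "TTG", "TGC", "GCA"}
    print(sorted(subgraph("GTT", 3, 1, graph)))       # ['CGT', 'GTT', 'TTG']
    print(sorted(subgraph("ACGNNGCA", 3, 1, graph)))  # ['ACG', 'CGT', 'GCA', 'TGC']
```

In this sketch a seed sequence with a gap, such as "ACGNNGCA", yields two separate seeds (ACG and GCA), and each is extended independently by dist steps. Whether mccortex31 subgraph follows this reading is exactly what the questions above ask.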

Kind regards, and thanks for any insight!
Izaak Coleman