lh3 / minigraph

Sequence-to-graph mapper and graph generator
https://lh3.github.io/minigraph
MIT License

how are ranks determined #17

Closed (SHuang-Broad closed 3 years ago)

SHuang-Broad commented 3 years ago

Hello Heng,

I'm looking at the example 20-sample human rGFA you shared on the FTP site (ftp://ftp.dfci.harvard.edu/pub/hli/minigraph), and my guess is that, in general, the rank of a segment is the order in which the sample it derives from was added to the graph. (P.S. The README describes 14 samples, which I guess is slightly out of date.)

However, I noticed that for samples NA12878, NA24385 and PG1, the segments marked as derived from them (SN tag) have two ranks. For example, 4058 segments derived from NA12878 have rank 4 and 2908 have rank 5 (with no significant difference in their lengths, based on a quick glance).

So I'm wondering whether this is intended and, more generally, where we can read more about how ranks are calculated.

Thanks! Steve

SHuang-Broad commented 3 years ago

Never mind. I see (or rather guess) that two haplotypes were used for each of these samples, hence the two ranks. Sorry for the noise.
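For anyone who wants to repeat the per-sample rank tally on their own graph, here is a minimal Python sketch. It assumes the rGFA segment (S) lines carry the standard `SN:Z` and `SR:i` tags and that, as in the graph discussed here, the SN value identifies the source sample. The function name and the toy records are made up for illustration and are not part of minigraph.

```python
from collections import Counter


def count_segments_by_name_and_rank(lines):
    """Count rGFA segments per (SN, SR) pair.

    `lines` is any iterable of text lines, e.g. an open rGFA file.
    Only S-lines that carry both an SN and an SR tag are counted.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip links (L-lines) and anything else
        fields = line.rstrip("\n").split("\t")
        tags = {}
        # Optional tags start at the 4th column and look like TAG:TYPE:VALUE.
        for field in fields[3:]:
            parts = field.split(":", 2)
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        if "SN" in tags and "SR" in tags:
            counts[(tags["SN"], int(tags["SR"]))] += 1
    return counts


# Toy records (invented for illustration, not taken from the real graph).
toy_rgfa = [
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0\n",
    "S\ts2\tGGA\tSN:Z:NA12878\tSO:i:100\tSR:i:4\n",
    "S\ts3\tTTC\tSN:Z:NA12878\tSO:i:250\tSR:i:5\n",
    "S\ts4\tCA\tSN:Z:NA12878\tSO:i:900\tSR:i:4\n",
    "L\ts1\t+\ts2\t+\t0M\tSR:i:4\n",
]

toy_counts = count_segments_by_name_and_rank(toy_rgfa)
for (name, rank), n in sorted(toy_counts.items()):
    print(name, rank, n)
```

To run it on a real file, pass an open file handle instead of the toy list, for example `count_segments_by_name_and_rank(open("graph.gfa"))`, where `graph.gfa` is a placeholder path. A name that appears under two ranks would be consistent with two haplotypes of the same sample having been added to the graph separately.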

lh3 commented 3 years ago

You can also look at the new HPP graphs at ftp://ftp.dfci.harvard.edu/pub/hli/minigraph/HPP/. There are 59 assemblies in the graph. More will be added later, probably before the end of the year.

lh3 commented 3 years ago

The assembly quality of the HPP samples is much higher than that of the assemblies used in the minigraph paper.