starskyzheng / panpop

Application of pan-genome for population
MIT License

About the population performance in your paper #68

Closed: songbowang125 closed this issue 3 months ago

songbowang125 commented 3 months ago

Hi, I have just read your Panpop paper in Nature Communications. It's wonderful work, and I have two questions about the population performance part.

  1. What exactly is the definition of 'missing genotypes', and how did you calculate it? As far as I know, merging-based methods output each mutation record along with its GT for each sample. For a given variant, if a sample does not carry it, the GT will be './.' or '0/0'. So how do you determine whether the missing rate of a variant exceeds 30%?

  2. When computing precision, recall and F1 values (e.g. Fig. 5d) using the 86 long-read samples, what is the ground-truth set?

starskyzheng commented 3 months ago

Q1: './.' means we don't know the genotype: it could be 0/0 or 1/1, but we cannot determine which. In contrast, 0/0 means we know there is no mutation. The missing rate was calculated from the count of './.' genotypes.

Q2: The SVs before population merging were treated as the true dataset.
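The two answers above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Panpop's actual code. The function names are hypothetical. The missing rate is taken as the number of './.' calls divided by the number of samples (the reply only says "the count of ./."). '.|.' and '.' are also treated as missing. Variants are matched between the pre-merge truth set and the merged call set by an exact hashable key. Real SV comparisons usually allow some breakpoint and size tolerance, which this sketch omits.

```python
from typing import Hashable, Sequence, Set, Tuple

# Assumption: these genotype strings all mean "unknown"; '0/0' is a known
# reference call and is deliberately NOT in this set.
MISSING_GTS = {"./.", ".|.", "."}


def missing_rate(genotypes: Sequence[str]) -> float:
    """Fraction of samples whose genotype at one variant is unknown."""
    if not genotypes:
        return 0.0
    n_missing = sum(1 for gt in genotypes if gt in MISSING_GTS)
    return n_missing / len(genotypes)


def passes_missing_filter(genotypes: Sequence[str], max_missing: float = 0.30) -> bool:
    """Keep a variant unless its missing rate exceeds max_missing (assumed strict '>')."""
    return missing_rate(genotypes) <= max_missing


def precision_recall_f1(truth: Set[Hashable], called: Set[Hashable]) -> Tuple[float, float, float]:
    """Compare a called set against a truth set, e.g. per-sample SVs before merging.

    Assumption: a variant matches only if its key is identical in both sets.
    """
    tp = len(truth & called)   # in both sets
    fp = len(called - truth)   # called but not in the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # One variant genotyped across 10 samples: two './.' calls.
    site = ["0/0", "./.", "0/1", "1/1", "./.", "0/0", "0/0", "0/0", "0/0", "0/0"]
    print(missing_rate(site))           # 0.2
    print(passes_missing_filter(site))  # True

    # Hypothetical (chrom, pos, type) keys for a single sample.
    truth = {("chr1", 1000, "DEL"), ("chr1", 5000, "INS"), ("chr2", 300, "DEL")}
    called = {("chr1", 1000, "DEL"), ("chr2", 300, "DEL"), ("chr3", 42, "INS")}
    print(precision_recall_f1(truth, called))
```

The key point from Q1 shows up in `MISSING_GTS`. A '0/0' call lowers nothing, because it is a confident "absent" genotype. Only unknown calls count towards the missing rate.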

songbowang125 commented 3 months ago

Got it, and many thanks for your answer.