Open AmayAgrawal opened 1 year ago
Suppose the MSA looks like this:
    xxxxxAxxxxxx
    xxxxxCxxxxx
    xxyyyyyyyyxx
If there is very low coverage on the x's and lots on the y's, you get forced onto the bottom path, and the A/C choice becomes irrelevant and is ignored.
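To make the effect concrete, here is a toy sketch of the three-row example. It scores each path by its mean per-position coverage, which is an assumption made purely for illustration and not pandora's actual likelihood model. The coverage numbers and path names are hypothetical.

```python
# Toy model: each candidate path is a list of per-position coverage values.
# Scoring by mean coverage is an illustrative assumption, NOT pandora's
# real path-inference model. All numbers below are made up.

def mean(values):
    return sum(values) / len(values)

def best_path(paths):
    """Return the name of the path with the highest mean coverage."""
    return max(paths, key=lambda name: mean(paths[name]))

LOW, HIGH = 1, 50  # hypothetical coverage on the x and y positions

paths = {
    # xxxxxAxxxxxx: the A allele has somewhat more support than C
    "top_A": [LOW] * 5 + [10] + [LOW] * 6,
    # xxxxxCxxxxx
    "middle_C": [LOW] * 5 + [2] + [LOW] * 5,
    # xxyyyyyyyyxx: heavy coverage on the y stretch
    "bottom_y": [LOW] * 2 + [HIGH] * 8 + [LOW] * 2,
}

chosen = best_path(paths)
print(chosen)
```

In this toy setup the y stretch dominates the score, so the difference in support between A and C never influences which path is picked.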
Hi, I have uploaded a zip archive at this drive link (https://nubes.helmholtz-berlin.de/s/R8SHBsT8yDmeca4) that contains all the files needed to reproduce the issue I am describing. The archive includes a 'README' file, which explains all the steps and the files it contains.
Let me know if you have any more questions for me.
Omg we have not replied to you! So sorry @AmayAgrawal, we will return to this after the Xmas vacation.
No worries. It would be nice if you could look at this now.
Hi,
I am facing an issue with the reference path that pandora uses for genotyping variants. It seems to use the less frequently supported path as the reference instead of the most frequently supported one. Below I will try to explain it in a simple way:
Suppose I am using 100 strains for my analysis. First, I did the pan-genome analysis and used the MSAs to build the pan-genome reference graphs (PRGs). Next, I used these PRGs to genotype the variants in these 100 strains with pandora. Now suppose that in the pan-genome graph of a particular locus (let's say gene A), at a particular position (let's say 300), there are 3 different possible paths. If I understand correctly, the path supported by the majority of the 100 strains should be chosen as the reference, but that was not the case. As a result, for the SNP I was looking for (let's say C 300 T, where 'C' is the ref and 'T' is the alt allele), pandora actually chooses 'T' as ref and 'C' as alt. I saw in one of the currently open issues that pandora heavily undermaps (#325). Could it be choosing the less frequent path because of this, or am I misunderstanding something?
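As a rough illustration of the expectation described above, the sketch below finds the majority allele in one MSA column and re-orients a call whose ref and alt are swapped relative to it. It assumes a biallelic site with a haploid 0/1 genotype. The function names and the 70/30 allele split are hypothetical, and none of this is pandora's own code.

```python
from collections import Counter

def majority_allele(column):
    """Most common base in an MSA column, ignoring gap characters."""
    counts = Counter(base for base in column if base != "-")
    return counts.most_common(1)[0][0]

def reorient(ref, alt, genotype, expected_ref):
    """Swap ref/alt and flip a haploid 0/1 genotype if ref is not the expected one.

    Assumes a biallelic site; anything else raises ValueError.
    """
    if genotype not in (0, 1):
        raise ValueError("only haploid 0/1 genotypes are handled")
    if ref == expected_ref:
        return ref, alt, genotype
    if alt == expected_ref:
        return alt, ref, 1 - genotype
    raise ValueError("expected_ref matches neither allele")

# Hypothetical column at position 300: 70 strains carry C, 30 carry T.
column = "C" * 70 + "T" * 30
expected = majority_allele(column)

# A call reported with T as ref and C as alt, genotype 0 (i.e. the sample has T).
fixed = reorient("T", "C", 0, expected)
print(expected, fixed)
```

After re-orientation the sample still carries T, but it is now reported as the alt allele against the majority allele C.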