The script I used to take the variants from the subsampled VCFs and apply them to the reference loci is not working as expected.
In #53 we are investigating a locus, IGR:466668:468334, that has 8 variants missed by pandora. None of these variants were in either the sparse or dense PRGs.
On the (excellent) recommendation of @iqbal-lab, I checked the allele frequency (AF) of these variants in the entire CRyPTIC VCF to see whether making the dense PRG denser would be likely to include them.
It turns out that all 8 of them have (effectively) AF=1.0. That is, they're in all samples that make a call at that position.
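For clarity on what "effectively AF=1.0" means here, this is a minimal sketch of the calculation, not the code I actually ran. It assumes genotypes are available as plain strings (e.g. `1/1`, `0/0`, `./.`), and the function name `allele_frequency` is hypothetical. Samples with a null call are excluded from the denominator.

```python
def allele_frequency(genotypes, alt_index=1):
    """Fraction of *called* samples that carry the given ALT allele.

    Assumption: each genotype is a string such as "1", "1/1", "0|1" or
    "./." (null call). Samples with any "." allele are skipped, so they
    count in neither the numerator nor the denominator.
    """
    called = 0
    carrying = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # no call at this position, ignore the sample
        called += 1
        if str(alt_index) in alleles:
            carrying += 1
    # No sample made a call: AF is undefined
    return carrying / called if called else float("nan")


# Two samples call the ALT, two make no call: AF among called samples is 1.0
print(allele_frequency(["1/1", "./.", "1/1", "."]))
```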
After looking at the final subsampled lineage VCF, these 8 variants should all have been added into the final sparse and dense PRGs, but they appear in neither.
Looking at the logs, there should have been 31 (sparse) and 75 (dense) variants applied to this locus, but only one was applied in each PRG.
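As a starting point for debugging, the expected count can be cross-checked independently of the script by listing the VCF positions that fall inside the locus. This is only a sketch: `parse_locus` and `variants_in_locus` are hypothetical helpers, and I am assuming the locus name encodes `type:start:end` with 1-based coordinates that are inclusive at both ends. That convention needs to be confirmed against how the loci were actually defined.

```python
def parse_locus(name):
    """Split a locus name like 'IGR:466668:468334' into (type, start, end)."""
    kind, start, end = name.split(":")
    return kind, int(start), int(end)


def variants_in_locus(positions, start, end):
    """Return the VCF positions that lie within [start, end].

    Assumption: positions are 1-based (VCF POS) and the locus interval is
    inclusive at both ends.
    """
    return [pos for pos in positions if start <= pos <= end]


kind, start, end = parse_locus("IGR:466668:468334")
# Made-up positions purely to illustrate the boundary behaviour
example_positions = [466667, 466668, 467000, 468334, 468335]
print(variants_in_locus(example_positions, start, end))
```

Comparing the length of this list for the sparse and dense VCFs against the number of variants the script reports as applied should show whether variants are being dropped before or during application.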
My task for tomorrow is to debug the script and figure out how this has happened.