mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

Apply variants to loci is missing lots of variants #54

Closed by mbhall88 3 years ago

mbhall88 commented 3 years ago

The script I used to take the variants from the subsampled VCFs and apply them to the reference loci is not working as expected.

In #53 we are investigating a locus, IGR:466668:468334, that has 8 variants missed by pandora. None of these variants were in either the sparse or dense PRGs.

On the (excellent) recommendation of @iqbal-lab, I checked the allele frequency (AF) of these variants across the entire CRyPTIC VCF to see whether making the dense PRG denser would be likely to include them.

It turns out that all 8 of them have (effectively) AF=1.0. That is, they're in all samples that make a call at that position.
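To make "AF among samples that make a call" concrete, here is a minimal sketch of that calculation. It is not the code used for this check: `allele_frequency` is a hypothetical helper, and it assumes the genotypes for one site have already been pulled out of the VCF as GT strings.

```python
def allele_frequency(genotypes, alt_index=1):
    """Fraction of called samples that carry the given ALT allele.

    `genotypes` is a list of VCF GT strings for one site (e.g. "1", "0/1").
    Samples with a missing call ("." or "./.") are excluded from the
    denominator. Returns None if no sample makes a call.
    """
    called = 0
    carrying = 0
    for gt in genotypes:
        # Treat phased and unphased separators the same, drop missing alleles.
        alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
        if not alleles:
            continue  # null call: sample does not count towards AF
        called += 1
        if str(alt_index) in alleles:
            carrying += 1
    return carrying / called if called else None
```

With this definition, a site where every calling sample carries the ALT allele gives 1.0 no matter how many samples have a null call there.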

After looking at the final subsampled lineage VCF, these 8 variants should all have been added into the final sparse and dense PRGs, but they appear in neither.

Looking at the logs, there should have been 31 (sparse) and 75 (dense) variants applied to this loci, but only one was applied in both PRGs.
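As a starting point for debugging, the expected count can be cross-checked independently of the script by counting the VCF records that fall inside the locus. The sketch below is only an illustration: `parse_locus` and `variants_in_locus` are hypothetical names, it assumes the locus ID has the form `name:start:end`, and it assumes both ends of the interval are inclusive in VCF (1-based) coordinates, which may not match the pipeline's convention.

```python
def parse_locus(locus_id):
    """Split an ID such as 'IGR:466668:468334' into (name, start, end)."""
    name, start, end = locus_id.rsplit(":", 2)
    return name, int(start), int(end)


def variants_in_locus(vcf_lines, locus_id):
    """Return (POS, REF, ALT) for every VCF record inside the locus.

    Assumption: start and end are both inclusive, in the same coordinate
    system as the VCF POS column.
    """
    _, start, end = parse_locus(locus_id)
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        if start <= pos <= end:
            hits.append((pos, fields[3], fields[4]))
    return hits
```

Comparing `len(variants_in_locus(...))` for the subsampled VCF against the number of variants the script reports applying would show whether records are being dropped before or during application.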

My task for tomorrow is to debug the script and figure out how this has happened.

iqbal-lab commented 3 years ago

fwiw, singular of loci is locus