Closed mbhall88 closed 4 years ago
See discussion below https://github.com/mbhall88/head_to_head_pipeline/issues/39#issuecomment-673238649 and 4451812 for final report of bcftools and compass
There is one outstanding thing to resolve here, which is that two samples with good pacbio samples were considered to have failed QC:
After discussion with @iqbal-lab we will exclude mada_132 from the 30x nanopore coverage and add it into the analysis.
mada_101 requires further investigation
mada_101 seems to have failed to have a lineage assigned as there was some evidence of het calls in the compass VCF at the lineage defining SNP locations - as it failed filters there that relate to base support.
In total, there are 1142 heterozygous genotypes in the VCF (ignoring filters) and 0 when ignoring filters.
This is compared with 385, 305, 422, 254, and 253 for 5 randomly selected QC-passing samples (all had 0 hets when applying filters).
Some position-specific analysis:
lineage 4.1. mada_101 is het with 156 calls on alt and 110 on ref
lineage 4.1.2.1. mada_101 is het with 128 calls on ref and 131 on alt
lineage 4.4.1.1. mada_101 is het with 100 calls on alt and 111 on ref
lineage 4.1.2. mada_101 is het with 100 calls on ref and 101 on alt
lineage 4.4.1. mada_101 is het with 89 calls on alt and 76 on ref
lineage 4.4. mada_101 is het with 79 calls on alt and 127 on ref
At all other positions in the Comas SNP catalogue, the ALT was not present in this sample
I'm hesitant to include mada_101 given the het calls are quite strong at those lineage-defining positions
1. Hold on, I thought we decided to INclude mada_132?
We did. My wording above was a little confusing I admit. I am saying it is excluded from the cooverage threshold - i.e. it wont be failed based on its nanopore coverage. I have already added it into the report in 5e877f8. It has pretty good precision (0.99561404) for such low coverage!
2. Interesting that none of the 1142 hets in mada_101 passed filters. Doesn't that mean none of the het Lineage snps passed filters?
Yes. Hence, why no lineage was called. All of the lineage-defining SNPs that mada_101 had ALTs for were het calls.
Whilst we are looking at concordance of Nanopore variant calls to the COMPASS Illumina calls, it is also important to remember that COMPASS itself is not necessarily "truth".
As we have high-quality PacBio CCS assemblies for 8 samples, we can compare both the Illumina and Nanopore calls to the truth to get a sense of the trade-offs. To do this we will use
varifier
.