mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0
5 stars 2 forks source link

Compare variant calls to truth assemblies #42

Closed mbhall88 closed 4 years ago

mbhall88 commented 4 years ago

Whilst we are looking at concordance of Nanopore variant calls to the COMPASS Illumina calls, it is also important to remember that COMPASS itself is not necessarily "truth".

As we have high-quality PacBio CCS assemblies for 8 samples, we can compare both the Illumina and Nanopore calls to the truth to get a sense of the trade-offs. To do this we will use varifier.

mbhall88 commented 4 years ago

See discussion below https://github.com/mbhall88/head_to_head_pipeline/issues/39#issuecomment-673238649 and 4451812 for final report of bcftools and compass

mbhall88 commented 4 years ago

There is one outstanding thing to resolve here, which is that two samples with good pacbio samples were considered to have failed QC:

After discussion with @iqbal-lab we will exclude mada_132 from the 30x nanopore coverage and add it into the analysis.

mada_101 requires further investigation

mbhall88 commented 4 years ago

mada_101 seems to have failed to have a lineage assigned as there was some evidence of het calls in the compass VCF at the lineage defining SNP locations - as it failed filters there that relate to base support.

In total, there are 1142 heterozygous genotypes in the VCF (ignoring filters) and 0 when ignoring filters.

This is compared with 385, 305, 422, 254, and 253 for 5 randomly selected QC-passing samples (all had 0 hets when applying filters).

Some position-specific analysis:

Position 62657

lineage 4.1. mada_101 is het with 156 calls on alt and 110 on ref

Position 107794

lineage 4.1.2.1. mada_101 is het with 128 calls on ref and 131 on alt

Position 355181

lineage 4.4.1.1. mada_101 is het with 100 calls on alt and 111 on ref

Position 891756

lineage 4.1.2. mada_101 is het with 100 calls on ref and 101 on alt

Position 4151558

lineage 4.4.1. mada_101 is het with 89 calls on alt and 76 on ref

Position 4307886

lineage 4.4. mada_101 is het with 79 calls on alt and 127 on ref

At all other positions in the Comas SNP catalogue, the ALT was not present in this sample

Conclusion

I'm hesitant to include mada_101 given the het calls are quite strong at those lineage-defining positions

iqbal-lab commented 4 years ago
  1. Hold on, I thought we decided to INclude mada_132?
  2. Interesting that none of the 1142 hets in mada_101 passed filters. Doesn't that mean none of the het Lineage snps passed filters?
mbhall88 commented 4 years ago
1. Hold on, I thought we decided to INclude mada_132?

We did. My wording above was a little confusing I admit. I am saying it is excluded from the cooverage threshold - i.e. it wont be failed based on its nanopore coverage. I have already added it into the report in 5e877f8. It has pretty good precision (0.99561404) for such low coverage!

2. Interesting that none of the 1142 hets in mada_101 passed filters. Doesn't that mean none of the het Lineage snps passed filters?

Yes. Hence, why no lineage was called. All of the lineage-defining SNPs that mada_101 had ALTs for were het calls.