katholt / sonneityping

Other
8 stars 2 forks source link

DOI

parse_mykrobe_predict.py

This script parses Mykrobe predict results for Shigella sonnei.

Mykrobe v0.9.0+ can identify input genomes as S. sonnei, assign those identified as S. sonnei to hierarchical genotypes based on detection of single nucleotide variants (SNVs; defined in the file alleles.txt), and report known mutations in the quinolone-resistance determining region (QRDR) of genes gyrA (S83L, D87G, D87Y) and parC (S80I).

Details of the genotyping scheme are available in the paper Hawkey et al, 2021, Nature Communications. Proposals for new genotype definitions to be added to the scheme can be submitted as an Issue in this repository.

This script can be used to parse the resulting JSON files output by Mykrobe (one per genome), and tabulate the results in a single tab-delimited file (example below).

Dependencies for parser script

Usage

Install Mykrobe

First, install Mykrobe (v0.9.0+) as per the instructions on the Mykrobe github.

Once Mykrobe is installed, make sure you run the following two commands to ensure you have the most up-to-date panels for genotyping:

mykrobe panels update_metadata
mykrobe panels update_species all

You can check what version of the scheme is currently loaded in your Mykrobe installation via:

mykrobe panels describe

Run mykrobe predict on each genome (Illumina reads, nanopore reads, etc)

Example command (using Illumina reads, SRR3441855):

mykrobe predict --sample SRR3441855 --species sonnei --format json --out SRR3441855.json --seq SRR3441855_1.fastq.gz SRR3441855_2.fastq.gz

Parse Mykrobe output

Input

Output

Example command

python parse_mykrobe_predict.py --jsons mykrobe_results/*.json --alleles alleles.txt --prefix results_mykrobe_parsed

Example output

The output table will be named prefix_predictResults.tsv, and will be in tab-delimited format:

genome species final genotype name confidence num QRDR parC_S80I gyrA_S83L gyrA_S83A gyrA_D87G gyrA_D87N gyrA_D87Y lowest support for genotype marker poorly supported markers max support for additional markers additional markers node support
sampleA S.sonnei 3.6.1.1 CipR strong 3 1 1 0 1 0 0 lineage3 (1; 97/0); lineage3.6 (1; 120/0); lineage3.6.1 (1; 91/0); lineage3.6.1.1 (1; 96/0)
sampleB S.sonnei 3.6.1.1 CipR strong 3 1 1 0 1 0 0 0.009 lineage3.7.20 (0.5; 1/105) lineage3 (1; 95/0); lineage3.6 (1; 112/0); lineage3.6.1 (1; 89/0); lineage3.6.1.1 (1; 111/1)
sampleC S.sonnei 3.6.1.1.2 CipR.MSM5 moderate 3 1 1 0 1 0 0 0.792 lineage3.6.1.1.2 (0.5; 84/22) lineage3 (1; 113/0); lineage3.6 (1; 138/0); lineage3.6.1 (1; 100/0); lineage3.6.1.1 (1; 131/0); lineage3.6.1.1.2 (0.5; 84/22)

Explanation of columns in the output:

Unexpected results

As recombination is extremely rare in S. sonnei, it is unlikely that DNA isolated from a pure culture would have high quality SNVs that are compatible with multiple genotypes, or that are incompatible with the defined hierarchical scheme of Hawkey et al, 2021. Therefore if you see anything reported in the "additional markers" column, it is worth investigating further whether you have contaminated sequence data (which would produce low-quality calls with low read support at additional markers) or a genuine recombination (with would produce high-quality calls with high read support at additional markers). In such cases it may also be illuminating to investigate the Mykrobe output file for further information, details of the JSON output can be found in the Mykrobe documentation here.