1. Purpose

The cg/wgMLST allele calling pipeline will call alleles from genomic sequence data against a core-genome (cg) or whole-genome (wg) MLST scheme.

Note: This pipeline description is still in development.

2. Input

2.1. Sequence data

The main input for this pipeline will be genomic sequence data, in the form of either reads or assemblies. The data will be provided to Nextflow through a samplesheet passed as --input samplesheet.csv. Each row of the samplesheet describes one sample and gives either an assembly or read files:

sample	assembly	fastq_1	fastq_2
SampleA	/path/to/SampleA.fasta.gz
SampleB		/path/to/SampleB_1.fastq.gz	/path/to/SampleB_2.fastq.gz
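The samplesheet layout above could be checked before a run with a short script. This is only a sketch: it assumes the file is comma-separated (as the .csv name suggests), that a row must give either an assembly or reads but not both, and that fastq_1 is required for read input. The function name classify_rows is hypothetical and not part of the pipeline.

```python
import csv
import io


def classify_rows(handle):
    """Return {sample: "assembly" | "reads"} for each samplesheet row.

    Assumed rules (not confirmed by the pipeline): every row needs a
    sample name and exactly one kind of input.
    """
    kinds = {}
    for row in csv.DictReader(handle):
        # Missing trailing columns come back as None, so normalise to "".
        sample = (row.get("sample") or "").strip()
        assembly = (row.get("assembly") or "").strip()
        fastq_1 = (row.get("fastq_1") or "").strip()
        fastq_2 = (row.get("fastq_2") or "").strip()
        if not sample:
            raise ValueError("row is missing a sample name")
        if assembly and (fastq_1 or fastq_2):
            raise ValueError(f"{sample}: give an assembly or reads, not both")
        if assembly:
            kinds[sample] = "assembly"
        elif fastq_1:
            kinds[sample] = "reads"
        else:
            raise ValueError(f"{sample}: no assembly or fastq_1 given")
    return kinds


example = io.StringIO(
    "sample,assembly,fastq_1,fastq_2\n"
    "SampleA,/path/to/SampleA.fasta.gz,,\n"
    "SampleB,,/path/to/SampleB_1.fastq.gz,/path/to/SampleB_2.fastq.gz\n"
)
print(classify_rows(example))
```

Checking rows up front like this reports a malformed samplesheet before any allele calling starts.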

2.2. MLST scheme

An MLST scheme will be provided, using the following parameters:

--mlst_scheme_name: The name of the scheme.
--mlst_scheme_data: Path to the data for the scheme.
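Combined with --input, these parameters could be assembled into a Nextflow command line as sketched below. The pipeline location path/to/pipeline and the scheme data path are placeholders, since this description does not name them, and build_command is a hypothetical helper. Only the three parameter names come from this document.

```python
import shlex


def build_command(pipeline, samplesheet, scheme_name, scheme_data):
    """Build the argument list for a `nextflow run` invocation."""
    return [
        "nextflow", "run", pipeline,
        "--input", samplesheet,
        "--mlst_scheme_name", scheme_name,
        "--mlst_scheme_data", scheme_data,
    ]


args = build_command(
    pipeline="path/to/pipeline",     # placeholder: pipeline location is not given here
    samplesheet="samplesheet.csv",
    scheme_name="listeria_cgmlst",
    scheme_data="/path/to/scheme",   # placeholder path
)
# shlex.join quotes any argument that needs it, so the result is safe to paste into a shell.
print(shlex.join(args))
```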

3. Steps

The pipeline will generate a (cg/wg)MLST profile for each sample from the input data.

4. Output

4.1. Tabular allele files

A table will be provided with one row per sample and one column per locus in the scheme. Each cell holds the allele identifier called for that locus:

sample	locus1	locus2	...
SampleA	5	10	...

4.2. JSON metadata

A JSON file, output.json, will be provided with all allele calls, structured so that other systems (e.g., IRIDA Next) can load them. Calls are keyed by sample, then by scheme name, then by locus. This will look like:

{
    "SampleA": {
        "listeria_cgmlst": {
            "locus1": 5,
            "locus2": 10
        }
    },
    "SampleB": {
        "listeria_cgmlst": {
            "locus1": 1,
            "locus2": 10
        }
    }
}
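The JSON structure and the tabular allele file in 4.1 carry the same calls, so one can be derived from the other. The sketch below flattens the nested sample, scheme, locus structure into tab-separated rows. The function name profiles_to_table is hypothetical. Leaving the cell empty for a locus with no call is an assumption, as this description does not say how missing alleles are represented.

```python
import json


def profiles_to_table(profiles, scheme_name):
    """Flatten {sample: {scheme: {locus: allele}}} into tab-separated text."""
    # Collect loci in first-seen order so the columns follow the JSON order.
    loci = list(dict.fromkeys(
        locus
        for schemes in profiles.values()
        for locus in schemes.get(scheme_name, {})
    ))
    lines = ["\t".join(["sample", *loci])]
    for sample, schemes in profiles.items():
        calls = schemes.get(scheme_name, {})
        # Assumption: a locus with no call becomes an empty cell.
        cells = [str(calls.get(locus, "")) for locus in loci]
        lines.append("\t".join([sample, *cells]))
    return "\n".join(lines)


text = """
{
    "SampleA": {"listeria_cgmlst": {"locus1": 5, "locus2": 10}},
    "SampleB": {"listeria_cgmlst": {"locus1": 1, "locus2": 10}}
}
"""
print(profiles_to_table(json.loads(text), "listeria_cgmlst"))
```

For the two samples shown, this prints a header row followed by one row per sample, matching the layout in 4.1.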
