1. Purpose
The cg/wgMLST allele calling pipeline will be used for calling alleles from genomic sequence data.
Note: This is an in-development description of this pipeline.
2. Input
2.1. Sequence data
The main input for this pipeline will be genomic sequence data, in the form of either sequencing reads or assemblies. The data will be provided to Nextflow through a sample sheet passed with --input samplesheet.csv. The sample sheet will be structured as follows, with each sample given either an assembly or a pair of FASTQ read files:
sample,assembly,fastq_1,fastq_2
SampleA,/path/to/SampleA.fasta.gz,,
SampleB,,/path/to/SampleB_1.fastq.gz,/path/to/SampleB_2.fastq.gz
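The sample sheet layout can be illustrated with a minimal Python sketch that reads it with the standard csv module. The validation rule here (each sample gives either an assembly or reads, with fastq_2 optional) is an assumption inferred from the two example rows, not the pipeline's documented schema check; SampleRow and parse_samplesheet are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional, TextIO


@dataclass
class SampleRow:
    """One row of the sample sheet; empty cells become None."""
    sample: str
    assembly: Optional[str]
    fastq_1: Optional[str]
    fastq_2: Optional[str]


def _clean(value: Optional[str]) -> Optional[str]:
    # csv.DictReader yields "" for empty cells and None for missing trailing cells.
    if value is None:
        return None
    value = value.strip()
    return value or None


def parse_samplesheet(handle: TextIO) -> list[SampleRow]:
    """Parse sample-sheet rows and apply an ASSUMED check: each sample must
    give either an assembly or reads (fastq_1, optionally fastq_2), not both."""
    rows = []
    for record in csv.DictReader(handle):
        row = SampleRow(
            sample=_clean(record.get("sample")) or "",
            assembly=_clean(record.get("assembly")),
            fastq_1=_clean(record.get("fastq_1")),
            fastq_2=_clean(record.get("fastq_2")),
        )
        if not row.sample:
            raise ValueError("every row needs a sample name")
        if (row.assembly is None) == (row.fastq_1 is None):
            raise ValueError(f"{row.sample}: give either an assembly or fastq_1, not both or neither")
        if row.fastq_2 is not None and row.fastq_1 is None:
            raise ValueError(f"{row.sample}: fastq_2 given without fastq_1")
        rows.append(row)
    return rows


# Example using the two rows shown in the sample sheet above.
sheet = io.StringIO(
    "sample,assembly,fastq_1,fastq_2\n"
    "SampleA,/path/to/SampleA.fasta.gz,,\n"
    "SampleB,,/path/to/SampleB_1.fastq.gz,/path/to/SampleB_2.fastq.gz\n"
)
rows = parse_samplesheet(sheet)
for row in rows:
    kind = "assembly" if row.assembly else "reads"
    print(row.sample, kind)
```

Empty cells are normalized to None so that "no assembly" and "no reads" are checked the same way regardless of whether a cell is blank or absent.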
2.2. MLST scheme
The MLST scheme will be specified with the following parameters:
--mlst_scheme_name: The name of the scheme.
--mlst_scheme_data: Path to the data for the scheme.
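As a sketch of how these parameters fit together with --input on the command line, the Python snippet below assembles a nextflow run invocation. The pipeline name example-org/mlst-pipeline and the scheme values are hypothetical placeholders; only --input, --mlst_scheme_name and --mlst_scheme_data come from this description, and build_nextflow_command is an illustrative helper, not part of the pipeline.

```python
import shlex


def build_nextflow_command(pipeline: str, samplesheet: str,
                           scheme_name: str, scheme_data: str) -> list[str]:
    """Return the argument list for a `nextflow run` call using the
    parameters described in this section. `pipeline` is a placeholder."""
    # Refuse empty scheme parameters early rather than at pipeline runtime.
    for flag, value in (("--mlst_scheme_name", scheme_name),
                        ("--mlst_scheme_data", scheme_data)):
        if not value:
            raise ValueError(f"{flag} must not be empty")
    return [
        "nextflow", "run", pipeline,
        "--input", samplesheet,
        "--mlst_scheme_name", scheme_name,
        "--mlst_scheme_data", scheme_data,
    ]


# Hypothetical pipeline name and scheme values, for illustration only.
cmd = build_nextflow_command("example-org/mlst-pipeline", "samplesheet.csv",
                             "example_scheme", "/path/to/scheme_data")
print(shlex.join(cmd))
```

Keeping the command as a list (rather than a single string) lets it be passed directly to subprocess.run without shell quoting issues; shlex.join is used only to display it.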
3. Steps
The steps of this pipeline will generate a (cg/wg)MLST profile for each sample from the input data.
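To make "generating a profile" concrete, here is a deliberately simplified Python sketch of exact-match allele calling: a scheme maps each locus to known allele sequences, and a locus is called with the longest allele found verbatim on either strand of any contig. This is a toy illustration, not the pipeline's calling method; real allele callers also handle novel alleles, partial or truncated matches, and read-based input. The scheme layout, the "-" placeholder for uncalled loci, and the function names are all assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (other characters kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_alleles(contigs: list[str], scheme: dict[str, dict[str, str]],
                 missing: str = "-") -> dict[str, str]:
    """Toy exact-match caller: for each locus, report the ID of the longest
    known allele found verbatim on either strand of any contig."""
    # Search both strands by also scanning the reverse complement of each contig.
    strands = contigs + [reverse_complement(c) for c in contigs]
    profile = {}
    for locus, alleles in scheme.items():
        call = missing
        # Try longer alleles first so a short allele contained in a longer one
        # does not shadow the better match (ties keep scheme order).
        for allele_id, allele_seq in sorted(alleles.items(),
                                            key=lambda item: len(item[1]),
                                            reverse=True):
            if any(allele_seq in strand for strand in strands):
                call = allele_id
                break
        profile[locus] = call
    return profile


# Hypothetical three-locus scheme and two contigs.
scheme = {
    "locus1": {"1": "ATGAAA", "5": "ATGCCC"},
    "locus2": {"10": "TTTGGG", "3": "TTTAAA"},
    "locus3": {"1": "GGGGGGGG"},
}
profile = call_alleles(["GGATGCCCTT", "CCCAAAGGG"], scheme)
print(profile)  # → {'locus1': '5', 'locus2': '10', 'locus3': '-'}
```

Here locus2 is found only on the reverse strand of the second contig, which is why both strands are searched.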
4. Output
4.1. Tabular allele files
A table will be provided listing, for each sample, the allele identifier called at every locus in the scheme:
sample    locus1    locus2    ...
SampleA   5         10        ...
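A per-sample profile can be written out as this table with Python's csv module, one row per sample and one column per locus. The tab delimiter and the "-" placeholder for loci without a call are assumptions; the description does not state the file's exact format.

```python
import csv
import io


def write_allele_table(profiles: dict[str, dict[str, str]], loci: list[str],
                       handle, missing: str = "-") -> None:
    """Write one row per sample and one column per locus (tab-separated)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", *loci])
    for sample, calls in profiles.items():
        # Fill loci with no call using the (assumed) missing placeholder.
        writer.writerow([sample, *(calls.get(locus, missing) for locus in loci)])


buffer = io.StringIO()
write_allele_table({"SampleA": {"locus1": "5", "locus2": "10"}},
                   ["locus1", "locus2"], buffer)
print(buffer.getvalue(), end="")
```

Passing the locus list explicitly keeps the column order fixed to the scheme, so every sample row lines up even when some loci are not called.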
4.2. JSON metadata
A JSON file, output.json, will be provided containing all the allele calls, structured so that they can be loaded by other systems (e.g., IRIDA Next). This will look like:
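One hypothetical way to serialize per-sample allele calls to JSON is sketched below with Python's json module. The key names metadata and samples, and the nesting of locus-to-allele maps under each sample name, are assumptions for illustration only and are not the pipeline's or IRIDA Next's documented format.

```python
import json


def build_output_json(profiles: dict[str, dict[str, str]]) -> str:
    """Serialize allele calls under an ASSUMED layout:
    metadata -> samples -> <sample name> -> {locus: allele identifier}."""
    document = {
        "metadata": {
            "samples": {sample: dict(calls) for sample, calls in profiles.items()},
        },
    }
    # sort_keys gives stable output, which makes files easy to diff.
    return json.dumps(document, indent=2, sort_keys=True)


text = build_output_json({"SampleA": {"locus1": "5", "locus2": "10"}})
print(text)
```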