Step-mothur is a bioinformatics analysis pipeline for initial quality control and denoising of highly-multiplexed amplicon sequencing (HMAS) data. Currently, only paired-end Illumina data is supported.
The pipeline is built with Nextflow, a workflow tool that processes multiple samples in parallel and allows for highly modular, customizable, and scalable analysis of HMAS data.
By default, the pipeline runs the following workflow:
Copy the GitHub repository to a folder:
git clone https://github.com/ncezid-biome/HMAS-QC-Pipeline2.git
We recommend using conda for all required dependencies. You can create a conda environment from the provided YAML file by running the following:
conda env create -n hmas -f bin/hmas.yaml
(or mamba env create -n hmas -f bin/hmas.yaml for speed)
conda activate hmas
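After activating the environment, a quick check that the expected executables are on your PATH can catch an incomplete install early. The following is a minimal sketch. The tool names passed to `check_tools` (nextflow, mothur, cutadapt, multiqc) are assumptions inferred from names that appear in this README, not a list taken from bin/hmas.yaml, so adjust them to match the actual environment file.

```shell
# Report which of the given executables are reachable on PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Assumed tool list; edit it to match bin/hmas.yaml.
check_tools nextflow mothur cutadapt multiqc \
    || echo "some tools are missing; is the hmas environment active?"
```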
nextflow run hmas2.nf -profile test
test_output folder
./test_pipeline.sh
Test with your own data - Make sure to provide the paths for the 3 required parameters in the nextflow.config file.
*_R{1,2}*.fastq.gz pattern.
primer CACGCATCATTTCGCAAAAGC AGTACGTTCGGCCTCTTTCAG OG0001079primerGroup1
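The example primer line above suggests a four-field layout: the literal word `primer`, the forward sequence, the reverse sequence, and a primer name. As a rough sketch under that assumption (the layout is inferred from the single example line, and `check_primers` is a hypothetical helper, not part of the pipeline), a primer file could be sanity-checked before a run like this:

```shell
# Validate lines of the assumed form:
#   primer <forward_seq> <reverse_seq> <primer_name>
# Exits non-zero and prints one message per offending line.
check_primers() {
    awk '
        NF == 0 { next }                          # skip blank lines
        NF != 4 || $1 != "primer" {
            printf "line %d: expected: primer <forward> <reverse> <name>\n", NR
            bad++
            next
        }
        # allow IUPAC nucleotide codes in either case
        $2 !~ /^[ACGTURYSWKMBDHVNacgturyswkmbdhvn]+$/ ||
        $3 !~ /^[ACGTURYSWKMBDHVNacgturyswkmbdhvn]+$/ {
            printf "line %d: non-nucleotide character in a primer sequence\n", NR
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}

# Demo on a temporary file holding the example line from this README.
demo=$(mktemp)
printf 'primer CACGCATCATTTCGCAAAAGC AGTACGTTCGGCCTCTTTCAG OG0001079primerGroup1\n' > "$demo"
check_primers "$demo" && echo "primer file looks OK"
rm -f "$demo"
```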
Run the following:
nextflow run hmas2.nf
Note: alternatively, you can provide those 3 parameters on the command line, for example:
nextflow run hmas2.nf --reads YOUR_READS --outdir YOUR_OUTPUT --primer YOUR_PRIMER
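Because only paired-end data is supported, it can be worth confirming that every R1 file has an R2 mate before launching a run. The sketch below assumes the reads live in a single folder and follow the `*_R{1,2}*.fastq.gz` naming pattern. `check_pairs` is a hypothetical helper and the demo file names are made up for illustration.

```shell
# For each *_R1*.fastq.gz file in a folder, check that the matching
# *_R2*.fastq.gz file exists. Succeeds only if at least one R1 file
# is found and every R1 file has a mate.
check_pairs() {
    reads_dir=$1
    total=0
    unpaired=0
    for r1 in "$reads_dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue                  # glob matched nothing
        total=$((total + 1))
        name=$(basename "$r1")
        # swap the last "_R1" in the file name for "_R2"
        mate=$(printf '%s\n' "$name" | sed 's/\(.*\)_R1/\1_R2/')
        if [ -e "$reads_dir/$mate" ]; then
            echo "paired:  $name"
        else
            echo "no mate: $name (expected $mate)"
            unpaired=$((unpaired + 1))
        fi
    done
    [ "$total" -gt 0 ] && [ "$unpaired" -eq 0 ]
}

# Demo with empty placeholder files in a temporary folder.
demo=$(mktemp -d)
touch "$demo/sampleA_R1_001.fastq.gz" "$demo/sampleA_R2_001.fastq.gz" \
      "$demo/sampleB_R1_001.fastq.gz"
if check_pairs "$demo"; then
    echo "all reads are paired"
else
    echo "fix the files listed above before running the pipeline"
fi
rm -rf "$demo"
```

On real data, point the function at your own reads folder, for example `check_pairs YOUR_READS`.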
nextflow.config file:
Feel free to update the file as necessary. We recommend filling in params.reads, params.outdir, and params.primer, updating the CPU, memory, and params.maxcutadapts parameters based on your available hardware, and leaving the other parameters unchanged unless you have strong evidence for changing them.
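To pick sensible CPU and memory values, it helps to know what the machine actually offers. The snippet below is a small sketch that prints the logical CPU count and, on Linux only, the total memory. How these numbers map onto the settings in nextflow.config is left to you; nothing here reads or writes that file.

```shell
# Logical CPU count: prefer nproc, fall back to getconf where nproc is absent.
cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "logical CPUs: $cpus"

# Total memory, Linux only: MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "total memory: %.1f GiB\n", $2 / 1024 / 1024 }' /proc/meminfo
fi
```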
multiqc_config.yaml file in bin/ folder:
Feel free to update the file as necessary. This file controls the display of the MultiQC HTML report.
note:
>M03235:107:000000000-KPP6Y:1:1101:19825:4748=OG0001064primerGroup7=isolateD-3-M3235-23-014;size=551
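The header above appears to pack several fields into one line, separated by `=` and ending in a `;size=` count. The sketch below splits such a header into its parts. The field labels (read ID, primer, sample, size) are an interpretation of this one example rather than a documented format, and `parse_header` is a hypothetical helper.

```shell
# Split a header of the assumed form  >READ_ID=PRIMER=SAMPLE;size=N
# into tab-separated label/value lines.
parse_header() {
    printf '%s\n' "$1" | awk '{
        sub(/^>/, "")                     # drop the leading ">" marker
        split($0, semi, ";size=")         # semi[1] = id fields, semi[2] = count
        split(semi[1], f, "=")            # f[1] = read id, f[2] = primer, f[3] = sample
        printf "read_id\t%s\nprimer\t%s\nsample\t%s\nsize\t%s\n", f[1], f[2], f[3], semi[2]
    }'
}

parse_header '>M03235:107:000000000-KPP6Y:1:1101:19825:4748=OG0001064primerGroup7=isolateD-3-M3235-23-014;size=551'
```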
This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.
This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/other/privacy.html.
Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.
All comments, messages, pull requests, and other submissions received through CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.
If you're interested in contributing, please read our CONTRIBUTING guide.
This repository is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.