phoward42 / APP-16S-mice-study

Documentation for 16S data analysis of apple pomace fed mice study

Project overview

This project draws on a 16S data set generated as part of one of my thesis projects. Mice were fed one of four diets (high fat, low fat, high fat + 4.5% apple pomace, or high fat + 9% apple pomace). Fecal pellets from every mouse in the study (44 in total) were collected and submitted for 16S sequencing. For this class, one sample from each diet group was selected.


Workflow documentation

I initially wrote a series of scripts to explore the data using native QIIME2 tools. These are in the scripts/ directory (everything except nf-ampliseq.sh). Each one does the following:

Although this approach worked, I wanted to plan ahead and build a more automated process, which led me to the nf-core pipeline ampliseq.

Ampliseq GitHub
nf-core/ampliseq page

Ampliseq performs a large number of tasks, from quality control to differential abundance analysis, while requiring only your raw data, primer sequences, and an output directory as inputs.

To run this pipeline I wrote two scripts, nf-ampliseq.sh and run.sh, modeled after the scripts we wrote for class in week 6. The scripts do the following:

Primers are assumed to be the 16S 515f-806r Illumina paired-end primers.
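For orientation, the core call that a script like nf-ampliseq.sh makes can be sketched as below. This is a hypothetical sketch, not the contents of my script. It assumes ampliseq 2.4 or later (where a folder of paired-end reads is passed with `--input_folder`; older 2.x releases took the folder via `--input`), Singularity as the container engine, and the EMP 515F/806R sequences used in the nf-core/ampliseq examples. `PROJECT_DIR`, `DRY_RUN`, and `run_ampliseq` are names introduced here for illustration. Check the flags against `nextflow run nf-core/ampliseq --help` for the release you use.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the core ampliseq call (not the actual nf-ampliseq.sh).
# Assumptions: ampliseq >= 2.4 (reads folder passed with --input_folder),
# Singularity containers, and EMP 515F/806R primer sequences; verify these
# against the primers actually used for sequencing.
set -euo pipefail

FW_PRIMER="GTGYCAGCMGCCGCGGTAA"    # 515F (assumed sequence variant)
RV_PRIMER="GGACTACNVGGGTWTCTAAT"   # 806R (assumed sequence variant)

run_ampliseq() {
  # PROJECT_DIR (hypothetical) defaults to the directory the script is run from.
  local project_dir="${PROJECT_DIR:-$PWD}"
  local cmd=(
    nextflow run nf-core/ampliseq
    -profile singularity
    --input_folder "$project_dir/data/raw"
    --FW_primer "$FW_PRIMER"
    --RV_primer "$RV_PRIMER"
    --metadata "$project_dir/data/meta/metadataArranged-subset.tsv"
    --outdir "$project_dir/results"
  )
  # With DRY_RUN=1, print the command instead of launching the pipeline.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 run_ampliseq` prints the assembled command for a quick check before the real `run_ampliseq` call. If the pipeline was downloaded into software/nfc-ampliseq instead, the `nextflow run` target would be that local path; otherwise, pinning a release with Nextflow's `-r` option keeps reruns reproducible.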


Re-running the workflow

The necessary file structures required to run the workflow include:

  1. Final-project (parent directory and working directory for all scripts). I treated this as my $PWD; it can have any name, as long as all scripts are run from it.
  2. ./data (subfolders = raw, meta)
  3. ./scripts (containing nf-ampliseq.sh)
  4. ./run (containing run.sh)
  5. ./results
  6. ./software (subfolders = containers, nfc-ampliseq)
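The layout above can be created in one step. This is a minimal sketch assuming bash; `PROJECT_DIR` is a name introduced here and defaults to the current directory, which plays the role of Final-project.

```shell
#!/usr/bin/env bash
# Minimal sketch: create the expected directory layout inside the project
# directory. PROJECT_DIR is hypothetical and defaults to the current directory.
set -euo pipefail

PROJECT_DIR="${PROJECT_DIR:-$PWD}"

# mkdir -p creates parents as needed and is a no-op for existing folders.
mkdir -p \
  "$PROJECT_DIR/data/raw" \
  "$PROJECT_DIR/data/meta" \
  "$PROJECT_DIR/scripts" \
  "$PROJECT_DIR/run" \
  "$PROJECT_DIR/results" \
  "$PROJECT_DIR/software/containers" \
  "$PROJECT_DIR/software/nfc-ampliseq"
```

After this, nf-ampliseq.sh goes into scripts/ and run.sh into run/.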

The .fastq.gz files need to be copied into $PWD/data/raw.
metadataArranged-subset.tsv should be copied from my repository into $PWD/data/meta (ignore the manifest file; ampliseq does not require it).
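Staging the inputs can be sketched as a small function. `stage_inputs` and its two arguments (the folder holding the reads and the path to your copy of the metadata file) are hypothetical names for illustration. The even-count check is an assumption based on the reads being paired-end (one R1 and one R2 file per sample).

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy reads and metadata into the project layout.
# Usage: stage_inputs <folder with .fastq.gz files> <path to metadata .tsv>
set -euo pipefail

stage_inputs() {
  local raw_src="$1" meta_src="$2"
  local project_dir="${PROJECT_DIR:-$PWD}"

  mkdir -p "$project_dir/data/raw" "$project_dir/data/meta"
  cp "$raw_src"/*.fastq.gz "$project_dir/data/raw/"
  cp "$meta_src" "$project_dir/data/meta/metadataArranged-subset.tsv"

  # Paired-end reads should come in R1/R2 pairs, so expect an even, non-zero count.
  local n
  n=$(find "$project_dir/data/raw" -maxdepth 1 -name '*.fastq.gz' | wc -l)
  echo "Staged $n fastq.gz files"
  if (( n == 0 || n % 2 != 0 )); then
    echo "Expected a non-zero, even number of paired-end files" >&2
    return 1
  fi
}
```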

If you do not have Nextflow version 2.13 (or higher) installed, install it before running the pipeline. Containers can be downloaded separately into $PWD/software/containers, or mine can be reused by editing the NXF_SINGULARITY_CACHEDIR environment variable so that $USER is replaced with phoward42.
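A pre-flight check along these lines can catch an outdated Nextflow before a long batch job starts. This is a minimal sketch: `MIN_NF_VERSION` and `version_ge` are names introduced here, the default minimum simply mirrors the version stated above (adjust it to what your ampliseq release requires), and the comparison relies on `sort -V -C` version ordering. It points NXF_SINGULARITY_CACHEDIR at the project's own containers folder; to reuse my images instead, substitute my cache path.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the Nextflow version, then set the Singularity cache.
# MIN_NF_VERSION (hypothetical) defaults to the minimum stated in this README.
set -euo pipefail

MIN_NF_VERSION="${MIN_NF_VERSION:-2.13}"

# Succeeds if version $1 >= version $2: the pair "$2, $1" is already in
# version order exactly when $2 <= $1 (sort -C only checks, prints nothing).
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

if command -v nextflow >/dev/null 2>&1; then
  # `nextflow -v` prints something like "nextflow version 23.10.1.5891".
  nf_version=$(nextflow -v | awk '{print $3}')
  if ! version_ge "$nf_version" "$MIN_NF_VERSION"; then
    echo "Nextflow $nf_version is older than $MIN_NF_VERSION; please update" >&2
  fi
else
  echo "nextflow not found on PATH; install it before running the pipeline" >&2
fi

# Store (or reuse) Singularity images under software/containers.
export NXF_SINGULARITY_CACHEDIR="$PWD/software/containers"
```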

The workflow should be ready to run.

Submit run.sh as a batch job

This takes a while because some steps request a lot of core time; nf-ampliseq.sh alone requests 24 hours to accommodate long queue times.

The workflow should generate the following files in $PWD: