drosSRA
We use SraMongo to download associated metadata for all Drosophila melanogaster samples, and drosSRA to allow users to easily access the data and perform different types of queries. Finally, we provide genome browser visualization through the FlyBase JBrowse instance.

This project uses a set of snakemake workflows. The snakemake workflows run in a pre-built singularity container which has all of the required software.
To run the singularity container you need to have singularity, mongoDB, and miniconda installed.
To create the running environment, run:
$ conda env create --file environment.yaml
This project makes use of the following environmental variables.
export SLACK_SNAKEMAKE_BOT_TOKEN=<secret>
export ENTREZ_API_KEY=<secret>
export PROJECT_PATH=<absolute path to where folder is cloned>
export SINGULARITY_IMG=$PROJECT_PATH/singularity/drosSRA_workflow.sif
export SINGULARITY_BINDPATH=<list of paths that need mounted in the singularity container>
export SLURM_JOBID=<optional, used to make temp directories on /lscratch>
export TMPDIR=<optional, used to make temp directories>
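Since the workflows fail in confusing ways when a variable is missing, it can help to confirm everything is exported before launching. A minimal sanity check; which variables to treat as required (versus the optional ones above) is an assumption:

```python
import os

def check_env(required):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# Variables the workflows appear to need; SLACK_SNAKEMAKE_BOT_TOKEN,
# SLURM_JOBID, and TMPDIR are optional and omitted here (assumption).
REQUIRED = [
    "ENTREZ_API_KEY",
    "PROJECT_PATH",
    "SINGULARITY_IMG",
    "SINGULARITY_BINDPATH",
]

if __name__ == "__main__":
    missing = check_env(REQUIRED)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```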
For development, I have some extra packages that are useful:
$ conda env update -n drosSRA_workflow --file environment_dev.yaml
It is also helpful to pip install the two python packages that are distributed as part of this project:
$ conda activate drosSRA_workflow
[drosSRA_workflow] $ pip install -e src/
[drosSRA_workflow] $ pip install -e biometa-app/
./Snakefile
This workflow runs SraMongo and builds a list of all SRXs and their SRRs (./output/srx2srr.csv).
NOTE 1: SraMongo requires the environment variable $ENTREZ_API_KEY to be set. This API key can be generated by following the directions here.
NOTE 2: The majority of workflows only need ./output/srx2srr.csv; they do not require the local mongoDB. I am not able to have an always-on instance of mongoDB on our cluster, so the workflows that are typically run on the cluster (./prealn-wf/Snakefile and ./rnaseq-wf/Snakefile) do not need access.
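Downstream workflows read this table directly. A minimal sketch of mapping each SRX to its SRRs; the column names srx and srr are assumptions, so check the header of the real ./output/srx2srr.csv:

```python
import csv
import io

# Stand-in for ./output/srx2srr.csv; the real file maps every SRX
# experiment to its SRR runs. Column names and accessions are assumed.
example = io.StringIO(
    "srx,srr\n"
    "SRX000001,SRR000001\n"
    "SRX000001,SRR000002\n"
    "SRX000002,SRR000003\n"
)

def srx_to_srrs(handle):
    """Build a mapping of SRX accession -> list of SRR accessions."""
    mapping = {}
    for row in csv.DictReader(handle):
        mapping.setdefault(row["srx"], []).append(row["srr"])
    return mapping

mapping = srx_to_srrs(example)
print(mapping["SRX000001"])  # ['SRR000001', 'SRR000002']
```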
./fastq-wf/Snakefile
This workflow downloads FASTQ data from the SRA and checks that the download was successful. It was initially designed as a subworkflow, but snakemake was not running groups correctly with subworkflows. Currently, I just pull the rules from this workflow into ./prealn-wf/Snakefile and ./rnaseq-wf/Snakefile.
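One way to pull rules from one workflow into another is Snakemake's include directive at the top of the consuming Snakefile. A sketch only; the relative path and whether the project uses include versus copying rules are assumptions:

```
# At the top of prealn-wf/Snakefile (or rnaseq-wf/Snakefile):
# bring in the FASTQ download/check rules directly instead of a
# subworkflow, since rule groups did not work across subworkflows.
include: "../fastq-wf/Snakefile"
```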
./prealn-wf/Snakefile
This workflow downloads FASTQs for all samples and generates various QC features describing each sample.
./library_strategy-wf/Snakefile
This workflow runs outlier detection on the RNA-Seq samples. It creates a golden set of RNA-Seq samples to proceed with.
./rnaseq-wf/Snakefile
This is the meat of the project. It provides all deliverables.
I am currently working on this section. I am hand-normalizing samples using a web app (./biometa-app) to assist me. Once this is done, I will build up the outlier detection for each discrete group.
I am in the middle of a major refactoring to simplify the project. The following workflows are deprecated or broken:
./agg-rnaseq-wf
./aln-downstream-wf
./aln-wf: This is the old alignment workflow.
./geo-wf: This is the workflow used to put together the data currently uploaded to GEO. It included biological metadata processing.
./stranded-bigwig-wf: This is the workflow used to generate the current aggregated track up on FlyBase.
./ovary-rnaseq-wf: Pulled out ovary data for another project.
./testis-rnaseq-pe-wf, ./testis-rnaseq-stranded-wf, ./testis-rnaseq-wf: I was working on annotating the testis transcriptome. This part of the project became too large for our current scope, so it has been moved to a separate project.