pipeline_rnaseq_hisat2

Pipeline for processing paired-end RNA-sequencing using cgatcore and HISAT2.

Usage

Create a new repository from this one, using the Use this template button on GitHub.
- That way, your new repository starts its own commit history, where you can record your own changes!
- Only fork this repository if you wish to contribute updates to the template pipeline itself.
Clone the new repository to the computer where you wish to run the pipeline.
- The clone is the working directory for one run of the pipeline on one set of FASTQ files.
- To run the pipeline on another set of FASTQ files, go back to the first step and create another repository from the template.
Create a Conda environment named pipeline_rnaseq_hisat2 using the file envs/pipeline.yml.
- You only need to do this once, no matter how many times you run the pipeline and how many copies of the pipeline you have cloned.
- If in doubt, remove the existing environment and create it again from this file.
Create symbolic links to your input FASTQ files, in the subdirectory data/.
- Do not copy the files themselves. If you do copy them, make sure you do not commit them to Git (e.g. list them in .gitignore).
Edit the configuration of the pipeline as needed, in the file config.yml.
- Commit your changes to the configuration for version control and traceability.
Run the pipeline!
- On a High-Performance Computing (HPC) cluster, run python pipeline.py make full -v 5, which uses the Distributed Resource Management Application API (DRMAA).
- On a local machine, run python pipeline.py make full -v 5 --local.
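
The steps for creating the Conda environment and linking the FASTQ files could be scripted roughly as follows. This is a minimal sketch, not part of the pipeline itself: the function names create_env and link_fastq are hypothetical, and the .fastq.gz suffix is an assumption about how your input files are named, so adjust the pattern to match your data.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create the Conda environment from envs/pipeline.yml.
# Run it from the root of your clone; it only needs to be done once.
create_env() {
    conda env create --name pipeline_rnaseq_hisat2 --file envs/pipeline.yml
}

# Hypothetical helper: symlink every FASTQ file found in a source directory
# into a target directory (default: data/), without copying the files.
# Assumption: input files end in .fastq.gz.
link_fastq() {
    local source_dir
    # Resolve the source directory to an absolute path so the links stay
    # valid regardless of where they are read from.
    source_dir="$(cd "$1" && pwd)" || return 1
    local target_dir="${2:-data}"
    mkdir -p "$target_dir" || return 1

    local fastq
    for fastq in "$source_dir"/*.fastq.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$fastq" ] || continue
        # -s creates a symbolic link, -f replaces a link of the same name.
        ln -sf "$fastq" "$target_dir/$(basename "$fastq")"
    done
}
```

After sourcing these functions, a call such as link_fastq /path/to/fastq data (where /path/to/fastq is a placeholder for the directory holding your raw files) fills data/ with links only, so nothing large ends up in the Git history.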