
FastqPuri, a fastq quality control and filtering tool

This repository contains the software and source code of FastqPuri. It creates quality reports of fastq files and filters them, removing low-quality reads, reads containing too many N's, and contamination reads (unwanted rRNA reads, impurities coming from another organism, ...).

Installation

Clone the repository, or download the source. Make sure that your system supplies the following dependencies for FastqPuri.

NOTE: FastqPuri will work without the optional dependencies, but it will skip creating HTML reports if they are not available.

$ cmake -H. -Bbuild/ [-DRSCRIPT=/path/to/my/R/bin/Rscript] [-DCMAKE_INSTALL_PREFIX=/path/to/my/root] ... 
$ cd build 
$ make 
$ sudo make install  

When running cmake, there are some variables you can set using the option -D followed by the variable name. These variables are:

The executables will be created in the folder bin and installed in /usr/local/bin. R scripts will be installed in /usr/local/share/FastqPuri/R.
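
The cmake options from the command above can be combined to install into a user-writable prefix, which avoids sudo. The following is a minimal sketch under assumptions: the prefix $HOME/.local and the PREFIX and DRY_RUN variables are illustrative choices and not part of FastqPuri, and the script is assumed to be started from the top-level source directory. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: configure, build and install FastqPuri under a user-writable prefix.
# PREFIX and DRY_RUN are illustrative variables, not FastqPuri options.
PREFIX="${PREFIX:-$HOME/.local}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_fastqpuri() {
    run cmake -H. -Bbuild/ "-DCMAKE_INSTALL_PREFIX=$PREFIX" &&
    run make -C build &&
    run make -C build install   # no sudo needed for a user-owned prefix
}

build_fastqpuri
```

With a custom prefix, the R scripts would presumably be installed in $PREFIX/share/FastqPuri/R, by analogy with the default location; this has not been verified here.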

WARNING: do not move the executables that depend on R scripts anywhere else unless you also move the corresponding R scripts, preserving the relative folder structure.
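
To check that executables and R scripts still share one installation prefix after moving them, a small test like the following can help. This is a sketch only: check_fastqpuri_layout is a hypothetical helper that is not shipped with FastqPuri, and it only tests that the two directories named above exist under the given prefix.

```shell
#!/bin/sh
# Sketch: verify that bin/ and share/FastqPuri/R/ exist under one prefix.
# check_fastqpuri_layout is a hypothetical helper, not part of FastqPuri.
check_fastqpuri_layout() {
    prefix="${1:-/usr/local}"
    status=0
    for dir in "$prefix/bin" "$prefix/share/FastqPuri/R"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

if check_fastqpuri_layout /usr/local; then
    echo "layout ok"
else
    echo "layout incomplete"
fi
```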

Executables

An example workflow could be:

Documentation of the code

A Doxygen documentation of the code is available:

Use a docker container to run FastqPuri

The file 'Dockerfile' documents the exact Linux installation we used for testing. If you have Docker installed on your machine, you may want to use a Docker container for easy installation and encapsulated usage of FastqPuri. After cloning this project from GitHub and changing to its main directory, you can build a Docker image as follows:

$ docker build -t fastqpuri .

This will create an image based on the Debian Linux distribution that covers all dependencies, including R and pandoc. Once the image is built, you can run a container from it either interactively:

$ docker run -v $PWD:/tmp -it fastqpuri

or by running a pipeline implemented in an executable bash script:

$ docker run -v $PWD:/tmp fastqpuri ./pipeline.sh

Note that this call writes its results to the directory /tmp inside the container. Because the -v option mounts the current host directory at /tmp, the results remain available in the directory where the container was started, even after the container exits.
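
The mount-and-run call can be wrapped in a small function that checks the script first and quotes the mount source, so that host paths containing spaces also work. This is a minimal sketch: run_in_container, IMAGE and DRY_RUN are illustrative names that are not part of FastqPuri or Docker, and pipeline.sh stands for your own script. By default the function only prints the docker command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch: run a script from the current directory inside the container.
# IMAGE and DRY_RUN are illustrative variables, not Docker options.
IMAGE="${IMAGE:-fastqpuri}"
DRY_RUN="${DRY_RUN:-1}"

run_in_container() {
    script="$1"
    if [ ! -x "$script" ]; then
        echo "error: $script is missing or not executable" >&2
        return 1
    fi
    # Mount the current host directory at /tmp, as in the call above.
    set -- docker run -v "$PWD:/tmp" "$IMAGE" "./$script"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}
```

For example, after sourcing the function, 'run_in_container pipeline.sh' prints the call for the locally built image, and setting IMAGE=clottaz/fastqpuri beforehand selects the pre-built image instead.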

Instead of generating the docker container yourself with 'docker build', you can also pull a pre-built image from the docker hub as follows:

$ docker pull clottaz/fastqpuri

You can run such a pre-built image with 'docker run' by specifying the image name as 'clottaz/fastqpuri'.

Use a singularity container to run FastqPuri

Alternatively, if you have singularity installed on your machine, you can run our docker container for FastqPuri as follows:

$ singularity shell --bind .:/tmp docker://clottaz/fastqpuri

This call opens a shell within the container. With --bind, we also mount the current directory in the container. The syntax is --bind src:dest, where src is the source path on the host and dest is the destination path in the container, i.e. where you would like to make the source path available in your container. Note that the destination path must be an existing directory in the container; the operation will fail if you do not create the directory first. When we call singularity shell like this, the working directory in the container is /tmp.
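
The src:dest form can be checked on the host before calling singularity. The following sketch splits a bind specification at the first colon and verifies that the source path exists. check_bind is a hypothetical helper, not part of Singularity or FastqPuri; it handles only the two-part form described above and cannot check whether the destination exists inside the container.

```shell
#!/bin/sh
# Sketch: split a "src:dest" bind specification and check the host side.
# check_bind is a hypothetical helper, not part of Singularity.
check_bind() {
    spec="$1"
    case "$spec" in
        *:*) ;;
        *) echo "error: expected src:dest, got '$spec'" >&2; return 2 ;;
    esac
    src="${spec%%:*}"    # text before the first colon: path on the host
    dest="${spec#*:}"    # text after the first colon: path in the container
    if [ ! -e "$src" ]; then
        echo "error: host path '$src' does not exist" >&2
        return 1
    fi
    echo "host=$src container=$dest"
}

check_bind .:/tmp   # prints: host=. container=/tmp
```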

Alternatively, in order to execute a script from the current directory, call singularity as follows:

$ singularity run --bind .:/tmp docker://clottaz/fastqpuri /tmp/pipeline.sh

Note that /tmp/pipeline.sh relates to the call within the container. Thus, pipeline.sh is located in the directory where singularity run is executed, but will be made available to the container via the --bind parameter.
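
The mapping between host paths and container paths can be made explicit with a small helper. This is a sketch for illustration only: container_path is a hypothetical function that translates a host path below the bound directory into the path seen inside the container, given the same src and dest used with --bind.

```shell
#!/bin/sh
# Sketch: translate a host path below the bound directory into the path
# seen inside the container. container_path is a hypothetical helper.
container_path() {
    src="$1"
    dest="$2"
    file="$3"
    case "$file" in
        "$src"/*)
            rel=${file#"$src"/}    # strip the host prefix
            echo "$dest/$rel"
            ;;
        *)
            echo "error: $file is not below $src" >&2
            return 1
            ;;
    esac
}

container_path "$PWD" /tmp "$PWD/pipeline.sh"   # prints: /tmp/pipeline.sh
```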

If you want to invoke one of the FastqPuri executables, you can use the 'exec' command like so:

$ singularity exec docker://clottaz/fastqpuri Qreport -h

or invoke a script located in your home directory (assuming that run_ex_TREE.sh is located in your home directory):

$ singularity exec docker://clottaz/fastqpuri $HOME/run_ex_TREE.sh

Singularity documentation can be found here: https://www.sylabs.io/docs/

Installation via bioconda (under construction)

We are currently working on a bioconda environment for FastqPuri. If you follow the instructions below, it is quite likely that FastqPuri will not yet run properly from the bioconda environment. Sorry about that, and please stay tuned!

Bioconda is a channel for the conda package manager specializing in bioinformatics software. Have a look at the reference:

To find out how to use bioconda, see https://bioconda.github.io. For installing FastqPuri in a bioconda environment, you have to install either miniconda or anaconda and register channels as follows:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge

Then you can install fastqpuri:

$ conda install fastqpuri

You may also want to use a dedicated environment for sequencing quality control:

$ conda create -n qc fastqpuri

This call installs FastqPuri directly in a separate environment.
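
Since the bioconda package may not work yet, it can be useful to check the installation right after creating the environment. The following is a minimal sketch, assuming a conda version that provides 'conda run'. DRY_RUN, run and setup_qc_env are illustrative names that are not part of conda or FastqPuri; 'Qreport -h' is the help call used in the singularity section. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: create the "qc" environment and check that Qreport starts.
# DRY_RUN is an illustrative variable, not a conda option.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_qc_env() {
    run conda create -y -n qc fastqpuri &&
    run conda run -n qc Qreport -h   # fails if the executable is unavailable
}

setup_qc_env
```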

Contributors

Paula Pérez Rubio, Claudio Lottaz, Julia Engelmann

License

GPL v3 (see LICENSE.txt)