QuaCRS: Quality Control for RNA-Seq
The Ohio State University Wexner Medical Center Version 1.1 7/11/2014
The first part of this workflow is QC Pack, a wrapper for three popular RNA-Seq quality control tools: RNA-SeQC, RSeQC, and FastQC. QC Pack runs on one sample at a time. It requires an aligned BAM file, one or two raw FASTQ files, and a configuration file containing additional metadata. Much of the configuration information is optional, but sequencing date, sequencing lane, sample ID, and a study descriptor are all required.
The last part of the workflow summarizes the QC plots and metrics in a database. Users can navigate the data in an interactive HTML format, where results can be filtered, searched, and downloaded.
QuaCRS v1.1 (Released 7/11/14)
QuaCRS v1.0 (Released 5/29/14)
QC tools
Other dependencies
... QC_Pack folder into your working directory.
... which command can be used to find the locations of the binary.
... .fai file.
... qc folder from QC_Database to the server html resources page (typically inside /var/www/ for an Apache server).
... read folder from QC_Database to an easily accessible (but not publicly viewable) working folder for later data uploading.
First, create a database with the default privileges. The name of the database will be configured in the settings below. For more information on how to create databases, visit http://dev.mysql.com/doc/refman/5.0/en/creating-database.html
Open the config.py file in the 'constant' folder. Set the database credentials using the DB_HOST, DB_USER, DB_PASS, DB_PORT, and DB_NAME variables.
DB_PORT is optional
DB_HOST should be "localhost" if using a database set up locally.
In that same file, set the WEB_APP_PATH variable to the location of the 'qc' folder. For example, if the 'qc' folder is in the root of the localhost folder, WEB_APP_PATH = /var/www/html/qc/ (make sure to include the trailing slash).
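As a rough illustration of the settings described above, config.py might look something like the sketch below. The variable names come from this README; every value is a placeholder, the use of None for an unset DB_PORT is an assumption, and check_settings is a hypothetical helper that is not part of QuaCRS.

```python
# Hypothetical contents of constant/config.py; every value is a placeholder.
DB_HOST = "localhost"    # "localhost" for a database set up locally
DB_USER = "quacrs_user"  # placeholder user name
DB_PASS = "change-me"    # placeholder password
DB_PORT = None           # optional; None is an assumed way to leave it unset
DB_NAME = "quacrs"       # placeholder database name
WEB_APP_PATH = "/var/www/html/qc/"  # location of the 'qc' folder, with trailing slash


def check_settings(settings):
    """Return a list of problems found in a dict of settings (hypothetical helper)."""
    problems = []
    # DB_PORT is optional, so it is not checked here.
    for key in ("DB_HOST", "DB_USER", "DB_PASS", "DB_NAME"):
        if not settings.get(key):
            problems.append(key + " is not set")
    if not str(settings.get("WEB_APP_PATH") or "").endswith("/"):
        problems.append("WEB_APP_PATH must end with a trailing slash")
    return problems
```

Calling check_settings with a dict of the values above should return an empty list; dropping the trailing slash from WEB_APP_PATH should produce one message.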
Open the config.php file in the qc/application/config folder. In that file, change $config['base_url'] and $config['root'] if they differ from how the server was set up.
$config['base_url']: The base URL should be absolute, including the protocol. This is the URL of the project folder, the same address used to access the database with an Internet browser.
$config['root']: The root should be absolute (make sure to include the trailing slash). This is the path to the web folder, which should be identical to WEB_APP_PATH.
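The three rules above (absolute base URL with protocol, absolute root with trailing slash, root identical to WEB_APP_PATH) can be checked before starting the server. The following is a minimal sketch in Python; check_web_config is a hypothetical helper, not part of QuaCRS, and it assumes a Unix-style path and an http or https URL.

```python
from urllib.parse import urlparse


def check_web_config(base_url, root, web_app_path):
    """Return a list of problems with the web settings (hypothetical helper)."""
    problems = []
    parsed = urlparse(base_url)
    # An absolute URL has both a protocol (scheme) and a host (netloc).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("base_url should be absolute and include the protocol")
    # Assumes a Unix-style absolute path such as /var/www/html/qc/.
    if not root.startswith("/") or not root.endswith("/"):
        problems.append("root should be an absolute path with a trailing slash")
    if root != web_app_path:
        problems.append("root should be identical to WEB_APP_PATH")
    return problems
```

For example, check_web_config("http://localhost/qc/", "/var/www/html/qc/", "/var/www/html/qc/") reports no problems.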
Next, open the database.php file in the same folder (application/config). In that file, fill in the database authentication settings. The 'hostname', 'username', 'password', and 'database' entries of $db['default'] need to be filled in (these fields should be identical to what was set up in the Read Program section).
For more information on setting up the database configuration, visit http://ellislab.com/codeigniter/user-guide/database/configuration.html
The QC wrapper is run on one sample at a time, with one configuration file as its argument. A sample configuration file is included with QC Pack (input.cfg).
FASTQ_FILE
Full file path to the FASTQ file. If this is a paired-end sequencing sample, supply a comma-separated list of paths with no spaces.
BAM_FILE
Full file path where the aligned BAM file is located.
UNIQUE_ID
Sample Identification, or sample name, unique to this sample
STUDY
Name of a project with which the sample is associated.
DATE
Sequencing date. Can be another important date. Used to uniquely identify multiple runs of the same sample. Leave blank only for combined samples.
LN
Sequencing lane. Can be another important identifier. Used to uniquely identify different runs of the same sample. Leave blank only for combined samples.
RUN_DESCRIPTION
Used to identify samples that are combined from more than one sequencing run. If reads come from more than one run, the raw files and aligned files will contain reads from more than one lane and date. Such samples require the following additional considerations:
INDEX
Bar code sequence used for demultiplexing
RQS
RNA quality score
SEQUENCING_TYPE
PolyA, Exome, Transcriptome, Genome, etc.
FCN
Flowcell number (the number of times the sample has been sequenced)
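The field rules above can be checked before launching a run. The sketch below is only an illustration: it assumes the configuration file uses plain KEY=VALUE lines (the real layout is shown in the bundled input.cfg), it assumes a sample counts as combined whenever RUN_DESCRIPTION is filled in, and parse_cfg and validate_sample are hypothetical helpers that are not part of QC Pack.

```python
REQUIRED_ALWAYS = ("FASTQ_FILE", "BAM_FILE", "UNIQUE_ID", "STUDY")


def parse_cfg(text):
    """Parse KEY=VALUE lines (assumed layout), skipping blanks and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip().strip('"')
    return fields


def validate_sample(fields):
    """Return a list of problems with one sample's configuration fields."""
    problems = []
    for key in REQUIRED_ALWAYS:
        if not fields.get(key):
            problems.append(key + " is required")
    # Assumption: a filled-in RUN_DESCRIPTION marks a combined sample,
    # the only case in which DATE and LN may be left blank.
    combined = bool(fields.get("RUN_DESCRIPTION"))
    if not combined:
        for key in ("DATE", "LN"):
            if not fields.get(key):
                problems.append(key + " is required unless the sample is combined")
    fastq = fields.get("FASTQ_FILE", "")
    if " " in fastq:
        problems.append("FASTQ_FILE must not contain spaces")
    if fastq and len(fastq.split(",")) > 2:
        problems.append("FASTQ_FILE accepts one or two paths")
    return problems
```

A typical use would be validate_sample(parse_cfg(open("input.cfg").read())), proceeding only when the returned list is empty.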
Once the sample configuration files are complete, run the wrapper as follows:
$ bash qcpack.sh input.cfg
The program will output to the current working directory.
Under normal circumstances, qcpack can check for existing output and resume incomplete steps. If QC fails, it may be necessary to run qcpack with the option to force removal of temporary files and existing output. This is done by passing "force" as an additional argument:
$ bash qcpack.sh input.cfg force
Multiple samples may be processed in parallel, assuming the hardware will support it.
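One way to process several samples in parallel is to start one qcpack process per configuration file from a small driver script. The sketch below is an assumption-laden example, not a QC Pack feature: build_command and run_all are hypothetical helpers, qcpack.sh is assumed to be in the current directory, and max_workers should be chosen to match what the hardware can support.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(cfg_path, force=False):
    """Build the qcpack command line for one configuration file."""
    cmd = ["bash", "qcpack.sh", cfg_path]
    if force:
        cmd.append("force")  # forces removal of temporary files and existing output
    return cmd


def run_all(cfg_paths, max_workers=2, force=False):
    """Run one qcpack process per configuration file; return {cfg_path: exit code}."""
    def run_one(cfg_path):
        return subprocess.run(build_command(cfg_path, force)).returncode

    # Threads are enough here because each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run_one, cfg_paths))
    return dict(zip(cfg_paths, codes))
```

For instance, run_all(["sampleA.cfg", "sampleB.cfg"]) would launch both runs and report the exit code of each; the file names here are placeholders.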
A QC run will create FastQC, RSeQC, and RNASeQC directories in the working directory if they do not already exist. Each of these will contain a directory for the individual sample with the associated QC output. Each sample will also have a unique QC table to be read by the database. This table is a compilation of many QC metrics to summarize in the graphical user interface. Once all samples are finished processing, they are ready to upload to the database.
Assumptions for example purposes:
$config['base_url']='http://localhost/qc/';
Step-by-step execution (for a single sample):
Execute the read program (do not copy the dollar sign):
$ python read.py -i ~/Documents/qc_pack/result/<SAMPLE_QC_TABLE.CSV> -d '\t' create
-i gives the program the path to the QC table.
-d sets the delimiter for that file. The default delimiter is ',' (comma).
Step-by-step execution (for multiple samples):
Execute the read program:
$ python read.py -b ~/Documents/qc_pack/result/ -d '\t' create
-b gives the program the path to the folder containing the QC tables (including trailing slash).
-d sets the delimiter for those files. The default delimiter is ',' (comma).
For more information on executing the Read program:
$ python read.py --help
Upon completion, the program will output how many samples were successfully processed and how many samples failed.
If the program returns an error, make sure that the database is set up correctly and that the QC tables are located in the directory specified. Before executing the program again, clear the database:
$ python read.py clear
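The upload and clear-before-retry steps can be wrapped in a short script. This is a sketch under several assumptions: build_read_command and upload are hypothetical helpers, read.py is assumed to exit with a non-zero status on error, and clearing is opt-in because it empties the database. The delimiter is passed as the two characters backslash and t, matching what the shell hands to read.py for '\t'.

```python
import subprocess


def build_read_command(path, batch=False, delimiter=r"\t"):
    """Build the read.py 'create' command for one QC table (-i) or a folder (-b)."""
    flag = "-b" if batch else "-i"
    return ["python", "read.py", flag, path, "-d", delimiter, "create"]


def upload(path, batch=False, clear_on_failure=False, runner=subprocess.call):
    """Run 'create'; optionally run 'clear' after a failure. Returns the exit code."""
    code = runner(build_read_command(path, batch))
    # Assumption: a non-zero exit code signals a failed upload.
    if code != 0 and clear_on_failure:
        runner(["python", "read.py", "clear"])  # empties the database before a retry
    return code
```

The runner parameter exists only so the command sequence can be inspected without touching a real database.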
The 'users.py' script is used to control project permissions in a QuaCRS database. Its arguments are user [-u | --user] and study [-s | --study], depending on which function is being run. There are five functions: create, add, remove, show, and clear.
Create is used to add users to the database. It will prompt for a password upon creation of each user. Study may be supplied here, or added in the following step. This function supports comma-separated lists (no spaces) and can be re-run to change an existing user's password. For added security, passwords are hashed before being entered into the database.
Example:
$ python users.py -u user1,user2 create
Creates two new users
$ python users.py -u user1,user2 -s studyA,studyB create
Creates two new users and grants them both access to view studyA and studyB
Add is used to grant additional project permissions to existing users. It supports comma-separated lists (no spaces), requires a study, and optionally accepts users. Identifying specific users will add project permissions only to them. Supplying a study and no users will add view permission for the study to all existing users.
Example:
$ python users.py -u user1 -s studyA,studyB add
Adds permission to view studyA and studyB for the existing user, user1
$ python users.py -s studyA add
Adds permission to view studyA for all existing users in the database
Remove is used to delete permissions and users. It supports comma-separated lists (no spaces) and requires users, a study, or both. Supplying a user and no study will delete the user from the database and remove all their view permissions. Supplying a study and no user will remove view permission for the given study from all users. Supplying both will remove view permission for the specified project(s) from the specified user(s).
Example:
$ python users.py -u user1 -s studyA remove
Removes view permission for studyA from user1
$ python users.py -s studyA remove
Removes view permission for studyA from all users
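The three cases that remove distinguishes can be summarized as a small decision function. This is a sketch of the rules as described in this README, not the code in users.py; split_list, plan_remove, and the action names are all hypothetical.

```python
def split_list(value):
    """Split a comma-separated argument (no spaces) into a list; empty input gives []."""
    return [item for item in value.split(",") if item] if value else []


def plan_remove(users="", studies=""):
    """Return the (action, user, study) tuples that a remove call would imply."""
    user_list = split_list(users)
    study_list = split_list(studies)
    if not user_list and not study_list:
        raise ValueError("remove needs users, a study, or both")
    if user_list and not study_list:
        # Users only: delete each user along with all of their view permissions.
        return [("delete_user", user, None) for user in user_list]
    if study_list and not user_list:
        # Studies only: revoke view permission from every user.
        return [("revoke_from_all", None, study) for study in study_list]
    # Both: revoke each listed study from each listed user.
    return [("revoke", user, study) for user in user_list for study in study_list]
```

For example, plan_remove("user1", "studyA") corresponds to the first command above, and plan_remove(studies="studyA") to the second.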
Show is used to display the view permissions in the database. It shows all registered users and which projects they are able to access. It requires no arguments.
Example:
$ python users.py show
Clear is used to delete all permissions from the database and restore the default configuration. Upon completion, only the default user@password account will exist. It requires no arguments.
Example:
$ python users.py clear
As of 3/01/2016: