This pipeline prepares CNVkit steps for somatic CNV calling with WGS/WES data (BAM) to run on a cluster.
Input BAM files for tumor and normal samples should be named with the suffixes `_tumor` and `_normal`, respectively. Example input files for the two samples `T1` and `T2`:
T1_normal.bai T1_normal.bam
T1_tumor.bai T1_tumor.bam
T2_normal.bai T2_normal.bam
T2_tumor.bai T2_tumor.bam
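
The naming convention above can be checked before the pipeline is prepared. The following is a minimal sketch, not part of the pipeline: the helper names `group_bams_by_sample` and `missing_indexes` are hypothetical, and it assumes that the sample ID is everything before the last underscore and that each index is named like its BAM with a `.bai` extension, as in the example listing.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_bams_by_sample(input_dir):
    """Map each sample ID to the roles ('tumor', 'normal') that have a BAM file."""
    roles_by_sample = defaultdict(set)
    for bam in Path(input_dir).glob("*.bam"):
        # "T1_tumor" -> sample "T1", role "tumor" (split at the last underscore).
        sample, _, role = bam.stem.rpartition("_")
        if sample and role in ("tumor", "normal"):
            roles_by_sample[sample].add(role)
    return dict(roles_by_sample)


def missing_indexes(input_dir):
    """Return the names of BAM files that have no .bai index next to them."""
    return sorted(
        bam.name
        for bam in Path(input_dir).glob("*.bam")
        if not bam.with_suffix(".bai").exists()
    )


if __name__ == "__main__":
    # Demo on empty placeholder files; real input would be the `input` directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("T1_normal.bam", "T1_normal.bai", "T1_tumor.bam", "T1_tumor.bai"):
            (Path(tmp) / name).touch()
        print(group_bams_by_sample(tmp))
        print(missing_indexes(tmp))
```

A sample that lists only `tumor` is not necessarily an error, since a matched normal is not required for every sample.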
In this step, we will prepare the pipeline, set the parameters, and copy the input files using the substeps below.
- Set the parameters in `parameters.ini`.
- Run `sh prepare_CNVkit.sh`.
- Copy or link the input BAM files into the `input` directory.

# Clone repository
git clone https://github.com/sagarutturkar/CNVkit_pipeline.git
# Prepare Pipeline
cd CNVkit_pipeline
dos2unix prepare_CNVkit.sh.txt
sh prepare_CNVkit.sh.txt
# Copy or link input BAM
cd input
scp <PATH>/T1_normal* ./
scp <PATH>/T1_tumor* ./
A matched normal is not required for every sample, but make sure to include a sufficient number of normal samples. See the CNVkit documentation for details.
In this step, we will prepare the submission files for batch and individual sample processing and submit the jobs to the cluster using the substeps below.
Run the Perl script as `perl 1_CNV.pl` to achieve the following.
- Create the submission files in the `submission_files` directory according to `parameters.ini`.
- `CNVkit.sub` has the commands to process the data in batch mode for all samples. Run it with `sh step_1.sh`.
- Run the individual sample commands only after successful completion of job `CNVkit.sub`, using `sh step_2.sh`.

cd CNVkit_pipeline
perl 1_CNV.pl
cd submission_files
sh step_1.sh
# After successful completion of job CNVkit.sub
sh step_2.sh
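
Because `step_2.sh` should run only after the `CNVkit.sub` job has finished, it can help to confirm that the batch outputs exist first. The following is a minimal sketch, assuming the batch files named in the output table (`reference.cnn`, `*_tumor.cnr`, `*_tumor.cns`) are written directly into a single output directory; that location and the helper name `batch_outputs_missing` are assumptions, not something the pipeline defines.

```python
from pathlib import Path


def batch_outputs_missing(output_dir, samples):
    """List expected batch-step files that are absent from output_dir."""
    expected = ["reference.cnn"]
    for sample in samples:
        expected += [f"{sample}_tumor.cnr", f"{sample}_tumor.cns"]
    return [name for name in expected if not (Path(output_dir) / name).exists()]


if __name__ == "__main__":
    # Assumed location and sample IDs; adjust both to the actual run.
    missing = batch_outputs_missing("output", ["T1", "T2"])
    print("ready for step_2.sh" if not missing else f"still missing: {missing}")
```

An empty result only shows that the files exist; the scheduler's job status remains the authoritative signal that the job completed successfully.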
This step completes all the CNVkit steps and generates separate CNV calls for each individual sample.
This script determines the intersection of the genes affected by ratio and by segment, and then selects only the subset of genes with a copy number gain or loss by both methods.
# This step has already been incorporated in the previous submission file.
# Resulting file - *_trusted_* is available for each sample.
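
The selection described above can be sketched as a set intersection. This is an illustration of the idea only, not the pipeline's script: the function name `trusted_genes`, the input shapes (a list of genes flagged by ratio and a gene-to-copy-number mapping from the segment-level calls), and the example values are all assumptions.

```python
def trusted_genes(ratio_genes, segment_calls, neutral_cn=2):
    """Return {gene: cn} for genes reported by both methods with a non-neutral cn."""
    shared = set(ratio_genes) & set(segment_calls)
    return {
        gene: segment_calls[gene]
        for gene in sorted(shared)
        if segment_calls[gene] != neutral_cn
    }


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    ratio_genes = ["CSMD3", "KCTD8", "GENE_X"]
    segment_calls = {"CSMD3": 3, "KCTD8": 0, "SLC30A8": 4}
    print(trusted_genes(ratio_genes, segment_calls))
```

Genes seen by only one method are dropped, as are genes whose copy number equals the neutral value of 2.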
This step reads the combined list of trusted genes from all samples and prepares a contingency table of genes by samples. The table is sorted so that the genes altered in the largest number of samples come first.
cd output
find `pwd` -name "*trusted*" | xargs -I {} sh -c " tail -n+2 {} | cat" > All_sample_cn.txt
python ../lib/scripts/make_ctable.py -infile All_sample_cn.txt -outfile Ctable.txt
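
To make the shape of the contingency table concrete, here is a minimal sketch of the same idea. It is not the code of `make_ctable.py`, whose implementation and input columns are not shown here; the function name `make_contingency_table` and the `(sample, gene, copy number)` record layout are assumptions.

```python
from collections import defaultdict


def make_contingency_table(records):
    """Build gene-by-sample rows from (sample, gene, cn) records.

    Returns (samples, rows); each row is [gene, cn per sample..., altered count],
    with an empty string where a gene was not reported for a sample.
    """
    calls = defaultdict(dict)
    samples = set()
    for sample, gene, cn in records:
        calls[gene][sample] = cn
        samples.add(sample)
    samples = sorted(samples)  # string order, e.g. T1, T10, T2

    rows = []
    for gene, by_sample in calls.items():
        cells = [by_sample.get(sample, "") for sample in samples]
        rows.append([gene, *cells, len(by_sample)])
    # Genes altered in the most samples first; ties broken by gene name.
    rows.sort(key=lambda row: (-row[-1], row[0]))
    return samples, rows


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [("T1", "CSMD3", 3), ("T2", "CSMD3", 4), ("T1", "KCTD8", 0)]
    samples, rows = make_contingency_table(records)
    print("Gene_ID", *samples, "Altered in Samples", sep="\t")
    for row in rows:
        print(*row, sep="\t")
```

Sorting the sample IDs as strings yields the order `T1`, `T10`, `T2`, which matches the column order of the example table further down.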
A short description of each output file is provided below. Please check this link for detailed information.
File Name | File Description |
---|---|
*_normal.antitargetcoverage.cnn | Bin-level antitarget coverage file for the normal sample |
*_normal.targetcoverage.cnn | Bin-level target coverage file for the normal sample |
*_tumor.antitargetcoverage.cnn | Bin-level antitarget coverage file for the tumor sample |
*_tumor.targetcoverage.cnn | Bin-level target coverage file for the tumor sample |
*_tumor.cnr | Bin-level log2 ratios by sample |
*_tumor.cns | Segmented log2 ratios by sample |
reference.cnn | Copy number reference profile (all normal samples) |
heatmap.png | Chromosome-level copy number heatmap for multiple samples |

File Name | File Description |
---|---|
*_Results.xlsx | Combined result files in Excel format |
*_diagram.png | Copy number shown on each chromosome as an ideogram |
*_scatter.png | Bin-level log2 coverages and segmentation calls plotted by chromosome |
*_genebreaks.txt | Targeted genes in which a segmentation breakpoint occurs |
*_genemetrics_with_ratio.txt | Targeted genes with copy number gain or loss (by ratio) |
*_genemetrics_with_segment.txt | Targeted genes with copy number gain or loss (by segment) |
*_ratio-genes.txt | Gene list (by ratio) |
*_segment-genes.txt | Gene list (by segment) |
*_trusted_genes.txt | Gene list and copy number (affected by both ratio and segment) |
*_tumor.call.filtered.cns | Estimated absolute integer copy number for each segment |
*_tumor.segmetrics.cns | Summary statistics of the residual bin-level log2 ratio estimates |

Copy Number Call | Interpretation |
---|---|
0 | homozygous deletion (2-copy loss) |
1 | heterozygous deletion (1-copy loss) |
2 | normal diploid state |
3 | one copy gain |
4 | amplification (>= 2-copy gain) |
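
The table translates directly into a small lookup. The following is a minimal sketch with a hypothetical function name; treating every call of 4 or more as amplification is an assumption read from the `>= 2-copy gain` wording.

```python
def interpret_copy_number(cn):
    """Translate an integer copy number call into its interpretation."""
    if cn < 0:
        raise ValueError("copy number cannot be negative")
    labels = {
        0: "homozygous deletion (2-copy loss)",
        1: "heterozygous deletion (1-copy loss)",
        2: "normal diploid state",
        3: "one copy gain",
    }
    # Assumption: any call of 4 or more counts as amplification.
    return labels.get(cn, "amplification (>= 2-copy gain)")


if __name__ == "__main__":
    for cn in range(5):
        print(cn, interpret_copy_number(cn))
```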

`Altered in Samples` denotes the total number of samples with gain/loss.

Gene_ID | T1 | T10 | T2 | T3 | T4 | T6 | T7 | T8 | T9 | Altered in Samples |
---|---|---|---|---|---|---|---|---|---|---|
CSMD3 | 3 | 3 | 3 | 4 | 3 | 3 | 6 | |||
ENSCAFG00000038475 | 3 | 3 | 3 | 4 | 3 | 3 | 6 | |||
ENSCAFG00000036360 | 3 | 3 | 3 | 4 | 3 | 3 | 6 | |||
SLC30A8 | 3 | 3 | 3 | 4 | 3 | 3 | 6 | |||
ENSCAFG00000032793 | 3 | 3 | 3 | 4 | 3 | 3 | 6 | |||
ENSCAFG00000038488 | 3 | 3 | 3 | 4 | 3 | 3 | 6 | |||
KCTD8 | 3 | 3 | 0 | 4 | 3 | 3 | 6 |