TC-hunter searches for transgenic insertion sites in a host genome and returns figures and a report to support these findings.
There are two programs: TC_hunter and TC_hunter_BWA.
TC_hunter_BWA accepts raw paired-end fastq files (from one or several samples) as input and performs BWA MEM alignment before searching for transgenic insertion sites.
TC_hunter accepts one or several aligned BAM files (mapped to both host and transgenic sequence) as input. TC-hunter then identifies anchors and chimeric reads that map to both host and transgenic sequence.
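To make the idea of a chimeric read concrete, here is a minimal sketch. It is not TC_hunter's actual implementation. It assumes plain SAM text lines in which the aligner records split alignments in the `SA:Z:` tag (as BWA MEM does), and a construct sequence named `construct`; the function name and the example reads are made up for illustration.

```python
def is_chimeric(sam_line, construct_name):
    """Return True if one read aligns partly to the construct and partly to the host."""
    fields = sam_line.rstrip("\n").split("\t")
    refs = [fields[2]]  # RNAME of this alignment record
    # Optional tags start at column 12; SA:Z lists the other parts of a split read
    # as "rname,pos,strand,CIGAR,mapQ,NM;" entries.
    for tag in fields[11:]:
        if tag.startswith("SA:Z:"):
            for alignment in tag[5:].split(";"):
                if alignment:
                    refs.append(alignment.split(",")[0])
    hits_construct = any(ref == construct_name for ref in refs)
    hits_host = any(ref != construct_name and ref != "*" for ref in refs)
    return hits_construct and hits_host


# Hypothetical reads: one split between chr5 and the construct, one host-only.
chimeric_line = "\t".join([
    "read1", "0", "chr5", "1000", "60", "60M40S", "*", "0", "0",
    "ACGT", "IIII", "SA:Z:construct,250,+,60S40M,60,0;",
])
host_only_line = "\t".join([
    "read2", "0", "chr5", "2000", "60", "100M", "*", "0", "0", "ACGT", "IIII",
])

print(is_chimeric(chimeric_line, "construct"))
print(is_chimeric(host_only_line, "construct"))
```

The check only looks at reference names; a real search would also use the alignment positions to locate the breakpoint.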
Clone the repository from GitHub and add it to your PATH (or put the full path in the config file).
$ git clone https://github.com/vborjesson/TC_hunter.git
$ export PATH="/home/yourPath/TC_hunter":$PATH
To run TC_hunter you need to have some programs installed. There are three ways to do this:
Install the required programs and tools using the conda yml file (preferred). This has been tested on Anaconda3, Anaconda2 and Miniconda2.
$ conda env create --file TC_hunter/Scripts/TC_hunter.yml
$ source activate TC_hunter_v1.0
Create your own conda environment
$ conda create -n TC_hunter R=3.5
$ source activate TC_hunter
$ conda install -c bioconda samtools=1.10
$ conda install -c bioconda nextflow=19.01.0
(only if running TC_hunter_BWA) $ conda install -c bioconda bwa
$ conda install -c anaconda pandas
$ conda install -c conda-forge r-circlize
$ conda install -c r r-dplyr
$ conda install -c r r-data.table
Download manually
Software
R 3.5 or higher
python 2.7
samtools 1.10 (works on other versions as well)
nextflow 19.01.0
bwa 0.7
R packages
circlize
dplyr
data.table
Download data
mkdir test_run
cd test_run
pip install gdown # If you don't already have it installed
gdown https://drive.google.com/drive/folders/1Y-iCNo71OVmf3QqJeFrukxUlQGDojSKx?usp=sharing
cp ../TC_hunter/Test_data/* .
Then run TC_hunter:
nextflow ../TC_hunter/TC_hunter.nf -c testrun.config --workingDir <realpath_to_test_run_dir> --tc_hunter_path <realpath_to_tchunter>
You should see TC_hunter running each process one after another.
When it's done, check that you have an output_summary.html file.
To generate figures with construct information, you need to provide that information. Create a text file with one gene per line and the fields separated by spaces. The fields are: 1) name, 2) start position and 3) end position.
e.g.
Amp 1 500
lyz 1000 1200
Gene3 2000 5000
Gene4 7000 7700
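Before running, it can help to check that the construct file follows this layout. The following is a minimal sketch, not part of TC_hunter; the function name `parse_construct_file` and the optional `construct_length` check are assumptions added for illustration.

```python
def parse_construct_file(text, construct_length=None):
    """Parse construct annotation text into a list of (name, start, end) tuples."""
    genes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("line %d: expected 3 fields, got %d"
                             % (line_number, len(fields)))
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError("line %d: start %d is after end %d"
                             % (line_number, start, end))
        if construct_length is not None and end > construct_length:
            raise ValueError("line %d: end %d exceeds construct length %d"
                             % (line_number, end, construct_length))
        genes.append((name, start, end))
    return genes


# The example annotation from above.
example = """Amp 1 500
lyz 1000 1200
Gene3 2000 5000
Gene4 7000 7700
"""

genes = parse_construct_file(example, construct_length=8000)
for name, start, end in genes:
    print(name, start, end)
```

For a file on disk, pass `open("construct.txt").read()` as `text`.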
Create a configuration file from template.
$ cp TC_hunter/template/TC_hunter.config /path/to/WorkingDir
Add required information to config file
Argument | Usage | Description |
---|---|---|
WorkingDir | <Path/WorkingDir> | Path to your working directory (this is where the output html and figures will be) |
TC_hunter_path | <Path/TC_hunter> | Path to TC_hunter; just TC_hunter if it is in your $PATH |
Construct_file | <Path/construct.txt> | Path to your construct.txt file (See Create construct.txt file above) |
Construct_length | | The length of your construct in numbers |
Construct_name | | The name of the construct; must match the name in the reference file and contain no spaces |
bam | | Path to the directory containing your BAM file or, for several samples, BAM files |
Reference | | Path to the merged reference file containing both host and construct genome, created with: cat host_ref construct_ref > Jointref.fa |
e.g. example.config
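Since Construct_name and Construct_length have to agree with the merged reference, a small helper can read both from the construct FASTA file. This is a minimal sketch and not part of TC_hunter; the function names and the tiny example sequences are invented for illustration, and `merge_references` simply mirrors the `cat` command in the table.

```python
import os
import tempfile


def fasta_records(path):
    """Yield (name, length) for every sequence in a FASTA file."""
    name, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                header = line[1:].split()
                # The sequence name is the first word of the header line.
                name, length = (header[0] if header else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length


def merge_references(host_ref, construct_ref, joint_ref):
    """Concatenate host and construct FASTA files into one reference file."""
    with open(joint_ref, "w") as out:
        for path in (host_ref, construct_ref):
            with open(path) as handle:
                data = handle.read()
            out.write(data)
            if data and not data.endswith("\n"):
                out.write("\n")  # keep the next header on its own line


# Tiny made-up sequences, written to a temporary directory.
workdir = tempfile.mkdtemp()
host = os.path.join(workdir, "host.fa")
construct = os.path.join(workdir, "construct.fa")
joint = os.path.join(workdir, "Jointref.fa")
with open(host, "w") as handle:
    handle.write(">chr1\nACGTACGT\nACGT\n")
with open(construct, "w") as handle:
    handle.write(">my_construct plasmid\nGGGCCC\nTT\n")

merge_references(host, construct, joint)
records = list(fasta_records(joint))
construct_name, construct_length = list(fasta_records(construct))[0]
print(records)
print(construct_name)
print(construct_length)
```

The printed name and length are the values to enter as Construct_name and Construct_length.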
Argument | Usage | Description |
---|---|---|
WorkingDir | <Path/WorkingDir> | Path to your working directory (this is where the output html and figures will be) |
TC_hunter_path | <Path/TC_hunter> | Path to TC_hunter; just TC_hunter if it is in your $PATH |
Construct_file | <Path/construct.txt> | Path to your construct.txt file (See Create construct.txt file above) |
Construct_length | | Length in numbers of your construct that will be plotted |
Construct_name | | Name of the construct; must match the name in your reference file |
sample | | Path to the directory containing the fastq files (the file names need to contain 'R1' and 'R2') |
folder | | Path to a directory containing one directory for each sample. The sample names are taken from the directory names |
host_ref | | Path to host reference file |
construct_ref | | Path to construct reference file |
e.g. example.config
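The expected layout for `folder` can be checked with a short script. This is a minimal sketch under stated assumptions, not TC_hunter code: it assumes each sample directory holds exactly one file with 'R1' and one with 'R2' in its name, and the function name, sample names and `.fastq.gz` extension are hypothetical.

```python
import os
import tempfile


def find_samples(folder):
    """Map each sample directory name to its (R1, R2) fastq file paths."""
    samples = {}
    for sample in sorted(os.listdir(folder)):
        sample_dir = os.path.join(folder, sample)
        if not os.path.isdir(sample_dir):
            continue  # only directories count as samples
        files = sorted(os.listdir(sample_dir))
        r1 = [name for name in files if "R1" in name]
        r2 = [name for name in files if "R2" in name]
        if len(r1) != 1 or len(r2) != 1:
            raise ValueError("%s: expected one R1 and one R2 file, found %d and %d"
                             % (sample, len(r1), len(r2)))
        samples[sample] = (os.path.join(sample_dir, r1[0]),
                           os.path.join(sample_dir, r2[0]))
    return samples


# Build a made-up layout with two samples in a temporary directory.
folder = tempfile.mkdtemp()
for sample in ("sampleA", "sampleB"):
    os.mkdir(os.path.join(folder, sample))
    for mate in ("R1", "R2"):
        path = os.path.join(folder, sample, "%s_%s.fastq.gz" % (sample, mate))
        open(path, "w").close()

samples = find_samples(folder)
for name in sorted(samples):
    print(name, samples[name])
```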
Before running TC_hunter, make sure you have a config file with all required information (see "Make Configuration file").
$ nextflow TC_hunter.nf -c <file.config> [-with-report <report name>]
Before running TC_hunter_BWA, make sure you have a config file with all required information (see "Make Configuration file").
$ nextflow TC_hunter_BWA.nf -c <file.config> [-with-report <report name>]
To get the IGV figures you need to have a GUI available. If you don't, you can run IGV separately once TC-hunter has finished. Run one .bat file for each sample.
$ igv.sh -b <sample_name.bat>
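With many samples, the per-sample IGV calls can be generated in a loop. This is a minimal sketch: the directory holding the .bat files and the sample names are assumptions, and the commands are only built and printed here, not executed.

```python
import glob
import os
import tempfile


def igv_commands(batch_dir):
    """Return one `igv.sh -b <file>` command (as an argument list) per .bat file."""
    batch_files = sorted(glob.glob(os.path.join(batch_dir, "*.bat")))
    return [["igv.sh", "-b", path] for path in batch_files]


# Made-up batch files in a temporary directory stand in for the real ones.
batch_dir = tempfile.mkdtemp()
for sample in ("sampleA", "sampleB"):
    open(os.path.join(batch_dir, sample + ".bat"), "w").close()

commands = igv_commands(batch_dir)
for command in commands:
    print(" ".join(command))
    # To actually run IGV: subprocess.call(command)
```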
TC-hunter finds insertion sites based on chimeric reads and discordant read pairs.
TC_hunter reports each possible insertion site in an HTML file called output_summary.html. The file contains 5 columns: 1) Ranking - the best hit based on score is ranked first, the second best second, etc., 2) Score - based on the number of chimeric reads and discordant read pairs supporting the insertion site, 3) Breakpoint host - where in the host the insertion site is located, 4) Breakpoint construct - where in the construct the insertion site is located, 5) Figures - three figures: I) circular plot (see below), II) IGV, III) IGV, more zoomed in.
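The ranking can be pictured with a small sketch. The exact scoring formula is not given here, so the code below simply assumes the score is the sum of supporting chimeric reads and discordant read pairs; the field names and the candidate sites are invented for illustration and are not real results.

```python
def rank_sites(candidates):
    """Score candidate insertion sites and return them ranked, best first."""
    scored = []
    for site in candidates:
        # Assumed scoring: total number of supporting reads and read pairs.
        score = site["chimeric_reads"] + site["discordant_pairs"]
        scored.append(dict(site, score=score))
    scored.sort(key=lambda site: site["score"], reverse=True)
    for rank, site in enumerate(scored, start=1):
        site["rank"] = rank
    return scored


# Hypothetical candidates, not output from a real run.
candidates = [
    {"host_breakpoint": "chr2:150000", "construct_breakpoint": 4200,
     "chimeric_reads": 3, "discordant_pairs": 2},
    {"host_breakpoint": "chr5:1000000", "construct_breakpoint": 350,
     "chimeric_reads": 12, "discordant_pairs": 20},
]

ranked = rank_sites(candidates)
for site in ranked:
    print(site["rank"], site["score"],
          site["host_breakpoint"], site["construct_breakpoint"])
```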
For every predicted insertion site a circular figure is created. Red links (lines) represent the discordant read pairs supporting the event. Black links represent the chimeric reads supporting the event.