gx-health / TAGET

MIT License

TAGET user manual

TAGET is a computational toolkit that provides a wide spectrum of tools for analyzing full-length transcriptome data. Based on its highly precise transcript alignment and junction prediction, TAGET enables accurate novel isoform detection, gene fusion detection, and expression quantification.

Environment dependencies

FAST RUN

python TransAnnot.py -f [fasta] -g [genome fasta] -o [output directory] -a [annot gtf] -p [process] --use_minimap2 [1] --use_hisat2 [hisat2 index]

or you can use

python TransAnnot.py -c TransAnnot.Config
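As a minimal sketch, the command-line form above can be assembled from Python before handing it to a job scheduler or to subprocess. The helper name build_transannot_cmd and all file names are hypothetical; only the flags come from the command shown above.

```python
import shlex


def build_transannot_cmd(fasta, genome, outdir, gtf, processes, hisat2_index):
    """Return the TransAnnot.py argument list using the flags documented above."""
    return [
        "python", "TransAnnot.py",
        "-f", fasta,
        "-g", genome,
        "-o", outdir,
        "-a", gtf,
        "-p", str(processes),
        "--use_minimap2", "1",
        "--use_hisat2", hisat2_index,
    ]


# Hypothetical input paths, for illustration only.
cmd = build_transannot_cmd(
    "sample.fa", "genome.fa", "out", "annot.gtf", 8, "hisat2_index/genome"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.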

Running time

The running time is less than 1 hour with 8 cores on a Linux server.

Running the software

1. The config file contains the path of each required program and the index file of the reference genome.

  1. Set the following parameters before the first run:

    • the path of HISAT2/Minimap2
    • the index file of the reference genome
    • the reference genome (FASTA), the transcript annotation file (GTF; Ensembl by default), and the number of processes
  2. After setting the base parameters, set the FASTA file of the full-length transcripts and the output directory in the config file, or pass them on the command line with -c [config] -f [fasta] -o [output].

  3. Read, transcript, and gene expression can be calculated with the --tpm parameter.
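To illustrate what a TPM-style value means, here is a small sketch that scales raw counts so they sum to one million. It assumes that each full-length read represents one transcript molecule, so no length normalization is applied. This is an assumption for illustration and may differ from what --tpm computes internally. The function name and the counts are hypothetical.

```python
def counts_to_tpm(counts):
    """Scale raw counts so that the values sum to one million."""
    total = sum(counts.values())
    if total == 0:
        # No reads at all: report zero expression everywhere.
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Hypothetical full-length read counts per transcript.
example_counts = {"tx_A": 30, "tx_B": 10, "tx_C": 60}
tpm = counts_to_tpm(example_counts)
for name, value in tpm.items():
    print(name, value)
```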

Results

The output includes the following files:

Columns of {sample_id}.annot.stat:

TransAnnot.Config

TransAnnotMerge

TransAnnotMerge generates an expression matrix across multiple samples.

FAST RUN

Extract isoform expression from the FASTA file: python fa2exp.py -f [fa] -i [prefix] -o [exp] -p [TAGET output directory]

python script.py input.config

This step requires the exon.gtf file, which can be extracted with unzip exon.gtf.zip

Usage of TransAnnotMerge

python TranAnnotMerge -c MergeConfig -o outputdir -m [TPM/FLC/None]

#sample stat bed db
------- ---- --- --
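The header above suggests that MergeConfig lists one sample per row with four columns. The following sketch builds such rows under two assumptions that the manual does not confirm: the columns are tab-separated, and the stat, bed, and db columns point to the .annot.stat, .annot.bed, and .annot.db.pickle files that TransAnnot writes (see the output names in the Demos section). The function name is hypothetical.

```python
def merge_config_lines(sample_dirs):
    """Build MergeConfig rows (sample, stat, bed, db), assumed tab-separated."""
    lines = ["#sample\tstat\tbed\tdb"]
    for sample, directory in sample_dirs.items():
        prefix = f"{directory}/{sample}"
        lines.append("\t".join([
            sample,
            prefix + ".annot.stat",       # assumed to fill the 'stat' column
            prefix + ".annot.bed",        # assumed to fill the 'bed' column
            prefix + ".annot.db.pickle",  # assumed to fill the 'db' column
        ]))
    return lines


# Sample IDs taken from the demo; the directory layout is an assumption.
lines = merge_config_lines({"759133C": "759133C", "759133N": "759133N"})
print("\n".join(lines))
```

Check the generated file against a working MergeConfig before relying on it.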

Running TransAnnotMerge

  1. Extract isoform expression from the FASTA file: python fa2exp.py -f [fa] -o [exp]
  2. Run TranAnnotMerge: python TranAnnotMerge -c MergeConfig -o outputdir -m TPM
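The two steps above can be sketched as an ordered list of commands: one fa2exp.py call per sample, followed by a single merge call. The helper name and the file names are hypothetical. The flags and the three -m values (TPM, FLC, None) are the ones given in this manual.

```python
def merge_pipeline_cmds(samples, merge_config, outdir, mode="TPM"):
    """Return the step 1 commands (one per sample) followed by the step 2 command."""
    if mode not in ("TPM", "FLC", "None"):
        raise ValueError(f"unsupported -m value: {mode}")
    # Step 1: extract isoform expression from each sample's FASTA file.
    cmds = [["python", "fa2exp.py", "-f", fa, "-o", exp] for fa, exp in samples]
    # Step 2: merge all samples into one expression matrix.
    cmds.append(
        ["python", "TranAnnotMerge", "-c", merge_config, "-o", outdir, "-m", mode]
    )
    return cmds


# Hypothetical (fasta, expression output) pairs.
cmds = merge_pipeline_cmds(
    [("sampleA.fa", "sampleA.exp"), ("sampleB.fa", "sampleB.exp")],
    "MergeConfig",
    "outputdir",
)
for cmd in cmds:
    print(" ".join(cmd))
# To execute in order: for cmd in cmds: subprocess.run(cmd, check=True)
```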

TransAnnotMerge result files

Classification of transcripts

Classification of exons

Demos

dependency

1. Fast run

python TransAnnot.py -c 759133C.Config

python TransAnnot.py -c 759133N.Config

Running time: 72 minutes

Outputs in the 759133C directory:

    • 759133C.minimap2.bed
    • 759133C.hisat2.bed
    • 759133C.annot.bed
    • 759133C.annot.stat
    • 759133C.annot.db.pickle
    • 759133C.annot.cluster.gene
    • 759133C.annot.cluster.transcript
    • 759133C.annot.cluster.reads
    • 759133C.annot.junction
    • 759133C.annot.multiAnno
    • 759133C.anno.tmp.stat

Outputs in the 759133N directory:

    • 759133N.minimap2.bed
    • 759133N.hisat2.bed
    • 759133N.annot.bed
    • 759133N.annot.stat
    • 759133N.annot.db.pickle
    • 759133N.annot.cluster.gene
    • 759133N.annot.cluster.transcript
    • 759133N.annot.cluster.reads
    • 759133N.annot.junction
    • 759133N.annot.multiAnno
    • 759133N.anno.tmp.stat
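After a run, it can be useful to confirm that every file listed above was produced. The sketch below is one way to do that. The suffix list is copied from the demo outputs, while the function name and the temporary-directory demonstration are illustrative only.

```python
import os
import tempfile

# File suffixes as listed in the demo outputs above.
EXPECTED_SUFFIXES = [
    "minimap2.bed", "hisat2.bed", "annot.bed", "annot.stat",
    "annot.db.pickle", "annot.cluster.gene", "annot.cluster.transcript",
    "annot.cluster.reads", "annot.junction", "annot.multiAnno",
    "anno.tmp.stat",
]


def missing_outputs(directory, sample_id):
    """Return the expected file names that are absent from `directory`."""
    expected = [f"{sample_id}.{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Demonstration on a temporary directory that holds only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "759133C.annot.stat"), "w").close()
    missing = missing_outputs(tmp, "759133C")

print(len(missing), "expected files are missing")
```

On a real run, call missing_outputs("759133C", "759133C") and treat a non-empty result as a failed or incomplete run.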

2. TransAnnotMerge

python fa2exp.py -f 759133C.fa -i 759133C -o 759133C -p ./expression

python fa2exp.py -f 759133N.fa -i 759133N -o 759133N -p ./expression

python script.py input.config

Running time: 35 minutes

Outputs:

    • 759133.reads.exp
    • 759133.transcript.exp

3. DIU analysis

python expression_V1.py -t 759133.transcript.exp -g 759133.gene.exp -o 759133

Running time: 2 minutes

Outputs:

    • 759133_DIU.txt

4. Gene fusion

Rscript TAGET_fusion_2-3_ajust.r -j Jin_fusion_select.py -e STAT_select.py -c chuli.py -l 759133C.minimap2.bed -s 759133C.hisat2.bed -a 759133C.fa.anno.tmp.stat -t hg38.gtf -f 759133C.fa -n 759133C -o ./output

Running time: 18 minutes

Outputs:

    • 759133C.fusion