senzhaocode / SV_standard

Convert raw SV outputs of multiple callers (using either DNA-seq or RNA-seq data) to FuSViz input format
MIT License

SV_standard - aggregate and convert raw SV calls from multiple samples

SV_standard is a Perl script that aggregates SV calls from multiple samples and converts them into the format FuSViz expects as input. It merges raw SV calls from a range of tools (Manta, Svaba, Delly and Lumpy for DNA-seq data; Dragen, STAR-fusion, Arriba, Fusioncatcher and deFuse for RNA-seq data) and writes them in tab-separated values (TSV) format.

Quickstart

1. Prerequisites

2. Installation

3. Run an example to aggregate SVs called from DNA-seq data

perl SV_standard.pl --genome hg38 --type DNA --filter PASS \
    --anno anno \
    --input example/DNA/input \
    --output example/DNA/output

The example/DNA/input folder contains raw SV VCF files called by one or several tools (e.g., Manta, Svaba, Delly and Lumpy) per sample. Input files must be organized into one folder per sample, as follows:

example/DNA/input
           |--- T001 # sample name
              |--- Manta.vcf 
              |--- Svaba.vcf
           |--- T002 # sample name
              |--- Delly.vcf
           |--- T003 # sample name
              |--- Lumpy.vcf
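
Before running SV_standard.pl, it can help to check that the input folder matches this layout. The sketch below is one way to do that in shell and is not part of SV_standard. The function name `check_dna_layout` is hypothetical, and it assumes that file names match the caller spelling shown in the tree exactly, with either a `.vcf` or a `.vcf.gz` extension. Whether the script is case-sensitive or accepts other extensions is not confirmed here.

```shell
# Hypothetical helper: report files in <input>/<sample>/ that are not named
# after a supported DNA-seq caller. Assumes the exact spelling used in the
# tree above (an assumption, not a documented rule of SV_standard.pl).
check_dna_layout() {
    input_dir=$1
    status=0
    for sample_dir in "$input_dir"/*/; do
        [ -d "$sample_dir" ] || continue      # glob matched nothing
        sample=$(basename "$sample_dir")
        found=0
        for f in "$sample_dir"*; do
            [ -f "$f" ] || continue
            name=$(basename "$f")
            case "$name" in
                Manta.vcf|Svaba.vcf|Delly.vcf|Lumpy.vcf)
                    found=1 ;;
                Manta.vcf.gz|Svaba.vcf.gz|Delly.vcf.gz|Lumpy.vcf.gz)
                    found=1 ;;
                *.tbi)
                    ;;                        # tabix index, ignored
                *)
                    echo "unexpected file: $sample/$name"
                    status=1 ;;
            esac
        done
        if [ "$found" -eq 0 ]; then
            echo "no caller VCF found for sample: $sample"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_dna_layout example/DNA/input` prints one line per problem and returns a non-zero status if anything looks off, so it can gate the `perl SV_standard.pl` call in a wrapper script.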

NOTE: Raw VCF files should be named after the caller that produced them; compressing them with bgzip and indexing them with tabix is recommended. Raw calls from Svaba carry no SV type (e.g., BND, INV, DEL or DUP). Before running SV_standard.pl on Svaba output, use the in-house R script in the script folder to assign an SV type to each call and convert the original VCF file to meet the FuSViz requirement. Usage: Rscript script/svaba_svtype.R svaba_raw.vcf svaba_new.vcf
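
The preparation steps in this note can be chained per sample. The following is a minimal sketch, not part of SV_standard: `run` and `prepare_vcf` are hypothetical helpers. It assumes `bgzip`, `tabix` and `Rscript` are on the PATH, that the VCF is coordinate-sorted (tabix requires this), and that the compressed file should keep the caller name (e.g. `Svaba.vcf.gz`). Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: place a raw VCF into a sample folder under the caller
# name, assigning SV types first for Svaba, then compress and index it.
prepare_vcf() {
    caller=$1      # e.g. Manta, Svaba, Delly or Lumpy
    raw_vcf=$2     # uncompressed VCF produced by the caller
    sample_dir=$3  # e.g. example/DNA/input/T001
    out_vcf="$sample_dir/$caller.vcf"

    if [ "$caller" = "Svaba" ]; then
        # Usage as given in the note: Rscript script/svaba_svtype.R <in> <out>
        run Rscript script/svaba_svtype.R "$raw_vcf" "$out_vcf" || return 1
    else
        run cp "$raw_vcf" "$out_vcf" || return 1
    fi
    run bgzip -f "$out_vcf" || return 1   # replaces the file with $out_vcf.gz
    run tabix -p vcf "$out_vcf.gz"        # writes $out_vcf.gz.tbi
}
```

For example, `DRY_RUN=1 prepare_vcf Svaba svaba_raw.vcf example/DNA/input/T001` lists the three commands that would be run, which is a cheap way to review paths before touching any files.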

The example/DNA/output folder contains an example of the results.

4. Run an example to aggregate SVs called from RNA-seq data

perl SV_standard.pl --genome hg38 --type RNA \
    --anno anno \
    --input example/RNA/input \
    --output RNA_output

The example/RNA/input folder contains raw SVs called by one or several tools (e.g., Dragen, deFuse, STAR-fusion, Arriba and Fusioncatcher) per sample. Input files must be organized into one folder per sample, as follows:

 example/RNA/input
            |--- T001 # sample name
               |--- Arriba.tsv
               |--- STAR-fusion.tsv
            |--- T002 # sample name
               |--- Dragen.txt
               |--- Fusioncatcher.txt

NOTE: Raw input files should be tab-separated (TSV or TXT) and named after the caller that produced them.
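
A quick check that each RNA-seq input file really is tab-separated can catch files saved with the wrong delimiter before they reach SV_standard.pl. The sketch below only inspects the first line of each file. `is_tab_separated` and `check_rna_inputs` are hypothetical helpers, and they do not validate any caller-specific columns.

```shell
# Hypothetical helper: succeed if the first line of a file contains a tab.
is_tab_separated() {
    tab=$(printf '\t')
    head -n 1 "$1" | grep -q "$tab"
}

# Hypothetical helper: report files under <input>/<sample>/ whose first line
# has no tab character. Returns non-zero if any such file is found.
check_rna_inputs() {
    status=0
    for f in "$1"/*/*; do
        [ -f "$f" ] || continue
        if ! is_tab_separated "$f"; then
            echo "not tab-separated: $f"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_rna_inputs example/RNA/input` prints one line per suspicious file and stays silent when every file passes.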

The example/RNA/output folder contains an example of the results.

Contact

t.cytotoxic AT gmail.com