tipputa / Circular-genome-visualizer

About

Circular visualizer for complete genomes

Input: GenBank files (.gb)

Usage:

python runAllProcess.py <output directory> <input directory (GenBank files)>

Requirements

PATH

Add the blast+/bin and circos/bin directories to your PATH. You can confirm that both tools are reachable with the commands below.

echo $PATH
blastn -h
circos -h
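
The same check can be scripted. The following is a minimal sketch, assuming only that the executables are named blastn, blastp and circos as in this README; the helper name find_missing_tools is hypothetical and is not part of this repository.

```python
import shutil


def find_missing_tools(tools=("blastn", "blastp", "circos")):
    """Return the names in `tools` that cannot be found on the current PATH."""
    # shutil.which returns None when no executable with that name is on PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If any name is reported as missing, add the corresponding bin directory to PATH and run the check again.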

Run

Run all processes

python runAllProcess.py <output directory> <input directory (GenBank files)>

Two required arguments are as follows:

  1. path to the output directory
  2. path to the directory containing GenBank files (.gb)
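
Before starting a long run, it can help to confirm that the input directory really contains GenBank files. This is a small sketch, assuming the scripts pick up files by the .gb extension as stated above; list_genbank_files is a hypothetical helper, not a function from this repository.

```python
import sys
from pathlib import Path


def list_genbank_files(input_dir):
    """Return the sorted .gb files found directly inside `input_dir`."""
    # Only the top level of the directory is searched (no recursion).
    return sorted(Path(input_dir).glob("*.gb"))


if __name__ == "__main__" and len(sys.argv) > 1:
    files = list_genbank_files(sys.argv[1])
    print(f"Found {len(files)} GenBank file(s)")
    for path in files:
        print(" ", path.name)
```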

Run after the blastp processes

python runAfterBlastProcess.py <output directory> <input directory (GenBank files)>

Two required arguments are as follows:

  1. path to the output directory
  2. path to the directory containing GenBank files (.gb)

Run blastp with Singularity as qsub jobs (for the NIG supercomputer system)

module load singularity
python runOnlyBlast.py <output directory> <input directory (GenBank files)> <bin_singularity directory>

Three required arguments are as follows:

  1. path to the output directory
  2. path to the directory containing GenBank files (.gb)
  3. path to the bin_singularity directory

After all qsub jobs have finished, run runAfterBlastProcess.py as described above.

Run visualization

python runVisualize.py <output directory> <configuration file> <option; keyword for output; default: "test"> <option; the minimum number of genes in each cluster; default: 1> <option; sorting column name; default: None>

The five arguments are as follows (the first two are required):

  1. path to the output directory
  2. path to the configuration file. Please see below.
  3. optional: suffix for the output image file. Default is "test".
  4. optional: the minimum number of genes in each cluster for visualization. Default is 1.
  5. optional: the column name in the configuration file for sorting. Default is None.
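
Because the optional arguments are positional, a later one can only be given when the earlier ones are also present. The sketch below illustrates this by building the argument list in Python. It assumes runVisualize.py is in the current directory and that the defaults are exactly those listed above; build_visualize_command is a hypothetical helper, not part of this repository.

```python
def build_visualize_command(output_dir, config_file, keyword=None,
                            min_genes=None, sort_column=None):
    """Build the argument list for runVisualize.py (positional arguments only)."""
    command = ["python", "runVisualize.py", str(output_dir), str(config_file)]
    # A later optional argument forces the earlier ones to be written out,
    # so missing earlier values are filled with the documented defaults.
    if keyword is not None or min_genes is not None or sort_column is not None:
        command.append(keyword if keyword is not None else "test")
    if min_genes is not None or sort_column is not None:
        command.append(str(min_genes if min_genes is not None else 1))
    if sort_column is not None:
        command.append(sort_column)
    return command
```

The returned list can be passed to subprocess.run(command, check=True) to launch the visualization.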

Configuration file

This file is written as "RingOrder_*_df.tsv" by runAllProcess.py and runAfterBlastProcess.py. See ./testResult/RingOrder_aligned_df.tsv and ./testResult/changed_setting.tsv for examples.

AccNo        Genome_size  Strand  Angle  Deviation (Aligned)  Deviation (Original)  optional
NC_000915.1  1667867      0       0      53.311               47.061                ...
NC_014256.1  1673997      1       342    53.07                177.67                ...
...          ...          ...     ...    ...                  ...                   ...

You can change the visualization, such as the number of genomes shown and the ring order, by deleting or reordering rows in this file.
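
Rows can be removed by hand in a text editor or with a short script. The following is a minimal sketch, assuming the file is tab-separated and has an AccNo header column as in the example above; drop_genomes is a hypothetical helper, not part of this repository.

```python
import csv
import io


def drop_genomes(tsv_text, accessions_to_drop):
    """Return the TSV text with the rows for the given AccNo values removed."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    acc_index = header.index("AccNo")
    # Keep the original row order; only the listed accessions are removed.
    kept = [row for row in body if row[acc_index] not in accessions_to_drop]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue()
```

To use it, read the existing configuration file into a string, pass it through drop_genomes, write the result to a new .tsv file, and give that file to runVisualize.py.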

Utility

Create orthologous gene cluster table

python runCreateOrthologousTable.py <output directory> <input directory (GenBank files)>

Two required arguments are as follows:

  1. path to the output directory
  2. path to the directory containing GenBank files (.gb)

output files (clustering result)

Test run

Test data: 5 Helicobacter pylori genomes

all processes

The full run takes about 10 minutes (BLASTP 8 min, other steps 2 min) on a standalone desktop server with 16 GB of memory.

cd
git clone git@github.com:tipputa/Circular-genome-visualizer.git
python ~/Circular-genome-visualizer/bin/runAllProcess.py ~/Circular-genome-visualizer/test/ ~/Circular-genome-visualizer/test/gb/

RunVisualize. Configuring the visualization.

In this example, "changed_setting.tsv" is a modified configuration file in which the first row has been deleted from /test/data/RingOrder_aligned_df.tsv.

python ~/Circular-genome-visualizer/bin/runVisualize.py ~/Circular-genome-visualizer/test/ ~/Circular-genome-visualizer/test/changed_setting.tsv "rm1genome" 4

"rm1genome" is a suffix for the output file (e.g. circos_rm1genome.png). The genes conserved in >= 4 genomes are visualized.

Citation