xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

ISEScan


Overview

ISEScan is a Python pipeline to identify IS (Insertion Sequence) elements in genomes. It can report either complete IS elements only, or both complete and partial IS elements; by default it reports both. Reporting both complete and partial IS elements is worth trying when identifying IS elements in metagenome assemblies.

ISEScan was developed in Python 3. It:

1) scans a genome (or metagenome) in FASTA format;
2) predicts genes and translates the genome into a proteome using FragGeneScan;
3) searches the pre-built pHMMs (profile Hidden Markov Models) of transposases (two files shipped with ISEScan: clusters.faa.hmm and clusters.single.faa) against the proteome to identify transposase genes in the genome;
4) extends each identified transposase gene into a complete IS (Insertion Sequence) element, based on the common characteristics shared by known IS elements reported in the literature and databases;
5) reports the identified IS elements in several result files (e.g. a list of IS elements, the IS element sequences in FASTA format, and an annotation file in GFF3 format).

Citation

Zhiqun Xie, Haixu Tang. ISEScan: automated identification of Insertion Sequence Elements in prokaryotic genomes. Bioinformatics, 2017, 33(21): 3340-3347.

Download: full text, SupplementaryMaterials.docx, SupplementaryMaterials.xlsx.

Contact

Zhiqun Xie: xiezhq@hotmail.com

Installation

ISEScan on linux

ISEScan was tested on Linux only and can be installed either from Bioconda packages or from source code. Installing from Bioconda is recommended because it is the simplest option for inexperienced users.

ISEScan on mac

I cannot vouch for ISEScan on Mac, as I have only fully tested it on Linux. If you cannot install ISEScan on Mac from Bioconda, try installing it from source code. When installing from source, there is a known issue compiling FragGeneScan on Mac, which I once solved. To make FragGeneScan compile and run on Mac, modify two FragGeneScan source files: 1) open util_lib.c and comment out the ‘#include’ line on line 3; 2) open hmm_lib.c, comment out the ‘#include’ line on line 6, and replace values.h with limits.h on line 4. According to my tests, the modified FragGeneScan runs without problems on both Mac and Linux.
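The two FragGeneScan edits described above can also be applied with sed. This is a minimal sketch, not part of ISEScan or FragGeneScan: the function name patch_fgs_for_mac and its directory argument are hypothetical. It uses only the line numbers given above, and it comments out a line only if that line actually starts with #include, so re-running it, or running it on a different FragGeneScan version, does not comment out the wrong line. sed -i.bak is accepted by both GNU sed and the BSD sed shipped with macOS, and keeps a backup of each file.

```shell
# Hypothetical helper: apply the Mac fixes described above to a FragGeneScan
# source directory ($1). Backups are written as util_lib.c.bak and hmm_lib.c.bak.
patch_fgs_for_mac() {
    fgs_dir="$1"
    # util_lib.c: comment out line 3, but only if it is an #include line.
    sed -i.bak '3{/^#include/s|^|//|;}' "$fgs_dir/util_lib.c" || return 1
    # hmm_lib.c: comment out the #include on line 6 (only if it is one)
    # and replace values.h with limits.h on line 4.
    sed -i.bak -e '6{/^#include/s|^|//|;}' \
               -e '4s|values\.h|limits.h|' "$fgs_dir/hmm_lib.c" || return 1
}
```

For example, patch_fgs_for_mac path/to/FragGeneScan (a placeholder path), then rebuild FragGeneScan as usual. Check the two files afterwards, since the line numbers come from the FragGeneScan version I tested.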

Automated install by Bioconda (recommended!)

The steps below will install the ISEScan package via Bioconda to /apps/inst/miniconda3/. You can install ISEScan to another location by changing the default Miniconda3 install path in the Install Miniconda3 step. Visit the Bioconda recipe for ISEScan for more details (thanks to both pbasting and tseemann for making it available!).

If the system reports "isescan.py: command not found...", add the ISEScan package to your PATH (replace /apps/inst/miniconda3 in the command below with your conda install path):

export PATH=/apps/inst/miniconda3/bin/:$PATH
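The export above only affects the current shell session, and running it repeatedly keeps adding the same directory to PATH. A minimal sketch of a guard that prepends the conda bin directory only when isescan.py is not already found; the function name ensure_isescan_on_path is hypothetical, and its argument defaults to the /apps/inst/miniconda3 path used in this README.

```shell
# Hypothetical helper: make isescan.py reachable by prepending
# <conda install path>/bin to PATH, but only if it is not found already.
ensure_isescan_on_path() {
    conda_dir="${1:-/apps/inst/miniconda3}"
    if command -v isescan.py >/dev/null 2>&1; then
        return 0                              # already on PATH, nothing to do
    fi
    case ":$PATH:" in
        *":$conda_dir/bin:"*) ;;              # bin dir already listed, do not duplicate
        *) PATH="$conda_dir/bin:$PATH"; export PATH ;;
    esac
    # Succeeds only if isescan.py is now found.
    command -v isescan.py >/dev/null 2>&1
}
```

Define and call it in the shell you run ISEScan from (or put it in a start-up file such as ~/.bashrc), because a script run as a child process cannot change its parent shell's PATH. For example: ensure_isescan_on_path /apps/inst/miniconda3 || echo "isescan.py still not found".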

Then, try ISEScan again:

isescan.py --seqfile NC_012624.fna --output results --nthread 2

Manual install (install from source code)

Upgrade ISEScan to the latest version

Automated upgrade from Bioconda

The latest version usually becomes available on Bioconda a few hours or days after it is released on https://github.com/xiezhq/ISEScan. If your existing ISEScan was installed via Bioconda, run the command below to upgrade it.

conda update isescan

Manual upgrade from existing ISEScan

A manual upgrade gets you the latest version from https://github.com/xiezhq/ISEScan immediately. Upgrading an existing ISEScan to the latest version is simple: copy all .py files from the latest version into the ISEScan install directory.
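The copy step described above can be scripted so that the old .py files are kept in case the new version misbehaves. This is a minimal sketch under stated assumptions: the function name upgrade_isescan_py and the py_backup_<timestamp> directory are hypothetical, and it copies only the top-level .py files of the downloaded directory, which is all the manual upgrade above asks for.

```shell
# Hypothetical helper: copy every top-level .py file from a freshly downloaded
# ISEScan directory ($1) into the existing install directory ($2), after
# backing up the old .py files to $2/py_backup_<timestamp>/.
upgrade_isescan_py() {
    new_dir="$1"
    install_dir="$2"
    [ -d "$new_dir" ] && [ -d "$install_dir" ] || return 1
    backup="$install_dir/py_backup_$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup" || return 1
    # Back up the currently installed .py files.
    for f in "$install_dir"/*.py; do
        [ -e "$f" ] || continue               # glob matched nothing
        cp -p "$f" "$backup/" || return 1
    done
    # Overwrite them with the .py files from the new version.
    for f in "$new_dir"/*.py; do
        [ -e "$f" ] || continue               # glob matched nothing
        cp -p "$f" "$install_dir/" || return 1
    done
}
```

For example: upgrade_isescan_py ~/Downloads/ISEScan /apps/inst/ISEScan, where both paths are placeholders for your download and install directories.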

Usage example

Let's try an example, NC_012624.fna.

Tips to run ISEScan efficiently:

How to run a set of genomes in a row

Sometimes we want to run hundreds of genomes with a single command line and then wait for all the jobs to complete. Before doing so, we assume:
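A minimal sketch of such a batch run, assuming each genome is a separate .fna file in one directory and that each genome gets its own output directory (a hypothetical layout chosen to keep results apart). The function name run_isescan_batch and the DRY_RUN variable are hypothetical; only the --seqfile, --output and --nthread options shown in this README are passed to isescan.py.

```shell
# Hypothetical batch driver: run isescan.py on every *.fna file in a
# directory, one genome after another. Results for genome X.fna go to
# <out_dir>/X. Set DRY_RUN=1 to print the commands instead of running them.
run_isescan_batch() {
    genome_dir="$1"
    out_dir="${2:-results}"
    nthread="${3:-2}"
    for fna in "$genome_dir"/*.fna; do
        [ -e "$fna" ] || continue             # no .fna files matched
        name=$(basename "$fna" .fna)
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo isescan.py --seqfile "$fna" --output "$out_dir/$name" --nthread "$nthread"
        else
            # Keep going if one genome fails, but report it on stderr.
            isescan.py --seqfile "$fna" --output "$out_dir/$name" --nthread "$nthread" \
                || echo "ISEScan failed on $fna" >&2
        fi
    done
}
```

For example, DRY_RUN=1 run_isescan_batch genomes results 4 lists the commands first; run_isescan_batch genomes results 4 > batch.log 2>&1 & then runs them in the background while you wait for all genomes to finish.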

Re-run ISEScan without gene/protein prediction and HMMER searching

Release History