xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

Unexpected Insertion Sequence #29

Closed vappiah closed 3 years ago

vappiah commented 3 years ago

Hi, I am trying to find insertion sequences in my Mycobacterium ulcerans isolates. In order to test the ISEScan tool, I began by feeding the reference genome Agy99.gb to the tool. I got IS256, IS3, IS30, and ISAS1. I was expecting IS2404 and IS2606, but they were absent. Am I doing something wrong, or is my organism not suitable for your tool? Please advise.

Below is the command I used:

isescan.py --nthread 36 Agy99.fasta proteome hmm

xiezhq commented 3 years ago

ISEScan outputs all identified IS element copies (both complete and partial copies by default) and tries to assign a family name and a cluster name to each identified copy. It does not assign a specific IS element name. If you need the specific IS element name for an identified copy, you can use the DNA sequence of that copy to search an external IS element database, e.g. ISfinder, which will give you the specific IS element name and family name for that copy.

I think you already have the correct IS2404 and IS2606 copies in your results, as IS2404 belongs to family ISAS1 and IS2606 belongs to family IS256. In the ISEScan output for your Agy99 genome, the identified IS2404 and IS2606 copies were assigned to families ISAS1 and IS256, respectively. The family name and cluster name in the ISEScan output should not be given much weight: what ISEScan aims to do is identify all complete and partial IS elements (DNA sequence, TIR, transposase, etc.) in a genome sequence, and it does little to assign a family name or IS element name to each identified copy. Once you have the sequence of an IS copy (and other features such as the TIR sequence, the peptide sequence of the transposase, and the sequence of the transposase ORF), you can search it against any database you like and take the family/group/IS name from the results that database returns.
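To make the reasoning above concrete, here is a minimal sketch that checks whether the family of each expected element appears among the families ISEScan reported. The family list and the element-to-family assignments are taken from this thread; the function name family_hits is only an illustrative assumption and is not part of ISEScan. A family match shows that a candidate copy may be present, but it does not by itself confirm the specific element.

```python
# Families ISEScan reported for Agy99, as listed in the question above.
reported_families = {"IS256", "IS3", "IS30", "ISAS1"}

# Element-to-family assignments as stated in this reply.
expected_elements = {"IS2404": "ISAS1", "IS2606": "IS256"}


def family_hits(expected, reported):
    """Map each expected element to True if its family was reported.

    Hypothetical helper for illustration; not an ISEScan function.
    """
    return {name: family in reported for name, family in expected.items()}


hits = family_hits(expected_elements, reported_families)
for name, found in hits.items():
    status = "family reported" if found else "family not reported"
    print(f"{name} ({expected_elements[name]}): {status}")
```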

In your case, if you want more details about the family and name of an identified IS copy, copy its fasta sequence from Agy99.fasta.is.fna and paste it (the sequence only, without the header line starting with '>') into the input form at https://isfinder.biotoul.fr/blast.php, then submit the query by clicking the 'BLAST' button on that page. The result page will show the specific IS element name/family/group for the copy you searched.
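If you would rather pull the sequences out with a short script than copy them by hand, the following is a minimal sketch using only the Python standard library. The function read_fasta and the toy records are illustrative assumptions and are not part of ISEScan; the sketch only assumes that Agy99.fasta.is.fna is an ordinary FASTA file.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Toy records with made-up headers and sequences, for illustration only.
toy_fna = io.StringIO(">copy_1 toy\nACGTAC\nGTTA\n>copy_2 toy\nGGCC\n")
records = list(read_fasta(toy_fna))

for header, sequence in records:
    print(f"# {header} ({len(sequence)} bp)")
    print(sequence)  # header-free sequence, ready to paste into a search form

# For real data, pass an open file instead, e.g.:
# with open("Agy99.fasta.is.fna") as handle:
#     records = list(read_fasta(handle))
```

Each printed sequence line can then be pasted into the search form one copy at a time.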

As far as I know, there is no common IS element naming system in the community, and the family name and element name for the same IS element (the same DNA segment) may change over time, although https://isfinder.biotoul.fr is popular. That means you might get different names for the same IS copy (a DNA segment in the genome sequence) identified by ISEScan when you search its sequence in different databases that use different naming systems.

Hope it is clear.

Xie

vappiah commented 3 years ago

Thanks Xie.

It's clear now. I will implement your suggestion and update you.