xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

Missing IS families? #44

Closed fgaudilliere closed 2 years ago

fgaudilliere commented 2 years ago

Hi there,

I used ISEScan on a very large dataset, and I was wondering about the few IS families listed in ISFinder for which I do not detect any IS copy: ISH6, ISLre2, and Tn3. None of the sequences in the pHMM library seem to refer to any of these families, so I wanted to check with you whether you just classified IS belonging to these families differently.

(Sorry if any of these questions just stem from a misunderstanding on my part of how ISEScan works.)

Best, Flora

xiezhq commented 2 years ago

Hi Flora,

Question 1. Yes, you are right. ISH6 and ISLre2 copies might be classified as IS256. When I built the pHMMs for ISEScan and assigned a family name to each transposase cluster, the ISH6 and ISLre2 families may not yet have been available for public searching in ISFinder. I assigned an IS family name to each transposase cluster (IS cluster) by searching the amino acid sequence of the cluster's representative member against the public ISFinder database. For details on how the transposases were clustered and the pHMMs were built, please refer to the ISEScan publication.

Question 2. Yes, ISEScan was not designed to detect complicated composite mobile genetic elements such as Tn3.

Zhiqun Xie

fgaudilliere commented 2 years ago

Thank you for your quick answer!

I did a quick check to see if ISH6, ISLre2 and Tn3 family members were actually found in my dataset, and if so into which family they were sorted. I took sequences listed as ISH6, ISLre2 or Tn3 family members in the ISFinder database and blasted them against the sequences retrieved by ISEScan in my search. Turns out:

1) Some ISH6 family members match sequences sorted as 'new' in the ISEScan output.
2) Some ISLre2 family members match sequences sorted as 'IS3' in the ISEScan output.
3) Some Tn3 family members match sequences sorted as 'IS91' or 'new' in the ISEScan output.

(NB: for the ISLre2 and Tn3 searches, I only retrieved about 20 of the sequences listed on ISFinder, so the results are not comprehensive; I only meant to get a quick overview.)
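For anyone who wants to repeat this kind of check, here is a minimal sketch of the tallying step. It assumes the blastn output is in the 12-column tabular format (`-outfmt 6`), which may not match the attached files, and that a lookup from ISEScan sequence ID to assigned family is already available. The function name, the sequence IDs and the example rows are all hypothetical toy data, not values from the real outputs.

```python
import csv
import io


def best_hit_families(blast_lines, family_of_subject):
    """Map each query to the ISEScan family of its highest-bitscore hit.

    blast_lines: iterable of tab-separated lines (assumed BLAST -outfmt 6).
    family_of_subject: dict from ISEScan sequence ID to its assigned family
    (hypothetical lookup, built separately from the ISEScan output).
    """
    best = {}  # query ID -> (bitscore, subject ID)
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {
        query: family_of_subject.get(subject, "unknown")
        for query, (_, subject) in best.items()
    }


# Toy rows (made up for illustration only).
example = (
    "ISH6_a\tIS_001\t92.1\t1400\t110\t2\t1\t1400\t5\t1404\t0.0\t1900\n"
    "ISH6_a\tIS_002\t85.0\t600\t90\t0\t1\t600\t1\t600\t1e-150\t700\n"
    "ISLre2_a\tIS_002\t88.3\t1200\t140\t1\t1\t1200\t3\t1202\t0.0\t1500\n"
    "Tn3_a\tIS_003\t90.0\t900\t90\t0\t1\t900\t1\t900\t0.0\t1200\n"
)
families = {"IS_001": "new", "IS_002": "IS3", "IS_003": "IS91"}
result = best_hit_families(io.StringIO(example), families)
print(result)
```

Keeping only the best hit per query is a simplification; a query that matches several ISEScan sequences in different families would need all hits above some cutoff to be reported.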

I attached my blastn outputs in case you want to take a look.

Best, Flora

ISH6_seqs_blastn_output.txt ISLre2_seqs_blastn_output.txt Tn3_seqs_blastn_output.txt

xiezhq commented 2 years ago

Your BLAST result is not surprising. The family assignment of a detected IS copy in ISEScan is based on the sequence similarity between the ISfinder family and the ISEScan IS copy. The family classification of ISfinder IS elements, however, is not determined solely by sequence similarity among members of the same family; it is mainly determined by the biological and biochemical mechanism of activity of the transposase and of the whole IS element.
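To illustrate the point, here is a toy sketch of purely similarity-based labelling. It is not ISEScan's actual code: the function name, the e-value cutoff and the example hits are assumptions made up for illustration. Each copy simply receives the family of its most similar reference, or 'new' when nothing is similar enough.

```python
def assign_family(hits, max_evalue=1e-10):
    """Return the family of the best-scoring hit, or 'new' if none passes.

    hits: list of (family_name, evalue) pairs for one IS copy.
    max_evalue: hypothetical cutoff, not a documented ISEScan parameter.
    """
    passing = [hit for hit in hits if hit[1] <= max_evalue]
    if not passing:
        return "new"
    # The lowest e-value is treated as the most similar reference.
    return min(passing, key=lambda hit: hit[1])[0]


# A copy whose closest reference sits in IS91 is labelled IS91,
# whatever family it would be placed in on mechanistic grounds.
print(assign_family([("IS91", 1e-30), ("IS3", 1e-12)]))
# A copy with no sufficiently similar reference is labelled 'new'.
print(assign_family([("IS3", 0.5)]))
```

Under a scheme like this, a sequence can only receive a label that exists in the reference set, so members of a family missing from the references end up either under the nearest family or under 'new'.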