Status: Closed (closed by SamuelGreenrod 2 years ago)
Hi Samuel,
Thanks for your interest in ISEScan.
Yes, the subgroups are determined by clustering the transposases. CD-HIT was used to cluster the amino acid sequences of the transposases, because homology searches on amino acid sequences are more sensitive and reliable than homology searches on nucleotide sequences.
Zhiqun Xie
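For readers following the thread, here is a minimal sketch of the general idea behind greedy clustering of protein sequences, in the spirit of CD-HIT. It is not ISEScan's code and not CD-HIT's implementation: the `identity` function uses Python's `difflib` as a rough stand-in for CD-HIT's alignment-based identity, the 0.9 threshold is an arbitrary assumption, and the sequences are made-up toy fragments.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; an assumed stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(proteins: dict, threshold: float = 0.9) -> dict:
    """Greedy incremental clustering: map each representative to its members."""
    clusters = {}
    # Process the longest sequences first so they become representatives;
    # ties are broken by name to keep the result deterministic.
    for name in sorted(proteins, key=lambda n: (-len(proteins[n]), n)):
        seq = proteins[name]
        for rep in clusters:
            # Join the first existing cluster whose representative is similar enough.
            if identity(seq, proteins[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical toy transposase fragments (not real ISEScan or ISfinder data).
toy = {
    "tpase_A": "MKTLRQAVEELGISRSTLYRWLKEG",
    "tpase_B": "MKTLRQAVEELGISRSTLYRWLKEA",  # one substitution relative to tpase_A
    "tpase_C": "MSDNPFHWVCCGTQIPPAADMNEY",   # unrelated sequence
}
groups = greedy_cluster(toy, threshold=0.9)
print(groups)
```

In this toy run, `tpase_A` and `tpase_B` end up in one cluster and `tpase_C` forms its own. Clustering on the protein sequence with an identity threshold is a different criterion from sharing a top database hit at a loose E-value cutoff, so two elements can share a top hit and still fall into different clusters.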
Great, thank you!
Hi there,
How are the IS subgroups determined? I've run ISEScan on some genomes, BLASTed representative ISs from multiple subgroups against ISfinder, and found that they share the same top hit (E-value < 10^-4). This suggests that different subgroups may actually contain the same sequence, so I'm confused about how they have been separated.
Are the subgroups decided based on the transposase? The source code mentions that they are determined by CD-HIT clustering, but it doesn't say whether this uses the nucleotide sequence of the whole IS element or just the transposase amino acid sequence. Thanks.