xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

ncopy4is? #28

Closed ewilbanks closed 2 years ago

ewilbanks commented 3 years ago

Hi folks,

I can't quite make sense of the ncopy4is parameter in the output. What is this metric derived from? For example, I have only one IS element, and it has ncopy4is = 64. If ISEScan predicts multiple copies, where are these located in the genome? If that's not what this represents, can you help me better understand what it means?

Thanks, Lizzy

xiezhq commented 3 years ago

Hi Lizzy,

You raised a good question!

The ncopy4is parameter in the output is mainly for internal use by the ISEScan algorithm. In short, it is the number of copies of the predicted Tpase ORF in the input genome sequence. These copies are not necessarily identical, and they are sometimes much longer or shorter than the query Tpase ORF. The value usually does not equal the number of copies of the specific IS element (cluster) in the output file, because some remnants (partial IS elements) are filtered out by ISEScan. To understand what it means, you might find it helpful to read section 2.4 'Characterizing multiple copies of IS elements' and section 2.6 'Post-processing to eliminate potential false predictions in new IS elements and to define partial IS elements' in the ISEScan publication, https://academic.oup.com/bioinformatics/article/33/21/3340/3930124.
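
To see the difference in your own results, you could compare ncopy4is with the number of IS elements actually reported for each cluster. The following is only a rough sketch: it assumes a comma-separated output table with columns named `cluster` and `ncopy4is`, the rows in `SAMPLE` are made up for illustration, and `summarize` is a hypothetical helper, not part of ISEScan.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only. The column names are an assumption
# about the output table, so adjust them to match your own file.
SAMPLE = """seqID,cluster,isBegin,isEnd,ncopy4is
contig_1,IS3_100,1200,2450,64
contig_1,IS5_42,9000,10100,3
contig_2,IS5_42,300,1400,3
"""


def summarize(handle):
    """Return {cluster: (reported_elements, ncopy4is)} for a CSV handle."""
    reported = defaultdict(int)  # rows (reported IS elements) per cluster
    ncopy = {}                   # largest ncopy4is value seen per cluster
    for row in csv.DictReader(handle):
        cluster = row["cluster"]
        reported[cluster] += 1
        ncopy[cluster] = max(ncopy.get(cluster, 0), int(row["ncopy4is"]))
    return {c: (reported[c], ncopy[c]) for c in reported}


summary = summarize(io.StringIO(SAMPLE))
for cluster, (n_reported, n_copy) in summary.items():
    print(f"{cluster}: {n_reported} reported element(s), ncopy4is = {n_copy}")
```

A cluster with one reported element and a large ncopy4is, as in your case, would suggest that most of the Tpase ORF copies did not end up as reported IS elements. To run this on a real file, pass `open("your_output.csv", newline="")` instead of the `io.StringIO` sample.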

Hope this makes it clear.

Xie