simroux / Inovirus

Set of scripts and data used to detect putative inovirus sequences and/or taxonomically classify them.

What are the principles for manual curation when treating tandem insertions separately? #8

Open TangShan99 opened 1 year ago

TangShan99 commented 1 year ago

Hi, I have a question. I have run into the tandem insertion problem that was discussed in previous issues. In my results, many pI genes sit next to each other, and the different predicted sequences start and end at different sites, so I cannot tell which boundary is the true one. Could you give more detail on the principles for manually curating tandem insertions so that each one is treated separately?

simroux commented 1 year ago

Hi,

I don't know of any efficient approach that would resolve tandem insertions. One thing you can try is to extract the whole region (i.e. all of the tandem insertions) and generate a dot plot (e.g. via a self-BLAST on NCBI). This may help you identify repeat regions that signal the boundaries of individual insertions. But that assumes these repeats are still present and intact, while my feeling is that these tandem insertions often include some (partially) decayed prophages.
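As a rough local complement to a web-based dot plot, here is a minimal Python sketch that lists exact direct repeats within an extracted region by comparing the sequence against itself with shared k-mers. It is not part of the Inovirus scripts: the function name `find_direct_repeats` and the parameters `k` and `min_gap` are hypothetical, and because it only reports exact matches it will miss the partially decayed repeats that a BLAST-based dot plot can still pick up.

```python
from collections import defaultdict


def find_direct_repeats(seq, k=12, min_gap=1000):
    """Return (start1, start2, length) tuples for exact direct repeats.

    Hypothetical helper for illustration only. A repeat is reported when
    two copies of at least k bases start at least min_gap bases apart.
    Coordinates are 0-based.
    """
    seq = seq.upper()

    # Index every k-mer by the positions where it starts.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # A "seed" is a pair of distant positions sharing the same k-mer,
    # i.e. one dot on the self dot plot.
    seeds = set()
    for starts in positions.values():
        for idx, a in enumerate(starts):
            for b in starts[idx + 1:]:
                if b - a >= min_gap:
                    seeds.add((a, b))

    # Merge consecutive seeds along a diagonal into one repeat.
    repeats = []
    for a, b in sorted(seeds):
        if (a - 1, b - 1) in seeds:
            continue  # already covered by a repeat that starts earlier
        length = k
        while (a + length - k + 1, b + length - k + 1) in seeds:
            length += 1
        repeats.append((a, b, length))
    return repeats


if __name__ == "__main__":
    # Toy region: a 16-bp repeat flanking 20 bp of filler.
    unit = "ACGTTGCAAGCTTAGC"
    region = unit + "A" * 20 + unit
    for start1, start2, length in find_direct_repeats(region, k=8, min_gap=30):
        print(start1, start2, length)
```

Pairs of repeats that bracket a single pI gene are candidate boundaries for one insertion. The pairwise seed step grows quickly on low-complexity sequence, so this is only practical for a single extracted region, not a whole genome.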

The other option is to use these regions for what they are, i.e. "hotspots" of inovirus insertions from which one cannot robustly or easily identify individual genome units. You can still count the number of distinct pI genes to get an approximate number of inoviruses in the region, for instance, but the gene content of each individual inovirus genome is much harder to establish.
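The counting idea can be sketched in a few lines of Python, assuming you already have the coordinates of each predicted pI gene as `(contig, start, end)` tuples. The function name `group_pI_hits`, the input layout, and the `max_gap` threshold are all assumptions made for illustration, not something the Inovirus scripts produce in this form. The threshold in particular should be chosen from your own data.

```python
def group_pI_hits(hits, max_gap=20000):
    """Group pI gene coordinates into hotspot regions and count them.

    Hypothetical helper for illustration only.
    hits: iterable of (contig, start, end) tuples, one per predicted pI gene.
    max_gap: assumed maximum distance (bp) between neighbouring pI genes
             for them to be placed in the same hotspot.
    """
    regions = []
    for contig, start, end in sorted(hits):
        last = regions[-1] if regions else None
        if last and last["contig"] == contig and start - last["end"] <= max_gap:
            # Close enough to the previous pI gene: extend the hotspot.
            last["end"] = max(last["end"], end)
            last["n_pI"] += 1
        else:
            # Start a new region (new contig or too far from the last one).
            regions.append(
                {"contig": contig, "start": start, "end": end, "n_pI": 1}
            )
    return regions


if __name__ == "__main__":
    # Made-up coordinates, purely to show the call.
    example_hits = [
        ("contig_1", 1000, 2200),
        ("contig_1", 9000, 10100),
        ("contig_1", 60000, 61000),
        ("contig_2", 500, 1500),
    ]
    for region in group_pI_hits(example_hits):
        print(region)
```

Regions with `n_pI` greater than 1 are the tandem hotspots, and `n_pI` is the approximate inovirus count for that region.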

Hope that helps, and sorry not to have a real solution there! Best, Simon