xiezhq / ISEScan

A Python pipeline to identify IS (insertion sequence) elements in genomes and metagenomes
Apache License 2.0

IS identified by ISFinder but not ISEScan #35

Closed: moiradion closed this issue 3 years ago

moiradion commented 3 years ago

Hi,

I'm trying to identify IS elements in Sanger sequences (~1500 bp, not annotated). I uploaded one of my sequences to ISFinder and it identified an IS66 sequence (e-value 4e-106). However, ISEScan does not find any IS in that same sequence. Is it because ISEScan does not perform well on shorter sequences? Is there a way to tweak the parameters to make it more sensitive? I want to find IS elements in more than 2000 Sanger sequences, so ISFinder is not really an option.

Thanks for your help!
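Since ISEScan is also meant for metagenome assemblies, it reads multi-FASTA input, so 2000 reads do not have to be submitted as 2000 separate runs. A minimal sketch, assuming each Sanger read sits in its own FASTA file in one directory (the directory name `sanger_reads`, the `*.fasta` pattern and both function names are hypothetical), that merges them into one multi-FASTA file with traceable record IDs:

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a (multi-)FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].strip(), []
            elif header is not None:
                chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def merge_reads(read_dir, out_path, pattern="*.fasta"):
    """Concatenate every record of read_dir/pattern into one multi-FASTA file.

    Each record is renamed '<file stem>_<record number>' so that a hit in the
    merged file can be traced back to its read; the old header is kept as the
    description. Returns the number of records written.
    """
    out_path = Path(out_path)
    # Never read the output file back in if it happens to match the pattern.
    paths = [p for p in sorted(Path(read_dir).glob(pattern))
             if p.resolve() != out_path.resolve()]
    count = 0
    with open(out_path, "w") as out:
        for path in paths:
            for n, (header, seq) in enumerate(read_fasta(path), start=1):
                out.write(f">{path.stem}_{n} {header}".rstrip() + "\n")
                out.write(seq + "\n")
                count += 1
    return count
```

For example, `merge_reads("sanger_reads", "all_reads.fa")` produces one file to pass to ISEScan as its input sequence file (check the ISEScan README for the exact command line of your installed version). Merging only saves runs; it should not change what ISEScan finds in each individual read.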

xiezhq commented 3 years ago

Did ISEScan predict the potential IS66 Tpase in your sequence? If ISEScan failed to predict any Tpase in your sequence, then no IS element will be found by ISEScan. If ISEScan did predict a Tpase in your sequence, I can try running ISEScan on your sequence to check what happened in your case.

ISEScan identifies potential IS elements in two steps: 1) predict Tpase genes in the input sequence; 2) extend each Tpase gene to a partial or full-length IS element and filter out potential false-positive predictions. A future version of ISEScan will be able to identify IS elements based on genes provided by the user, but this new version is still under development.
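The two-step logic can be illustrated with a toy model. This is not ISEScan's code: the `Orf` record, the `is_tpase` flag (standing in for a match against ISEScan's Tpase models) and the fixed `flank` extension are hypothetical simplifications, whereas ISEScan's real step 2 extends and filters candidates with its own rules. What the sketch does show is the consequence stated above: when step 1 yields no Tpase, step 2 has nothing to extend.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """A predicted gene on the input sequence (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    is_tpase: bool  # hypothetical step-1 outcome: matched a Tpase model or not


def candidate_is_regions(orfs, seq_len, flank=500):
    """Toy two-step scan: keep only Tpase ORFs (step 1), then extend them (step 2).

    Each Tpase ORF is extended by `flank` bp on both sides; if the extension
    runs past either end of the sequence, the candidate is clipped and marked
    partial because a full-length element cannot fit there.
    """
    regions = []
    for orf in orfs:
        if not orf.is_tpase:
            continue  # no Tpase, nothing to extend
        left, right = orf.start - flank, orf.end + flank
        partial = left < 1 or right > seq_len
        regions.append((orf.name, max(left, 1), min(right, seq_len), partial))
    return regions
```

In the case discussed in this thread, neither predicted ORF matched a Tpase model, so every ORF would be skipped and the function would return an empty list, consistent with ISEScan reporting no IS.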

moiradion commented 3 years ago

Hi,

I'm not quite sure how to answer your question. How can I determine if ISEScan predicted the potential IS66 Tpase in my sequence? In what output file would that information be?

In case it helps, here is my sequence:

NNNNNNNNNNNNNNNNNNNGGNNNNTCTAGTGANATCCATCATCGCATCCNGTGCGCCCGGNTTATCCCCGCTGGCGCGGGGAACTCTTACTGCTTGGTATGCGGAATCACACCCTGAACGGTTTATCCCCGCTGGCGCGGGGAACACTGAAGCATCAAACATTTGGTGGACCAAACGGACGGTTTATCCCCGCTGGCGCGGGGAACACTGTACGCGGCNAGTTTTAGCGACAGGTCATCCCGGTTTATCCCCGCTGGCGCGGGGAACACGGATCTGCCAGCGCCTCTGCGGGGCGGTAAACCGGTTTATCCCCGCTGGCGCGGGGAACACTACGCCAGCCACCTGCTTCGCCAGCCGTTCGGCGGTTTANCCCCGCTGGCGCGGGGAANACGTTATCNCCNATCGCGGCGAGATAAATAGATGCGGNTTANCCCCGCTGGCGCGGGGAACACCAAACTGGATATCACTGAATGCAGCCGCGTGGCGGTTTANNCNCGCTGGCGCGGGGAACACTTCAAGTCNGCGCGCATCAAAATCATTTAATTCGGTTTATCCCCGCTGGCGCGGGGAACACCCTCCCTNATTTTTTCATCNTNNANNNTCATGNNGTTTATNANNGCTGGCGCGGGGAACACCCTCCCTNATTTTTTCATCGTNNNAGNTCATGCGGTTTATNNTAAGCGTCGTCTGAGTACCGTCTGGNCCCGAATCCANGATCCTGTAACTTAANCACCNNNNNATTTTGCAGGTGGACACCTCATGCGCGCTAAAGAAAGACTTCCCCGGAAACACTATTCNACCTGAANTTCAAAATGGAACTGGTCAGGCTGNCTCTTGAAGAANAANGCAGTATTGNCGCNNTGGCCCGGAAACATGACGNCAATGATAACCTGCTCTTTAATGGATANNNTCTGGCAGNNTGANGGNGGGTCTGTCNGCCCCGAAAAAACTCATCGTCNCTTCCTGCCCNGATACNCGTGCNGCTTCAGGCGGGCNNNN

There are many Ns, but they did not stop ISFinder from finding an IS66 around positions 679-996.

Thanks for your help!
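Ambiguous bases become unknown (X) residues once a read is translated, so it is worth measuring how many Ns fall inside the region ISFinder matched. A small sketch in plain Python; the function names and the `min_len` threshold are illustrative, and coordinates are 1-based and inclusive to match the 679-996 interval reported by ISFinder:

```python
def n_fraction(seq, start, end):
    """Fraction of ambiguous 'N' bases in seq[start..end] (1-based, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region outside sequence")
    region = seq[start - 1:end].upper()
    return region.count("N") / len(region)


def n_runs(seq, min_len=3):
    """Return (start, end) 1-based coordinates of runs of at least min_len Ns."""
    runs, run_start = [], None
    for i, base in enumerate(seq.upper(), start=1):
        if base == "N":
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_len:
                runs.append((run_start, i - 1))
            run_start = None
    # A run may extend to the very end of the sequence.
    if run_start is not None and len(seq) - run_start + 1 >= min_len:
        runs.append((run_start, len(seq)))
    return runs
```

With the read above stored in a string `seq`, `n_fraction(seq, 679, 996)` gives the share of Ns in the matched region and `n_runs(seq)` lists the longer stretches of Ns.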


From: Zhiqun Xie · Sent: 10 March 2021 19:59 · To: xiezhq/ISEScan · Cc: Moïra Dion; Author · Subject: Re: [xiezhq/ISEScan] IS identified by ISFinder but not ISEScan (#35)



xiezhq commented 3 years ago

You can check the predicted protein sequences in proteome/*..fa.faa. ISEScan predicted two genes/ORFs:

seq1_52730+
VRPXYPRWRGELLLLGMRNHTLNGLSPLARGTLSIKHLVDQTDGLSPLARGTLYAXSFSDRSSRFIPAGAGNTDLPAPLRGGKPVYPRWRGEHYASHLLRQPFGGLXPLGGEXVIXXRGEINRCXLXPLARGTPNWISLNAAGGGLXXLARGTLQVXAHQNHLIRFIPAGAGNTLPXFFIXXXHXVYXXWRGEPLPXFFIXXXHAVYXKRRLSTVWXRIXDPVT
seq1_7931000-
XPPEXAXVXGQEXTMSFFGXDRPXXXLPXXIIKEQVIIXVMFPGXXXNTXXFFKXQPDQFHFEXQXE

However, neither of these two predicted proteins matched the Tpase models in ISEScan. The problem may lie with FragGeneScan (the gene predictor used by ISEScan), which did not predict any Tpase peptide sequence in your sequence.
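A minimal sketch for inspecting that proteome file: it parses FASTA text and reports each predicted protein's length and fraction of X (unknown) residues. The function names are illustrative, and reading the `.faa` file is left to the caller because its exact name depends on the input file name. A high X fraction, as in the two proteins above, means the translation ran through many ambiguous bases; that plausibly explains why no Tpase model matched, but it is an interpretation, not something ISEScan reports.

```python
def parse_fasta(text):
    """Return [(id, sequence), ...] from FASTA-formatted text."""
    records, seq_id, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            fields = line[1:].split()
            seq_id, chunks = (fields[0] if fields else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def protein_summary(faa_text):
    """Summarise predicted proteins as (id, length, fraction of 'X' residues)."""
    summary = []
    for seq_id, seq in parse_fasta(faa_text):
        seq = seq.rstrip("*")  # drop a trailing stop symbol if present
        length = len(seq)
        x_frac = seq.upper().count("X") / length if length else 0.0
        summary.append((seq_id, length, round(x_frac, 3)))
    return summary
```

For example, `protein_summary(open(path).read())`, where `path` points to the proteome `.faa` file.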

xiezhq commented 3 years ago

I also searched ISFinder with your sequence: it matched the left part of IS66 (left IR + ORF1). In IS66, ORF1 is an accessory gene, not the Tpase gene.
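Because the hit covers only the left IR and the accessory ORF1, the Tpase-coding part of the element may simply not be present in this read. To look at the matched region directly, one can translate it in all six reading frames and check for long stretches without stop codons. A minimal, dependency-free sketch using the standard genetic code; codons containing N or other ambiguity codes are translated as X, mirroring the X residues in the FragGeneScan output above. The function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement; any non-ACGT symbol becomes N."""
    cleaned = "".join(b if b in "ACGT" else "N" for b in seq.upper())
    return cleaned.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    rc = revcomp(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_stop_free(protein):
    """Length of the longest stretch without a stop codon ('*')."""
    return max(len(part) for part in protein.split("*"))
```

For the ISFinder hit at positions 679-996 (1-based), the Python slice is `seq[678:996]`; printing `longest_stop_free(p)` for each frame of `six_frames(seq[678:996])` shows which frame, if any, carries an open stretch.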