xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

PacBio Assemblies stuck at "addNonORFcopy" #48

Open olimat17 opened 1 year ago

olimat17 commented 1 year ago

Hi! I am running ISEScan on a large set of genome assemblies (primarily assemblies from Illumina data, plus a couple of PacBio genome assemblies). The tool runs great on the test data and my Illumina assemblies, but the PacBio assemblies seem to get stuck at the "addNonORFcopy" step. (On my server the other genomes took <30 seconds to finish successfully after the HMM step, but on the PacBio assemblies I kill the command after 30 minutes because there is no forward progress.) For a little more context: we anticipate >1000 transposases and predict that our observed IS elements are likely to be overlapping or nested. Thank you for any help you can provide! -O

xiezhq commented 1 year ago

Hi,

Sorry for the late reply.

ISEScan might count two overlapping or nested IS elements as one large IS element; it depends on the boundaries of the predicted ORF.

ISEScan works on any sequence file in FASTA format, with one or many sequences per file. Are there any special sequences in your PacBio assemblies?

Zhiqun Xie

olimat17 commented 1 year ago

Thank you for your reply. Update: I attempted to run the sequences through the tool over the weekend, and I killed the command after 7 hours stuck at the same "addNonORFcopy" step. The sequences are the same size as the E. coli genome in the paper on the tool, so I am not quite sure why it is getting stuck. The sequences were assembled using Flye, and we were able to use other tools (e.g., CheckM, Prokka) on the genomes with normal running times. What do you mean by special sequences? Thank you for your help.
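Rather than watching a run and killing it by hand, a wall-clock limit can be put on each run so that a batch over many assemblies keeps moving. The following is a minimal sketch using only the Python standard library. `run_with_timeout` is a hypothetical helper, and the ISEScan command line in the usage comment follows the invocation shown in the project README (`--seqfile`, `--output`, `--nthread`); the file names are placeholders and the flags should be checked against the installed version.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments); return its exit code, or None on timeout."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child process before raising.
        return None
    return completed.returncode


# Hypothetical usage (placeholder file names, 30-minute limit):
# status = run_with_timeout(
#     ["isescan.py", "--seqfile", "assembly.fasta",
#      "--output", "results", "--nthread", "2"],
#     timeout_s=1800,
# )
```

Note that only the direct child process is killed on timeout; any helper processes it started may need to be cleaned up separately.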

xiezhq commented 1 year ago

Could you share the sequence file that gets stuck at 'addNonORFcopy'? I need to reproduce the problem with your sequence file. Without reproducing it, it would be hard to figure out what the underlying issue is.

Xie

olimat17 commented 1 year ago

Hi Xie, Sorry for the delay in response here. I just sent an example sequence to your listed contact e-mail (xiezhq@hotmail.com). For me it doesn't ever give an error, it just gets stuck for hours and never finishes. I am not quite sure why. Thank you for your help! -Olivia

xiezhq commented 1 year ago

Hi Olivia,

I received the FASTA file and have narrowed the issue down to the first contig in the example sequence file. I will figure out the issue and provide you with a solution.
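One way to narrow a hang like this down to a single contig is to split the multi-FASTA file into one file per sequence and run the tool on each file separately. The following is a minimal sketch, assuming plain uncompressed FASTA input. The function name `split_fasta` and the `contig_<index>.fasta` naming scheme are hypothetical and not part of ISEScan.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of written paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                # A header line starts a new record: close the previous file.
                if handle is not None:
                    handle.close()
                # Hypothetical naming scheme: contig_<index>.fasta
                path = out_dir / f"contig_{len(written) + 1}.fasta"
                handle = open(path, "w")
                written.append(path)
            # Lines before the first header (if any) are skipped.
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return written
```

Each resulting file can then be run on its own to see which contig triggers the slow step.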

Best, Zhiqun Xie



xiezhq commented 11 months ago

The internal algorithm produced IS element candidates with a large population of copies for each candidate. This caused a huge computing cost when clustering the candidates and picking the representative for each cluster. The internal algorithm needs to be changed to solve this issue in the future.
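To illustrate why a large candidate population gets expensive, here is a rough sketch that counts pairwise comparisons under the assumption that clustering compares every candidate with every other one (all-vs-all). That assumption is for illustration only and is not a description of ISEScan's actual clustering code; `pairwise_comparisons` is a hypothetical helper.

```python
def pairwise_comparisons(n_candidates):
    """Number of unordered pairs among n candidates: n * (n - 1) / 2."""
    return n_candidates * (n_candidates - 1) // 2


# Assumed all-vs-all comparison; the real algorithm may differ.
for n in (10, 100, 1000, 10000):
    print(f"{n:>6} candidates -> {pairwise_comparisons(n):>10} comparisons")
```

Under that assumption, a tenfold increase in the number of candidates means roughly a hundredfold increase in comparisons.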

cifuj commented 2 months ago

Hi Xie, I have come across the same issue as Olivia. isescan.py has been stuck at the addNonORFcopy step for more than 1 hour. I have also identified more than 1000 transposases in the genome with other tools. Are there any updates that could fix this issue?

Best, Jero