Open simomounir opened 3 years ago
Hi! Yes, PrePH does run into a memory issue with very long sequences, and unfortunately it currently cannot be run on sequences of this length. One option is to cut the long sequence into a set of overlapping shorter sequences. You can do this manually, or use the function split_overlap(seq, size, overlap) in the MakeBedForVirusGenome.py script, which splits a sequence into overlapping ones.
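To illustrate the idea, here is a minimal sketch of overlapping-window splitting. It is not the code from MakeBedForVirusGenome.py; the function name split_into_overlapping_windows and the returned (offset, subsequence) pairs are assumptions made for this example only.

```python
def split_into_overlapping_windows(seq, size, overlap):
    """Split seq into windows of at most `size` characters.

    Consecutive windows share `overlap` characters. Returns a list of
    (offset, subsequence) pairs, where offset is the 0-based start of
    the window within seq. Hypothetical helper, not part of PrePH.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not seq:
        return []

    step = size - overlap  # distance between consecutive window starts
    windows = []
    start = 0
    while True:
        end = min(start + size, len(seq))
        windows.append((start, seq[start:end]))
        if end >= len(seq):  # the last window reaches the end of seq
            break
        start += step
    return windows


if __name__ == "__main__":
    # Toy sequence: windows of 4 nt that overlap by 2 nt.
    for offset, chunk in split_into_overlapping_windows("ACGTACGTAC", 4, 2):
        print(offset, chunk)
```

Keeping the offset of each window makes it possible to map a predicted structure back to the original genomic coordinates. The overlap should be at least as long as the longest stretch you expect a single complementary region to occupy, otherwise a region that straddles a window boundary can be missed.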
Hi,
I am trying to run PrePH on some sequences that I extract from genomic coordinates with bedtools getfasta. I am able to predict panhandles for most sequences, but when I have to deal with long sequences, a memory error pops up.
Here is more information about the two sequences I am trying to compare, for example.
Genomic coordinates:
===> chr2 144520120 144521111
===> chr2 144384081 144517977
As you might notice, the first sequence is shorter than 1,000 nucleotides, while the second is over 133K nucleotides. Is this usually an issue? Am I missing something about the software and how to run PrePH with long sequences as input?
Thanks in advance.
Cheers.