kalmSveta / PrePH

Predict PanHandles

Fold.py #1

Open simomounir opened 3 years ago

simomounir commented 3 years ago

Hi,

I am trying to run PrePH on some sequences that I extract from genomic coordinates with bedtools getfasta. I can predict panhandles for most sequences, but with long sequences I get a memory error.

Here is more information about the two sequences I am trying to compare for example:

genomic coordinates:

===> chr2 144520120 144521111
===> chr2 144384081 144517977

As you might notice, the first sequence is less than 1,000 nucleotides long, while the second is over 133K nucleotides. Is this usually an issue? Am I missing something about how to run PrePH with long sequences as input?

Thanks in advance.

Cheers.

kalmSveta commented 3 years ago

Hi! Yes, there is a memory issue with very long sequences. Unfortunately, PrePH currently can't be run on such long sequences. One option is to cut the long sequence into a set of overlapping shorter sequences. You can do it manually, or you can use the function split_overlap(seq, size, overlap) in the MakeBedForVirusGenome.py script, which splits a sequence into overlapping ones.
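
For the manual route, here is a minimal sketch of what splitting into overlapping windows could look like. This is not the code from MakeBedForVirusGenome.py. The function name split_with_overlap and its return format (a list of (start, chunk) pairs) are assumptions made for illustration; only the three parameters seq, size and overlap mirror the signature mentioned above.

```python
def split_with_overlap(seq, size, overlap):
    """Split seq into windows of length `size`; neighbours share `overlap` characters.

    Hypothetical helper for illustration, not PrePH's own split_overlap.
    Returns a list of (start, chunk) pairs, where start is the 0-based
    offset of the chunk within seq.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not seq:
        return []

    step = size - overlap  # distance between consecutive window starts
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + size]))
        # Stop once the current window reaches the end of the sequence.
        if start + size >= len(seq):
            break
        start += step
    return chunks


# Small usage example on a toy sequence.
for start, chunk in split_with_overlap("ACGTACGTAC", size=4, overlap=2):
    print(start, chunk)
```

Keeping the start offset with each chunk makes it possible to translate positions predicted within a chunk back to coordinates in the original sequence. The last chunk may be shorter than size when the sequence length does not line up with the step.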