mcfrith / tandem-genotypes


Sequence alignment takes too much time #14

Open LiShuhang-gif opened 3 years ago

LiShuhang-gif commented 3 years ago

Hi, we want to use tandem-genotypes to detect tandem repeats in genome-wide data. Following your instructions, I first need to align my reads with LAST before running tandem-genotypes. Unfortunately, LAST cannot finish aligning my reads (FASTQ files, 72G in size) within the server's maximum time limit (120 hours). So that I can run tandem-genotypes, I was wondering whether you can recommend an alternative aligner to LAST. Looking forward to your reply. Thanks a lot!

mcfrith commented 3 years ago

I don't have a recommended alternative, but LAST should work fine for you. Some of our recent run times with LAST: ~15 hours for 76Gb data and ~24 hours for 112Gb (both with -P16 option). So it should be fast enough... We always use the "with repeat-masking" recipe for this kind of data. Have you tried that?
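
As a rough check on these figures, the sketch below converts the two reported run times into throughput and extrapolates to a 72 Gb data set. It assumes run time scales linearly with data size and that the hardware, thread count (-P16), and read characteristics are comparable; none of that is confirmed in this thread, and the function names are purely illustrative.

```python
# Run times quoted in the comment above: (data size in Gb, hours), both with -P16.
reported_runs = [(76, 15), (112, 24)]


def throughput_gb_per_hour(size_gb, hours):
    """Average amount of data aligned per hour for one reported run."""
    return size_gb / hours


def estimate_hours(size_gb, runs):
    """Return (optimistic, pessimistic) run-time estimates in hours.

    Assumption: run time scales linearly with data size, so the fastest
    and slowest reported throughputs bound the estimate.
    """
    rates = [throughput_gb_per_hour(s, h) for s, h in runs]
    return size_gb / max(rates), size_gb / min(rates)


low, high = estimate_hours(72, reported_runs)
print(f"Estimated LAST run time for 72 Gb: {low:.1f} to {high:.1f} hours")
```

Under those assumptions the estimate is roughly 14 to 15 hours, far below the 120-hour limit, which suggests that a run exceeding the limit differs in some other respect (thread count, hardware, or the recipe used) rather than in data size alone.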

LiShuhang-gif commented 3 years ago

Yes, I have tried the "with repeat-masking" recipe and other parameters to make LAST run faster. But for some reason, LAST is still too slow to finish within 120 hours for 72Gb data. If I use Minimap2 for alignment and then convert the resulting PAF file to MAF format, will that affect the results of tandem-genotypes? Thanks a lot!

mcfrith commented 3 years ago

Sorry for slow responses. The answer is: yes, it will surely affect the results (if you can get it to work), but I don't know how drastically.