Closed pjm43 closed 2 years ago
Hi Jeff, I think the easiest solution might be to just mask these repeats. If analysis of the repeats is also required, then maybe try something like -H -f 100 -q 2k --rmq=no --secondary=no. But I am guessing here too, so it would probably be better to also ask this question at the minimap2 repo for better advice.
I am also wondering how SyRI would handle this highly complex genome. It would be great if you could let me know whether it actually finishes and how long it takes (assuming that the alignment finishes).
Hi Manish, Thanks for the quick response! I was worried that if I masked it might somehow be problematic for the downstream SyRI analyses.
Could I ask for a little additional help with the flags you suggested: -H -f 100 -q 2k --rmq=no --secondary=no. Are these flags for minimap2?
I was worried that if I masked it might somehow be problematic for the downstream SyRI analyses.
SyRI would not identify structural rearrangements (SRs) in the masked regions and would instead report these masked regions as indels or not-aligned regions, which would need to be filtered out. Other than that, it should be OK.
Are these flags for minimap2?
Yes. Here is the documentation: https://lh3.github.io/minimap2/minimap2.html
When you say regions would need to be filtered out - how would I do that?
When you mask the genomes, you get a list of the regions that were masked. After running SyRI, you would need to remove the sequence variations (everything other than synteny, inversions, translocations, and duplications) that overlap these masked regions.
Unfortunately, I do not have any code on how to do that exactly, so cannot help with that.
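For anyone landing here later, the filtering step described above could be sketched roughly as follows. This is not SyRI code and has not been run on real output. It assumes the masked regions are available as BED-style lines (chromosome, 0-based start, end), and that the SyRI output is tab-separated with the reference chromosome, start, and end in columns 1 to 3 (1-based, inclusive) and the annotation type in column 11. The function names and the list of annotation prefixes treated as structural are illustrative assumptions, so please check them against your own files.

```python
import bisect
from collections import defaultdict

# Assumption: annotation types starting with these prefixes are structural
# (synteny, inversions, translocations, duplications) and are always kept.
STRUCTURAL_PREFIXES = ("SYN", "INV", "TRANS", "DUP")


def load_masked(bed_lines):
    """Parse BED-style lines into sorted, merged intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # touches or overlaps the previous interval
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def overlaps(masked, chrom, start, end):
    """True if the half-open range [start, end) overlaps a masked interval."""
    intervals = masked.get(chrom, [])
    # Number of masked intervals that start before `end`.
    i = bisect.bisect_left(intervals, (end, -1))
    return i > 0 and intervals[i - 1][1] > start


def filter_syri(rows, masked):
    """Drop sequence-variation rows whose reference span overlaps a masked region."""
    kept = []
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, ann = fields[0], fields[1], fields[2], fields[10]
        # Keep structural annotations and rows without reference coordinates.
        if ann.startswith(STRUCTURAL_PREFIXES) or start == "-" or end == "-":
            kept.append(line)
            continue
        lo, hi = sorted((int(start), int(end)))
        # Convert the 1-based inclusive span to 0-based half-open before testing.
        if not overlaps(masked, chrom, lo - 1, hi):
            kept.append(line)
    return kept
```

To use it, read the masked-region file and the SyRI output into lists of lines, call `load_masked` on the first and `filter_syri` on the second, and write the returned lines to a new file. Note that this only checks reference coordinates. Filtering against masked regions in the query genome would need the same test on the query columns.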
Thanks for the help!
Hi, Just a quick question regarding a large (12 Gb), repetitive plant genome. I'm following your tutorial with minimap2 -ax asm5 for the alignment, but the alignment is taking quite a long time for just a single, albeit large (642 Mb), chromosome (still running after 3 days). I gave it 8 CPUs, each with 32 Gb of memory. It's only using a single CPU (even though I passed -t 8) and is currently using 85 Gb of the allocated memory. Any ideas how to speed up the alignment process? Thanks in advance for any advice!