LiShuhang-gif closed this issue 3 years ago
For whole human genome sequencing, we usually prepare the genome with repeat masking. That has worked fine in several published papers, so that's what I'd recommend.
For best possible accuracy/sensitivity, it's better to do it without repeat masking. But that uses much more time and memory.
For a smaller genome (e.g. bacterial) I'd do it without masking.
Regarding "For best possible accuracy/sensitivity, it's better to do it without repeat masking. But that uses much more time and memory.":
I have a query genome that is 20G, and its repeat annotation is still running. Can I do pairwise genome alignment using the unmasked genome? It seems workable, although it would take more time and memory.
Pairwise genome alignment is a bit different from aligning long reads (in the preceding comments).
The preceding comments are also a bit out of date. Now I might suggest -uRY4 instead of masking, see: https://www.biorxiv.org/content/10.1101/2022.05.30.494079v1
You can surely do unmasked pairwise genome alignment if you use an option such as -uRY to reduce the run time and memory use. If you don't use such an option, it might or might not be feasible: it depends on how big the other genome is, how closely related, and how repetitive.
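For illustration, here is a minimal sketch of how a database-building command with the -uRY4 option might be assembled. It assumes the genome is prepared with LAST's lastdb, which this thread does not state explicitly. The helper name lastdb_command, the database name mydb, the file name genome.fa, and the thread count of 8 are all hypothetical placeholders; only -uRY4 comes from the comment above. The sketch prints the command instead of running it, so nothing is executed.

```python
import shlex


def lastdb_command(db_name, fasta_path, seed_scheme="RY4", threads=8):
    """Build (but do not run) a lastdb command line as a list of arguments.

    All defaults are illustrative assumptions, not recommendations
    from the thread, except the RY4 seeding scheme mentioned above.
    """
    return [
        "lastdb",
        f"-P{threads}",      # number of parallel threads (assumed value)
        f"-u{seed_scheme}",  # seeding scheme, e.g. RY4 instead of masking
        db_name,             # output database name (placeholder)
        fasta_path,          # unmasked genome FASTA file (placeholder)
    ]


cmd = lastdb_command("mydb", "genome.fa")
print(shlex.join(cmd))  # prints: lastdb -P8 -uRY4 mydb genome.fa
```

Building the command as a list keeps each argument separate, so it could later be passed to subprocess.run without shell quoting problems, once the options have been checked against the installed lastdb version.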
Thanks, let me give it a try
Hi, I was trying to run tandem_genotypes to detect tandem repeats in my ONT data, but I have some questions about preparing the genome. I see there are two options in this step: prepare the genome with or without repeat-masking. If I care more about effectiveness and accuracy than running time, should I prepare the genome without repeat-masking? Or which option do you recommend? Thanks a lot.