Jesson-mark opened this issue 3 years ago
We usually use tandem-genotypes with repeat-masking, and it usually works fine. Repeat-masking means that it excludes repeats when finding potential matches between the reads and the genome. After that, it finalizes the alignments between the reads and the genome: at this stage the masking is not applied, so the alignments should extend into the repeats just fine.
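To make the two stages concrete, here is a toy sketch. It is not LAST's actual algorithm (LAST uses adaptive seeds and gapped extension), and the names find_seeds and extend are hypothetical. Lowercase letters stand for soft-masked repeat bases: masked k-mers are never indexed as seeds, but the extension compares case-insensitively, so it runs straight through the masked repeat.

```python
def find_seeds(read, genome, k):
    """Return (read_pos, genome_pos) pairs of exact k-mer matches.

    Genome k-mers containing any lowercase (soft-masked) base are never
    indexed, so a masked repeat cannot start a match on its own.
    """
    index = {}
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if kmer.isupper():  # skip k-mers that touch a masked base
            index.setdefault(kmer, []).append(i)
    seeds = []
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k].upper(), []):
            seeds.append((j, i))
    return seeds


def extend(read, genome, j, i):
    """Extend one seed without gaps, ignoring case, so the extension runs
    through masked bases. Returns the covered genome interval [start, end)."""
    r, g = read.upper(), genome.upper()
    left = 0
    while j - left > 0 and i - left > 0 and r[j - left - 1] == g[i - left - 1]:
        left += 1
    right = 0
    while j + right < len(r) and i + right < len(g) and r[j + right] == g[i + right]:
        right += 1
    return i - left, i + right


# Unmasked flanks around a soft-masked CAG repeat.
genome = "TTGCA" + "cagcagcagcag" + "GGATC"
read = genome.upper()
print(find_seeds(read, genome, 4))  # → [(0, 0), (1, 1), (17, 17), (18, 18)]
print(extend(read, genome, 0, 0))   # → (0, 22)
```

All seeds come from the unmasked flanks, yet extending a flank seed covers the whole 22-base region, including the masked repeat.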
It's hard to say what's happening in your case: it may have nothing to do with repeat masking. Try visualizing the alignments around your TR of interest. (A typical problem is a TR that is longer than the reads: we can't handle that.)
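A quick way to check the "TR longer than the reads" case is to compare each repeat's length with the longest read. This is a rough sketch: the function names are hypothetical, the FASTQ reader assumes plain 4-line records, and the flank parameter (extra unique bases a read should also cover on each side) is an illustrative assumption, not a tandem-genotypes setting.

```python
def fastq_read_lengths(lines):
    """Lengths of the sequence line (2nd of every 4) in a plain 4-line FASTQ."""
    return [len(line.strip()) for n, line in enumerate(lines) if n % 4 == 1]


def unspannable_trs(trs, read_lengths, flank=0):
    """Return TRs that even the longest read cannot span end to end.

    trs: iterable of (chrom, start, end), 0-based half-open coordinates.
    flank: hypothetical number of bases a read should also cover on each
    side of the repeat to anchor it (0 means just the repeat itself).
    """
    longest = max(read_lengths)
    return [tr for tr in trs if (tr[2] - tr[1]) + 2 * flank > longest]


trs = [("chr1", 100, 200), ("chr4", 3000, 40000)]
print(unspannable_trs(trs, [15000, 18000]))  # → [('chr4', 3000, 40000)]
```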
I'm also not sure what you mean by "called": tandem-genotypes takes a tandem-repeat annotation file as input, and it can only analyze TRs that are in that file.
Thanks for your prompt reply. I will try your suggestions.
What I mean by "called" is that tandem-genotypes can find (or analyze) a TR in the tandem-repeat annotation file. I used simpleRepeat.txt as the annotation file, and there are 1031708 TRs in it. The result file (tg.txt) from tandem-genotypes has 688415 TRs, which means nearly 1/3 of the TRs are not analyzed. Is that because those TRs are longer than the reads?
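To see which TRs are missing (and then look at their lengths or unit sizes), one can diff the annotation against the output by coordinates. This is a sketch under stated assumptions: simpleRepeat.txt follows the UCSC layout with a leading bin column (so chrom, start, end are columns 1 to 3, 0-based), and tg.txt begins with chrom, start, end copied from the annotation. Check both against your actual files. The function names are hypothetical.

```python
import sys


def repeat_keys(lines, chrom_col):
    """Collect (chrom, start, end) keys from whitespace-separated lines.

    chrom_col is the 0-based column holding the chromosome name; start and
    end are assumed to be the next two columns. Blank and '#' lines are skipped.
    """
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        keys.add((fields[chrom_col], int(fields[chrom_col + 1]),
                  int(fields[chrom_col + 2])))
    return keys


def missing_from_output(annotation_lines, result_lines):
    """TRs present in the annotation but absent from the tandem-genotypes output."""
    # ASSUMPTION: simpleRepeat.txt has a leading 'bin' column (chrom in column 1),
    # and tg.txt starts with chrom, start, end as given in the annotation.
    annotated = repeat_keys(annotation_lines, chrom_col=1)
    reported = repeat_keys(result_lines, chrom_col=0)
    return annotated - reported


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python missing_trs.py simpleRepeat.txt tg.txt
    with open(sys.argv[1]) as ann, open(sys.argv[2]) as res:
        missing = missing_from_output(ann, res)
    print(f"{len(missing)} annotated TRs missing from the output")
    for chrom, start, end in sorted(missing)[:20]:
        print(chrom, start, end, end - start, sep="\t")
```

Annotation entries that share identical coordinates are counted once here, so the total may differ slightly from a raw line count. With the numbers above, 1031708 - 688415 = 343293 TRs are unaccounted for.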
Not sure, but here are a couple of relevant tandem-genotypes options:
-u BP, --min-unit=BP: ignore repeats with unit shorter than BP (default=2)
-vv: show output for all repeats, including ones not covered by any DNA read
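One way to gauge how much the --min-unit default might matter is to count annotated repeats by unit length. This sketch assumes the UCSC simpleRepeat.txt layout, where the 0-based column 5 is period and column 7 is consensusSize; which field tandem-genotypes actually treats as the unit is an assumption, so the column is a parameter. The function names are hypothetical.

```python
from collections import Counter


def unit_length_histogram(lines, unit_col=5):
    """Count annotated repeats by unit length.

    ASSUMPTION: UCSC simpleRepeat.txt with a leading 'bin' column, so
    'period' is column 5 (0-based); pass unit_col=7 for 'consensusSize'.
    """
    hist = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hist[int(line.split()[unit_col])] += 1
    return hist


def count_below_min_unit(lines, min_unit=2, unit_col=5):
    """How many repeats have a unit shorter than min_unit, i.e. roughly how
    many a --min-unit=min_unit filter could be expected to skip."""
    hist = unit_length_histogram(lines, unit_col)
    return sum(n for unit, n in hist.items() if unit < min_unit)


rows = [
    "585\tchr1\t100\t130\ttrf\t1\t30.0\t1\n",
    "585\tchr1\t200\t220\ttrf\t1\t20.0\t1\n",
    "585\tchr1\t300\t340\ttrf\t2\t20.0\t2\n",
    "585\tchr1\t400\t460\ttrf\t6\t10.0\t6\n",
]
print(count_below_min_unit(rows))  # → 2
```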
Thanks for your thoughtful suggestions. I'll give them a try.
Hi, I'm using tandem-genotypes to find tandem repeats (TRs) in our human PacBio HiFi reads. I found a real TR that is not successfully called by tandem-genotypes. I used a repeat-masked genome to build the index with lastdb. The parameters of lastdb, last-train and lastal are the same as this recipe suggests. I wonder if the repeat-masked genome harms the alignment, so that many reads get a high mismap score. Could you give any suggestions on our problem? Also, what difference does repeat-masking make, compared with no repeat-masking, when calling tandem repeats?
Thanks.
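To check whether reads around a particular TR really get high mismap scores, one can pull the alignments overlapping that region out of the MAF output. This is a sketch assuming LAST-style MAF, where the "a" line may carry a mismap= field and the first "s" line of each block is the genome on the + strand. The function names and the 0.01 threshold are hypothetical illustrations, not recommended values.

```python
def region_mismaps(maf_lines, chrom, start, end):
    """Yield the mismap value of each alignment whose genome ('s') line
    overlaps the 0-based half-open interval [start, end) on chrom.

    ASSUMPTION: LAST-style MAF; the first 's' line after each 'a' line is
    the genome, on the + strand. Alignments without mismap= yield None.
    """
    mismap = None
    want_ref = False
    for line in maf_lines:
        if line.startswith("a "):
            mismap = None
            for item in line.split()[1:]:
                key, _, value = item.partition("=")
                if key == "mismap":
                    mismap = float(value)
            want_ref = True
        elif line.startswith("s ") and want_ref:
            want_ref = False  # only the first 's' line is the genome
            fields = line.split()
            name, beg, size = fields[1], int(fields[2]), int(fields[3])
            if name == chrom and beg < end and beg + size > start:
                yield mismap


def summarize_region(maf_lines, chrom, start, end, threshold=0.01):
    """Return (alignments overlapping the region, how many exceed threshold)."""
    values = list(region_mismaps(maf_lines, chrom, start, end))
    high = sum(1 for v in values if v is not None and v > threshold)
    return len(values), high
```

Looking at the mismap values of the alignments over the TR, rather than genome-wide, helps separate a local alignment problem from a general effect of masking.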