Open gspirito opened 10 months ago
Many thanks for your interest in tandem-genotypes. What you're doing seems correct: I don't know why it doesn't work. Maybe if you could share your intermediate files...
To know which reads support the expansions, you can use tandem-genotypes option -v.
Thank you very much for the answer. I attach the locus I used for the analysis, the result I got from tandem-genotypes, and the MAF file containing the reads mapping to that locus:
Thanks for this interesting example! In short, tandem-genotypes is "working as designed", but the design isn't looking good in this case.
It's faithfully following the "tandem-genotypes method" described here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6
This dotplot shows the alignment (red) of one read that supposedly has 3 additional copies of TGC:
To the left of the repeat (purple), there's an insertion and deletion almost adjacent to each other. tandem-genotypes is counting the insertion as a repeat expansion. It counts insertions that are slightly outside the repeat: we found it necessary to do that in general, because the precise boundaries of repeats can be fuzzy and ambiguous (for non-exact repeats).
You could use tandem-genotypes option -n20 (to only count insertions <= 20 bp outside the repeat, instead of 60).
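To illustrate what that distance limit changes, here is a minimal Python sketch. It is not tandem-genotypes code: the function names and the insertion position are hypothetical, and it only assumes that an insertion is counted when it lies within a given number of bp of the repeat.

```python
def distance_outside(position, repeat_start, repeat_end):
    """Distance in bp from a position to the repeat (0 if inside it)."""
    if position < repeat_start:
        return repeat_start - position
    if position > repeat_end:
        return position - repeat_end
    return 0


def is_counted(position, repeat_start, repeat_end, near=60):
    """Assumed rule: count an insertion if it is <= `near` bp from the repeat."""
    return distance_outside(position, repeat_start, repeat_end) <= near


# Repeat coordinates from this thread; the insertion position is made up.
REPEAT_START, REPEAT_END = 70487135, 70487173
insertion_pos = 70487100  # hypothetical: 35 bp to the left of the repeat

counted_default = is_counted(insertion_pos, REPEAT_START, REPEAT_END, near=60)
counted_n20 = is_counted(insertion_pos, REPEAT_START, REPEAT_END, near=20)
print(counted_default, counted_n20)
```

With these made-up numbers, the insertion is counted under the limit of 60 but not under 20.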
Maybe tandem-genotypes should be changed like this: when an insertion and deletion are so close to each other, merge them into one "in-del".
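A rough Python sketch of that idea, purely as an illustration: the `Gap` type, the `max_distance` threshold, and the example coordinates are all hypothetical and do not come from tandem-genotypes.

```python
from typing import List, NamedTuple


class Gap(NamedTuple):
    ref_start: int  # reference coordinate where the gap begins
    ref_len: int    # reference bases skipped (0 for a pure insertion)
    read_len: int   # read bases inserted (0 for a pure deletion)


def merge_nearby_gaps(gaps: List[Gap], max_distance: int = 5) -> List[Gap]:
    """Merge gaps separated by <= max_distance aligned bases into one in-del.

    Assumes the gaps come from one alignment and do not overlap.
    """
    merged: List[Gap] = []
    for gap in sorted(gaps):
        if merged:
            prev = merged[-1]
            between = gap.ref_start - (prev.ref_start + prev.ref_len)
            if between <= max_distance:
                # The aligned bases between the two gaps are absorbed into
                # both the reference span and the read span of the in-del.
                merged[-1] = Gap(
                    prev.ref_start,
                    prev.ref_len + between + gap.ref_len,
                    prev.read_len + between + gap.read_len,
                )
                continue
        merged.append(gap)
    return merged


def net_length_change(gap: Gap) -> int:
    """Read bases minus reference bases: > 0 means a net insertion."""
    return gap.read_len - gap.ref_len


# Hypothetical example: a 9-bp insertion followed 3 bp later by a 9-bp deletion.
gaps = [Gap(100, 0, 9), Gap(103, 9, 0)]
merged = merge_nearby_gaps(gaps)
print(merged, [net_length_change(g) for g in merged])
```

In this made-up case the insertion and deletion cancel out, so the merged in-del has a net length change of zero and would not look like an expansion.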
Hi, thank you for the response. Could you provide the command you used to make the plot you showed? Thank you very much.
Amazingly, it's still in my shell's history:
grep -B3 6f8e3f3a SAMPLE_MAF.txt | last-dotplot -a SHANK2_locus_rpmsk.txt -1 chr11:70487085-70487223 - myfig.png
Thank you very much! Sorry for the delay in my message
Hello, here's my issue:
I ran tandem-genotypes on Oxford Nanopore long reads at a RepeatMasker locus and obtained this result:
chr11 70487135 70487173 TGC SHANK2 coding 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,2,2,3 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,3
Therefore there should be 13 reads with additional copies of the sequence 'TGC' compared to the reference genome. However, if I extract all reads mapping to the locus 'chr11:70487135-70487173' from the MAF file and convert them to BAM (with LAST), I cannot see an insertion in any of those reads in IGV.
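For reference, the count of 13 can be reproduced with a short Python sketch. It assumes the columns are whitespace-separated, that the last two columns hold per-read copy-number changes for forward- and reverse-strand reads, and that an empty column might appear as "."; the helper name is hypothetical.

```python
# Output line copied from the post above.
line = (
    "chr11 70487135 70487173 TGC SHANK2 coding "
    "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,2,2,3 "
    "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,3"
)


def count_changed_reads(column):
    """Count reads whose copy-number change differs from zero."""
    if column == ".":  # assumed placeholder for "no reads"
        return 0
    return sum(1 for value in column.split(",") if int(value) != 0)


fields = line.split()
forward_changed = count_changed_reads(fields[6])  # assumed: forward-strand reads
reverse_changed = count_changed_reads(fields[7])  # assumed: reverse-strand reads
total_changed = forward_changed + reverse_changed
print(forward_changed, reverse_changed, total_changed)  # prints "6 7 13"
```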
How can I visualize the STR expansions? Is there a way to know which specific reads support the expansions?
Thanks in advance,
Giovanni