Greetings! The violin plots are indeed mostly for visualization; the values in tlens_by_chr.tsv are what should be used for downstream interpretation.
In practice we found that the 90th-percentile TL at each arm correlated best with external validation (e.g. single telomere arm results from STELA, or average TL from short reads using TelomereHunter), but you can choose a different way of summarizing the TL at each arm using the -t input option in merge_jobs.py. I generally avoid using the max because of cases like chr19q in your results here, where a single outlier read has a much longer TL than the other reads at that arm. While it's always possible that the read came from a cell that really had telomeres of that length, I've found that such reads are more often attributable to sequencing artifacts (particularly when the reported TL is nearly as long as the read itself).
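To illustrate why the summary statistic matters, here is a minimal sketch comparing the mean, 90th percentile, and max for one arm. The per-read TL values are hypothetical (20 typical reads plus one long outlier, loosely mimicking the chr19q situation), and the linear-interpolation percentile below is just one common convention; it is not necessarily how Telogator or merge_jobs.py computes it.

```python
import math
from statistics import mean


def percentile(values, p):
    """Percentile with linear interpolation between ranks (numpy's default convention)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    rank = (p / 100) * (len(xs) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def summarize_arm(tls):
    """Summarize per-read telomere lengths (bp) at one arm in a few different ways."""
    return {"n": len(tls), "mean": mean(tls), "p90": percentile(tls, 90), "max": max(tls)}


# Hypothetical reads: 20 TLs from 4000 to 6850 bp, plus one 19500 bp outlier.
reads = [4000 + 150 * i for i in range(20)] + [19500]
print(summarize_arm(reads))
```

With these made-up numbers the max is set entirely by the outlier (19500 bp), the mean is pulled up from about 5425 bp to about 6095 bp by that single read, while the 90th percentile (6700 bp) still falls among the typical reads.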
As for arms that are not reported, this is very tricky! Unfortunately for all of us, subtelomeres can vary quite a bit across individuals and populations, and the subtelomeres from the T2T reference that Telogator uses to anchor reads might not be representative of the subtelomeres of the sample being processed. As a result, reads from the telomeres of the missing arms are often anchored to arms other than the ones they originated from. E.g. you can see that chr1p has quite a large number of reads, whereas arms like 4q and 6p have none. It's possible that the reads from 4q and 6p are mapping to 1p instead (or to another arm with an unexpectedly large number of reads). This is difficult to untangle from these aggregated statistics, but we've been digging into this limitation in ongoing work. The main thing to keep in mind for now is that chromosome arms with a large number of reads might contain telomeres from multiple arms.
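As a rough first pass, one could flag arms with no reads alongside arms whose read counts are well above typical, since the latter are candidates for having absorbed reads from elsewhere. This is a minimal sketch under assumptions: the read counts are hypothetical, and the "more than 2x the median of nonzero counts" threshold is an arbitrary heuristic, not something Telogator does. It only highlights suspicious arms; it cannot tell you which reads actually belong to which arm.

```python
from statistics import median


def flag_arms(read_counts, fold=2.0):
    """Return (missing, overloaded) arm names.

    missing:    arms with zero anchored reads
    overloaded: arms with more than `fold` x the median count of the nonzero arms
    """
    nonzero = [c for c in read_counts.values() if c > 0]
    typical = median(nonzero) if nonzero else 0
    missing = sorted(arm for arm, c in read_counts.items() if c == 0)
    overloaded = sorted(arm for arm, c in read_counts.items() if typical and c > fold * typical)
    return missing, overloaded


# Hypothetical per-arm read counts (not taken from the attached results).
counts = {"chr1p": 48, "chr1q": 12, "chr2p": 10, "chr4q": 0, "chr6p": 0, "chr7q": 11, "chr19q": 9}
missing, overloaded = flag_arms(counts)
print("no reads:", missing)                # arms whose telomeres may be anchored elsewhere
print("unusually many reads:", overloaded)  # arms that may contain telomeres from several arms
```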
Thank you so much! This is really well explained!
Hello,
I am currently getting this violin plot for the telomere length estimates of one patient (~44x coverage from PacBio HiFi reads).
tlens_by_chr.tsv
What would be the best way to interpret the violin plot? Would the mean value give us some useful information?
aln.bam and t2t-and-subtel.fa and see their alignment on the genome browser? Sorry for all the questions.
Thank you again!