santiago-es / Telometer

A simple regular expression based method for measuring telomere length from long-read sequencing
MIT License

Telometer with PacBio WGS Revio data #5

Closed indapa closed 2 months ago

indapa commented 2 months ago

Hi @santiago-es - thank you for making this software available. I enjoyed the paper and would like to use your Telometer software. I don't have ONT reads, but based on the README, would I just align my PacBio reads with minimap2 to the T2T reference with the subtelomere sequences included?

minimap2 -ax map-ont \
  -t [Max # of Threads for Your Machine] \
  /path/to/reference/t2t-and-subtel.fa \
  /path/to/fastq_dir/*.fastq \
  -o output.sam

santiago-es commented 2 months ago

Yes, that's correct, and you would convert to BAM and sort + index just like for ONT WGS. However, you should be aware that PacBio Revio WGS data has a significantly shorter read-length ceiling than ONT, and at least in my hands PacBio yields shorter telomere measurements than ONT for the same sample. I would do a TRF Southern blot (significantly cheaper) or a telomere capture ONT experiment side by side to see whether the discrepancy in mean telomere length is larger than expected (1000-2500 bp).
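The convert, sort, and index step mentioned above could look roughly like the following. This is a minimal sketch, not a command taken from the Telometer README: the function name `sam_to_sorted_bam`, the output naming scheme, and the default thread count are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a minimap2 SAM file into a sorted, indexed BAM.
# The function name, output file name, and default thread count are
# illustrative assumptions, not part of Telometer.
sam_to_sorted_bam() {
    local sam="$1"                       # e.g. output.sam from minimap2
    local threads="${2:-4}"              # assumed default; adjust per machine
    local bam="${sam%.sam}.sorted.bam"   # output.sam -> output.sorted.bam

    # samtools sort accepts SAM input and picks BAM output from the
    # ".bam" extension given to -o.
    samtools sort -@ "$threads" -o "$bam" "$sam" || return 1

    # Write the BAM index next to the sorted file.
    samtools index "$bam" || return 1

    # Report the path of the sorted BAM on stdout.
    echo "$bam"
}

# Usage (uncomment to run on a real file):
# sam_to_sorted_bam output.sam 8
```

The sorted BAM path printed at the end is the file that would then be handed to the downstream tool.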

indapa commented 2 months ago

Thank you for your response. Those are good suggestions. I don't have my own ONT data, but I'll try to find HG002 ONT WGS to compare with the HG002 PacBio WGS. (I pulled the HG002 WGS from the PacBio website.)

Regarding supp. Fig4:

[Image: screenshot of supp. Fig4]

Just to confirm: with PacBio, the measured telomere length varies with read length, but that is not the case for ONT. Still, the violin plots for certain read lengths look similar between ONT and PB. Am I misinterpreting something? It's hard to see the lines for the mean and quartiles in the plots.

santiago-es commented 2 months ago

The takeaways from that figure are:

  1. PacBio is more bottlenecked by read length than ONT. Importantly, this compares PacBio telomere capture vs ONT, so it specifically enriches for telomeres with PacBio after digesting genomic DNA.
  2. Because of this, read length and telomere length in PacBio are tightly correlated. That may be fine for the HEK293T cells you see in this figure, which have relatively short telomeres (mean usually 4-5 kb, with some variability between lab strains). But in many cases, particularly in healthy human cells, telomeres can be significantly longer, and my fear is that PB will miss many of them, leading to underestimation of the telomere length distribution.

For PB WGS, you won't be enriching for telomeres or digesting away genomic DNA, so if you have a 15 kb HiFi read, and most of that read is subtelomere, you may be missing a lot of telomere.
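To make the arithmetic behind that concern concrete, here is a toy model, offered only as a sketch: it assumes a read anchored in the subtelomere can show at most (read length minus subtelomere length) bases of telomere. The function name `observed_telomere` and all numbers are hypothetical and are not taken from Telometer or from the paper.

```shell
#!/usr/bin/env bash
# Toy model (an illustrative assumption, not Telometer's method):
# a read that starts in the subtelomere can only contain as much
# telomere as is left after the subtelomeric portion.
observed_telomere() {
    local read_len="$1"     # total read length in bp
    local subtel_len="$2"   # bp of the read covering subtelomere
    local true_tel="$3"     # true telomere length in bp (hypothetical)

    # Room left in the read for telomeric sequence, floored at zero.
    local room=$(( read_len - subtel_len ))
    if [ "$room" -lt 0 ]; then
        room=0
    fi

    # The read reports the smaller of the true length and the room left.
    if [ "$true_tel" -lt "$room" ]; then
        echo "$true_tel"
    else
        echo "$room"
    fi
}

# A 15 kb read with 12 kb of subtelomere leaves 3 kb for telomere.
observed_telomere 15000 12000 2500   # prints 2500 (telomere fits in the read)
observed_telomere 15000 12000 9000   # prints 3000 (telomere is truncated)
```

In the second call the hypothetical 9 kb telomere is reported as 3 kb, which is the kind of underestimation described above.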

Does that make sense?

indapa commented 2 months ago

Got it. Since the PB reads are shorter and WGS is not enriched for telomere regions, whatever reads do align may not span the full telomeric region. Thank you for your timely responses, I really appreciate it.