johnomics / tapestry

Validate and edit small eukaryotic genome assemblies
MIT License

haplotype calling #5

Closed phiweger closed 4 years ago

phiweger commented 4 years ago

Very nice visualization tool!

Just out of curiosity/ignorance: how do you determine the haplotype of the sample?

Thanks!

johnomics commented 4 years ago

Thank you! What do you mean by "haplotype of the sample"? Tapestry estimates ploidy for each window in each contig, and contigs made up mostly of ploidy-1 windows may well be haplotypes. To estimate ploidy, Tapestry calculates the median read depth across the whole genome and assumes that this represents diploid coverage. It then derives the expected depths for haploid, triploid and tetraploid coverage from that value; haploid coverage, for example, is half of diploid coverage. Each window is then assigned the ploidy whose expected depth is closest to the window's own read depth. Is that what you're looking for? Full methods are in the supplement of the preprint.
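A rough illustration of the depth-to-ploidy assignment described above, as a minimal Python sketch. This is not Tapestry's actual code: the function name `assign_window_ploidy` is hypothetical, the median is taken over per-window depths as a stand-in for the genome-wide median, and the expected depth for ploidy p is assumed to be p times the haploid depth (so triploid is 1.5x and tetraploid 2x the diploid depth).

```python
from statistics import median


def assign_window_ploidy(window_depths, ploidies=(1, 2, 3, 4)):
    """Assign each window the ploidy whose expected depth is nearest its depth.

    Hypothetical helper for illustration only, not part of Tapestry.
    """
    # Assumption: the median depth over all windows represents diploid coverage.
    diploid_depth = median(window_depths)
    haploid_depth = diploid_depth / 2

    # Assumption: expected depth scales linearly with ploidy.
    expected = {p: haploid_depth * p for p in ploidies}

    # For each window, pick the ploidy with the closest expected depth.
    # Ties resolve to the lower ploidy because min() keeps the first match.
    return [
        min(ploidies, key=lambda p: abs(depth - expected[p]))
        for depth in window_depths
    ]


if __name__ == "__main__":
    depths = [50, 52, 48, 25, 100, 75, 51]  # made-up per-window read depths
    print(assign_window_ploidy(depths))
```

With these made-up depths the median is 51, so a window at depth 25 lands nearest the haploid expectation and a window at depth 100 nearest the tetraploid one.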

phiweger commented 4 years ago

Thank you for the explanation! So to assign a ploidy to a contig, I would average over the "ploidies" of the windows, right? I'm new to fungi, so is this a standard method?

johnomics commented 4 years ago

Yes, it's a judgement call based on the window ploidies. Tapestry is intentionally designed as a manual annotation tool, so it does not make a call about the ploidy of the whole contig, because ploidy can vary across a contig for interesting reasons (repeat content, translocations, structural variation, etc.). The contig_details.tsv file reports the percentage of each contig assigned to each ploidy, which might help in classifying the contigs. Using read depth to assess copy number is common; for example, the mosdepth documentation shows how to identify sex from read depth on the sex chromosomes.
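To make the per-contig summary concrete, here is a small Python sketch of how a percentage-per-ploidy breakdown could be computed from window calls. It is only an illustration: `ploidy_percentages` is a hypothetical name, the sketch assumes equal-sized windows (so window counts stand in for contig length), and it makes no claim about how Tapestry computes the values in contig_details.tsv or what that file's columns are called.

```python
from collections import Counter


def ploidy_percentages(window_ploidies):
    """Return the percentage of windows assigned to each ploidy for one contig.

    Hypothetical helper for illustration only; assumes equal-sized windows.
    """
    counts = Counter(window_ploidies)
    total = len(window_ploidies)
    # An empty contig yields an empty dict, so no division by zero occurs.
    return {p: 100 * n / total for p, n in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up window calls for a single contig.
    print(ploidy_percentages([1, 1, 1, 2]))
```

A contig whose windows are overwhelmingly ploidy 1 would be a candidate haplotype, but as noted above the final call is left to manual judgement rather than a fixed threshold.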

phiweger commented 4 years ago

Ah, that makes sense. Thanks a lot for this explanation!