yeeus / GCI

A program for assessing the T2T genome continuity
MIT License

Impact of Combining Haplotype-Resolved Genomes on GCI Evaluation Scores #12

Open gwtfight opened 1 month ago

gwtfight commented 1 month ago

I assembled two haplotype-resolved T2T genomes. Following your suggested parameters, I combined the two haplotypes into a single FASTA file and evaluated it with GCI, and the score was significantly lower. When I evaluated the two haplotypes separately, the scores were much better. I used only HiFi data for the assembly. Do you know what might be causing this?

yeeus commented 1 month ago

That's a good question. I tested this on my own haplotype-resolved assemblies and saw the same thing. The reason is that mapping reads to a diploid genome lowers mapping quality (not just the MAPQ value) in homologous regions and some other problematic regions. Keep in mind that GCI always keeps only the highest-quality read alignments, so in the diploid case it discards many alignments and reports much lower scores. Therefore, as the documentation says, I recommend mapping reads to each haplotype separately and evaluating each one with GCI.
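
The effect described above can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_mapq`, `count_kept`, the alignment scores, and the `min_mapq` threshold are all hypothetical, and they stand in for neither the aligner's real MAPQ formula nor GCI's actual filtering, which (as noted above) looks at more than the MAPQ value. The idea is simply that a read with two nearly identical placements, one on each haplotype, gets an ambiguous mapping in the combined reference and may be discarded.

```python
# Toy model only: NOT the aligner's real MAPQ formula and NOT GCI's filter.

def toy_mapq(best_score, second_best_score, cap=60):
    """Hypothetical MAPQ that grows with the gap between the best and
    second-best alignment scores, capped at `cap`."""
    if best_score <= 0:
        return 0
    gap = best_score - second_best_score
    return min(cap, cap * gap * 10 // best_score)


def count_kept(reads, haplotypes, min_mapq=30):
    """Count reads whose best alignment passes a hypothetical MAPQ cutoff
    when only the listed haplotypes are present in the reference."""
    kept = 0
    for scores in reads:
        candidates = sorted((scores[h] for h in haplotypes), reverse=True)
        best = candidates[0]
        # With a single haplotype there is no competing placement here.
        second = candidates[1] if len(candidates) > 1 else 0
        if toy_mapq(best, second) >= min_mapq:
            kept += 1
    return kept


# Made-up alignment scores of four reads against each haplotype.
reads = [
    {"hap1": 10000, "hap2": 9990},  # homozygous region: near-identical hits
    {"hap1": 10000, "hap2": 9995},  # homozygous region: near-identical hits
    {"hap1": 10000, "hap2": 9000},  # heterozygous region, read from hap1
    {"hap1": 9100, "hap2": 10000},  # heterozygous region, read from hap2
]

print("hap1 only:", count_kept(reads, ["hap1"]))
print("hap2 only:", count_kept(reads, ["hap2"]))
print("combined :", count_kept(reads, ["hap1", "hap2"]))
```

In this toy setup every read is kept when each haplotype is used as its own reference, while the reads from homozygous regions are dropped in the combined reference because their two placements are almost indistinguishable.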

gwtfight commented 1 month ago

Thank you very much for your response and advice. I wish you a pleasant life.