marbl / merqury

k-mer based assembly evaluation

First peak identified as read-only #72

gubrins closed this issue 2 years ago

gubrins commented 2 years ago

Hey, First of all, thanks for developing this nice tool. I am just starting out in genome assembly, but the information provided was super useful and it was relatively easy to run. I am assembling a de novo genome with hifiasm using HiFi data, aiming for 20x coverage. After doing the assembly and checking it with merqury, I realized that the first peak (the one around 10x) is not identified by merqury, whereas the second peak is identified (at 20x, as expected) but only as 1 k-mer copy. Do you know why merqury is identifying the first peak as read-only? I attach an image of it:

[Image: output latastei hap1 spectra-cn fl]

Regards, Gabriel

arangrhie commented 2 years ago

Hello Gabriel,

The k-mers in the first peak of the read-only portion are those that are present in the genome but missing from your assembly. I could imagine your assembler ignored one haplotype and only assembled the other, resulting in a (possibly pseudo-)haploid assembly. Recent HiFi assemblers output either a diploid assembly or an assembly with both haplotypes, so I'd suggest double-checking how the assembly was run.

Best, Arang
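
(Illustration, not part of the original thread.) The explanation above can be sketched with a toy model in Python. This is not Merqury's implementation: the sequences, the k-mer size, the function names, and the idealized coverage (exactly 10x per haplotype, 20x in total, no sequencing errors, no canonical k-mers) are all assumptions made for the example. It groups read k-mers by their copy number in the assembly, with copy number 0 meaning "read-only", first against one haplotype and then against both.

```python
K = 5  # toy k-mer size (assumption); real runs typically use a much larger k

def kmer_list(seq, k=K):
    """Return every k-mer in seq, in order, duplicates included."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def ideal_read_counts(haplotypes, depth_per_hap, k=K):
    """Idealized read k-mer multiplicities: each distinct k-mer of a
    haplotype is seen exactly depth_per_hap times in the reads."""
    counts = {}
    for hap in haplotypes:
        for km in set(kmer_list(hap, k)):
            counts[km] = counts.get(km, 0) + depth_per_hap
    return counts

def spectra_cn(read_counts, assembly_seqs, k=K):
    """Histogram of read multiplicities, grouped by the k-mer's copy
    number in the assembly. Copy number 0 corresponds to read-only."""
    asm_counts = {}
    for seq in assembly_seqs:
        for km in kmer_list(seq, k):
            asm_counts[km] = asm_counts.get(km, 0) + 1
    hist = {}
    for km, mult in read_counts.items():
        cn = asm_counts.get(km, 0)
        by_mult = hist.setdefault(cn, {})
        by_mult[mult] = by_mult.get(mult, 0) + 1
    return hist

# Two hypothetical haplotypes that differ by a single SNP at index 10.
hap1 = "ACGTTGCAAGGCTTACCGATGA"
hap2 = hap1[:10] + "T" + hap1[11:]

# 20x total coverage = 10x per haplotype in this idealized model.
reads = ideal_read_counts([hap1, hap2], depth_per_hap=10)

hist_hap1 = spectra_cn(reads, [hap1])        # evaluate one haplotype only
hist_both = spectra_cn(reads, [hap1, hap2])  # evaluate both haplotypes

print("hap1 only:", hist_hap1)
print("hap1+hap2:", hist_both)
```

In this toy model, the k-mers specific to the second haplotype sit at 10x and have copy number 0 when only hap1 is evaluated, while the shared k-mers at 20x are counted as a single copy. Evaluating both haplotypes together leaves no read-only k-mers and moves the 20x k-mers to copy number 2, which mirrors the pattern described in this thread.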

gubrins commented 2 years ago

Thank you very much for your quick response. You are completely right!! I am using Hi-C data, and here I was checking the plot for just one haplotype, so the result makes sense. I am now checking the merqury plot that contains both haplotypes, and it shows the result I was expecting. Thank you very much!