marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License

Potentially false large tandem duplication #79

Open jacopoM28 opened 8 months ago

jacopoM28 commented 8 months ago

Dear MitoHiFi developers,

I am new to mitochondrial genome assemblies and currently working on a complex fungal mitogenome. I have encountered a peculiar situation and would appreciate your opinion on it, if possible.

MitoHiFi was run with the following command:

mitohifi.py -r Tlat_hifi.fastq.gz -f NC_053885.1.fasta -g NC_053885.1.gb -t 10 -o 4 -a fungi

The reference genome comes from a congeneric species; it is circularized and approximately 280 kb long because of a large number of introns. However, the assembled mitogenome failed to circularize and is approximately 400 kb long, with a large tandem duplication of around 100 kb. The duplicated segments are 99.64% identical. See Panel A for a self-alignment and Panel B for a comparison with the reference genome. Based on the alignment between the two genomes, I also suspect that the genome is complete even though circularization failed.

MitoGenome_Comparison.pdf
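
For readers who want to reproduce the kind of signal seen in the Panel A self-alignment without running an aligner, the spacing between repeated k-mers can be tallied: a direct tandem duplication tends to show up as one dominant spacing equal to the length of the repeated unit. The snippet below is only a rough sketch on a made-up toy sequence. The function name `repeat_offsets`, the k-mer size and the toy contig are illustrative assumptions, not part of MitoHiFi, and because only exact k-mer matches are counted, copies that are about 99.6% identical would give a somewhat lower count than a perfect duplication.

```python
import random
from collections import Counter


def repeat_offsets(seq, k=21):
    """Tally distances between consecutive occurrences of each repeated k-mer."""
    last_seen = {}       # k-mer -> start of its most recent occurrence
    offsets = Counter()  # spacing between occurrences -> number of k-mers
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            offsets[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return offsets


# Toy contig (made-up data): flank + unit + unit + flank.
rng = random.Random(0)


def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


unit = random_dna(200)
contig = random_dna(100) + unit + unit + random_dna(100)

# The most frequent spacing approximates the length of the duplicated unit.
offset, shared_kmers = repeat_offsets(contig).most_common(1)[0]
print(f"dominant repeat spacing: {offset} bp ({shared_kmers} shared k-mers)")
```

On a real contig, the number of k-mers sharing the dominant spacing gives a rough lower bound on the length of the duplicated region.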

No uniquely mapped reads are present within the tandem duplication, and there is also a noticeable drop in coverage at the boundaries between the two copies, with only a few reads mapping.

MitoGenome_TandemDuplication
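
The coverage drop described above can also be checked numerically rather than by eye. The following is a minimal sketch, assuming per-base depth in the three-column, tab-separated layout (contig, 1-based position, depth) that `samtools depth -a` writes for a single contig. The function name `low_coverage_windows`, the window size, the 25% cutoff and the toy depth values are all illustrative assumptions, not values from this dataset.

```python
from statistics import median


def low_coverage_windows(depth_lines, window=1000, fraction=0.25):
    """Return (start, end, mean_depth) for windows far below the median window depth.

    depth_lines: iterable of 'contig<TAB>position<TAB>depth' lines for one contig.
    """
    sums, counts, last_pos = {}, {}, {}
    for line in depth_lines:
        _contig, pos, depth = line.rstrip("\n").split("\t")
        pos = int(pos)
        idx = (pos - 1) // window  # 0-based window index
        sums[idx] = sums.get(idx, 0) + int(depth)
        counts[idx] = counts.get(idx, 0) + 1
        last_pos[idx] = max(last_pos.get(idx, 0), pos)
    if not sums:
        return []
    means = {idx: sums[idx] / counts[idx] for idx in sums}
    cutoff = fraction * median(means.values())
    return [
        (idx * window + 1, last_pos[idx], round(means[idx], 1))
        for idx in sorted(means)
        if means[idx] < cutoff
    ]


# Toy input (made-up): 5 kb contig at depth 40 with a dip to depth 3.
toy = [f"contig1\t{p}\t{3 if 2001 <= p <= 3000 else 40}" for p in range(1, 5001)]
print(low_coverage_windows(toy))  # [(2001, 3000, 3.0)]
```

Windows flagged near the junction between the two copies, and nowhere else, would be consistent with the pattern shown in the plot.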

After removing the tandem duplication, the coverage looks fine to me.

MitoGenome_NoTandemDuplication

I suspect that the tandem duplication is an assembly artifact. Am I interpreting these results correctly? Is it possible that only the few reads mapping between the copies are causing hifiasm to assemble them separately?

Thanks, Jacopo