marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License

Quality assessment of mt genomes assembled using mitoHiFi #96

Closed aureliendejode closed 3 months ago

aureliendejode commented 5 months ago

Hello and thanks for putting together MitoHiFi!

I just assembled 3 mt genomes from sea anemones and would like to discuss how to assess their quality. I have watched your YouTube video on MitoHiFi and also checked out the paper, but I would like to double-check whether there are other metrics to look at to assess the quality of mt genomes. I used MitoHiFi in read mode, and the reference mt genomes I have are from the same species or closely related species, but they were all based on short-read sequencing. Here are the contigs_stats.tsv files for each species.

Species 1:

# Related mitogenome is 19999 bp long and has 13 genes
contig_id   frameshifts_found   annotation_file length(bp)  number_of_genes was_circular
final_mitogenome    No frameshift found final_mitogenome.gb 19768   28  True
atg000002l  No frameshift found final_mitogenome.gb 19768   28  True
atg000001l  No frameshift found potential_contigs/atg000001l/atg000001l.mitogenome.rotated.gb   19768   28  True

Species 2:

# Related mitogenome is 18999 bp long and has 19 genes
contig_id   frameshifts_found   annotation_file length(bp)  number_of_genes was_circular
final_mitogenome    HEG1    final_mitogenome.gb 18921   30  True
atg000001l  HEG1    final_mitogenome.gb 18921   30  True
ptg000001c  HEG1    potential_contigs/ptg000001c/ptg000001c.mitogenome.rotated.gb   18921   30  False

Species 3:

# Related mitogenome is 20960 bp long and has 13 genes
contig_id   frameshifts_found   annotation_file length(bp)  number_of_genes was_circular
final_mitogenome    No frameshift found final_mitogenome.gb 20761   26  True
ptg000001l  No frameshift found final_mitogenome.gb 20761   26  True

Based on those files I would say that my mitogenomes look good, as they are similar in size to their references and they have all been circularized. One difference is that I systematically get more genes in the HiFi mt genomes. My first take is that this might be due to the way those references were assembled, using 454 and Geneious. Does that make sense? Is it something that is commonly found? The other thing concerns species 2, where a frameshift was found. To me that is not necessarily a big issue, as those are different individuals from different populations, and again the reference was assembled using older sequencing tech and software. What is your take on that?

Best

Aurélien
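
The comparison described in the post above (assembled length versus the related mitogenome, gene-count difference, frameshift and circularity flags) can be tabulated with a short script. This is only a minimal sketch, not part of MitoHiFi: it assumes contigs_stats.tsv is tab-delimited with the header and leading comment line shown in the post, and the function names `parse_contigs_stats` and `summarise` are made up for illustration.

```python
import re

# Species 2 from the post, re-typed with tabs (assumption: the real file is tab-delimited).
EXAMPLE = (
    "# Related mitogenome is 18999 bp long and has 19 genes\n"
    "contig_id\tframeshifts_found\tannotation_file\tlength(bp)\tnumber_of_genes\twas_circular\n"
    "final_mitogenome\tHEG1\tfinal_mitogenome.gb\t18921\t30\tTrue\n"
)


def parse_contigs_stats(text):
    """Return (reference, rows) parsed from the text of a contigs_stats.tsv file."""
    reference, header, rows = None, None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # The comment line carries the related mitogenome's length and gene count.
            m = re.search(r"is (\d+) bp long and has (\d+) genes", line)
            if m:
                reference = {"length": int(m.group(1)), "genes": int(m.group(2))}
            continue
        fields = line.split("\t")
        if header is None:
            header = fields  # first non-comment line is the column header
        else:
            rows.append(dict(zip(header, fields)))
    return reference, rows


def summarise(reference, rows):
    """Compare each contig with the related mitogenome."""
    summary = []
    for row in rows:
        length = int(row["length(bp)"])
        genes = int(row["number_of_genes"])
        diff_pct = 100 * (length - reference["length"]) / reference["length"]
        summary.append({
            "contig_id": row["contig_id"],
            "length_diff_pct": round(diff_pct, 2),
            "extra_genes": genes - reference["genes"],
            "frameshift": row["frameshifts_found"] != "No frameshift found",
            "circular": row["was_circular"] == "True",
        })
    return summary


if __name__ == "__main__":
    ref, rows = parse_contigs_stats(EXAMPLE)
    for entry in summarise(ref, rows):
        print(entry)
```

A summary like this only restates what the stats file already contains; it does not say whether the extra genes or the frameshift are real.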

marcelauliano commented 3 months ago

Hi Aurélien, map all the mito reads back to the assembled mitogenomes. Do the mappings look good? In the frameshift area, is the mapping discordant or concordant? HiFi reads have higher quality than 454 reads, but individual details always need further, case-by-case investigation. Best, M.
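
One rough way to start on the read-mapping check suggested above is to scan per-base depth for dips. The sketch below is an assumption-laden illustration, not the maintainer's procedure: the minimap2 and samtools commands in the comments are one common way to produce the input, the file names are hypothetical, and the helper names `parse_depth` and `low_coverage_regions` are invented here. Depth is only a coarse proxy, so the alignments around the frameshift would still need to be inspected directly, for example in a genome viewer.

```python
from statistics import median

# Assumed upstream steps, run in a shell (file names are hypothetical):
#   minimap2 -ax map-hifi final_mitogenome.fasta mito_reads.fastq | samtools sort -o mapped.bam
#   samtools index mapped.bam
#   samtools depth -a mapped.bam > depth.tsv
# `samtools depth` writes tab-separated lines: contig, 1-based position, depth.


def parse_depth(lines):
    """Turn `samtools depth` lines into a list of (position, depth) tuples."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _contig, pos, depth = line.split("\t")[:3]
        depths.append((int(pos), int(depth)))
    return depths


def low_coverage_regions(depths, fraction=0.5):
    """Return (start, end) runs where depth falls below fraction * median depth."""
    if not depths:
        return []
    threshold = fraction * median(d for _, d in depths)
    regions = []
    start = prev = None
    for pos, depth in depths:
        low = depth < threshold
        if low and start is not None and pos == prev + 1:
            prev = pos  # extend the current low-coverage run
            continue
        if start is not None:
            regions.append((start, prev))  # close the run that just ended
            start = None
        if low:
            start = prev = pos  # open a new run
    if start is not None:
        regions.append((start, prev))
    return regions


if __name__ == "__main__":
    with open("depth.tsv") as handle:
        for start, end in low_coverage_regions(parse_depth(handle)):
            print(f"low coverage: {start}-{end}")
```

If a reported region overlaps the gene flagged in contigs_stats.tsv (HEG1 for species 2), that is a reason to look at the reads there more closely; an even depth profile does not by itself rule out an error.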