linzhi2013 / MitoZ

MitoZ: A toolkit for assembly, annotation, and visualization of animal mitochondrial genomes
https://doi.org/10.1093/nar/gkz173
GNU General Public License v3.0
115 stars 39 forks source link

assembly is linear and atp 6 is missing #142

Closed jhcaddisfly closed 2 years ago

jhcaddisfly commented 2 years ago

Hi,

I have a question. I used the following command to assemble the mitogenome:

udocker run --rm --volume=$PWD --workdir=$PWD guanliangmeng/mitoz:2.3 python3 /app/release_MitoZ_v2.3/MitoZ.py all2 --genetic_code 5 --clade Arthropoda --outprefix Ple --thread_number 60 --fastq1 ./PO2_NKD180900774_HFFTMDMXX_L1_1_sub.fq --fastq2 ./PO2_NKD180900774_HFFTMDMXX_L1_2_sub.fq --fastq_read_length 150 --run_mode 2 --filter_taxa_method 1 --requiring_taxa 'Arthropoda'

The data is 1.8 G forward and 1.8G reverse Illumina reads (150 bp)

cat summary.txt gives:

Seq_id Length(bp) Circularity Closely_related_species

scaffold169 15556 no Aleurodicus dispersus

Potential missing genes:

Gene total_missing_number


ATP8 1 tRNA-Glu 1 tRNA-Ser 2

cat errorsummary.val gives: 1 INFO: SEQ_DESCR.OrganismIsUndefinedSpecies NOTE: valid [SEQ_DESCR.OrganismIsUndefinedSpecies] Organism 'Test sp.' is undefined species and does not have a specific identifier. DESCRIPTOR: BioSrc: Test sp. BIOSEQ-SET: lcl|scaffold169

cat Plectrocnemia_conspersa_mitoscaf.fa.val gives: NOTE: valid [SEQ_DESCR.OrganismIsUndefinedSpecies] Organism 'Test sp.' is undefined species and does not have a specific identifier. DESCRIPTOR: BioSrc: Test sp. BIOSEQ-SET: lcl|scaffold169

My questions: 1.Aleurodicus dispersus (Hemiptera) is a closely related species but my species is of a different insect order. Why? 2.Why is my assembly not circular? What can I do to get it circular?

  1. How can I get the missing genes?
  2. Can you please explain what the purple and blue bars mean in the circos.png plot?

I ran a second round of udocker: udocker run --rm --volume=$PWD --workdir=$PWD guanliangmeng/mitoz:2.3 python3 /app/release_MitoZ_v2.3/MitoZ.py all2 --genetic_code 5 --clade Arthropoda --outprefix 2ndround --thread_number 60 --fastq1 clean.1.fq.gz --fastq2 clean.2.fq.gz --fastq_read_length 150 --insert_size 250 --run_mode 3 --filter_taxa_method 1 --requiring_taxa 'Arthropoda' --quick_mode_seq_file work71.mitogenome.fasta --quick_mode_fa_genes_file quick_mode_fa_genes.txt --missing_PCGs ATP8 --quick_mode_score_file work71.hmmtblout.besthit.sim.filtered.high_abundance_10.0X.reformat.sorted --quick_mode_prior_seq_file work71.hmmtblout.besthit.sim.filtered.fa

This run did not finish but I ended up with a file called work91.mitogenome.fa which was just a little bit longer and circular.

Thank you, J.

linzhi2013 commented 2 years ago

Hi J.,

Sorry for my late reply.

1.Aleurodicus dispersus (Hemiptera) is a closely related species but my species is of a different insect order. Why?

It depends on what kinds of species are in MitoZ's mitochondrial genome database. Could be due to your order is lack in MitoZ's database.

2.Why is my assembly not circular? What can I do to get it circular?

Simply because the assembled mitochondrial genome is not long enough, and MitoZ cannot find the overlapping regions at the 5' end and 3' end of the assembled mitochondrial genome.

You can try to use a different kmer (default is 71) and could possibly get a circular genome, since yours is already quite long.

I didn't know why "This run did not finish but I ended up with a file called work91.mitogenome.fa which was just a little bit longer and circular." But you can annotate this work91.mitogenome.fa file directly with MitoZ, and check what genes are missing.

  1. How can I get the missing genes?

Use the work91.mitogenome.fa file .

  1. Can you please explain what the purple and blue bars mean in the circos.png plot?

Please refer to https://github.com/linzhi2013/MitoZ/issues/11 , https://github.com/linzhi2013/MitoZ/issues/33 or https://github.com/linzhi2013/MitoZ/issues/123.

If the ATP8 gene is still missing, it would be due to this gene being too divergent and too different from any ATP8 genes in MitoZ's annotation database. For this case, you can manually add some closely related ATP8 genes to MitoZ's annotation database (the profiles/MT_database/Arthropoda_CDS_protein.fa file for your case) (check https://github.com/linzhi2013/MitoZ/issues/79 for how to do it)

jhcaddisfly commented 2 years ago

Thank you for your quick response!!!

I found the error message of the second round of mitoZ (see command above). The error message was "no missing genes from multiKmer found" it occured at this step:

/app/anaconda/bin/python3 /app/release_MitoZ_v2.3/bin/assemble/pick_missing_PCGs_from_multiKmer.py --quick_mode_seq_file /home/udockertest/2ndround/work71.mitogenome.fa --quick_mode_fa_genes_file /home/jheckenhauer/udockertest/2ndround/quick_mode_fa_genes.txt --multiKmer_sorted_scores_files /home/udockertest/2ndround/work71.hmmtblout.besthit.sim.filtered.high_abundance_10.0X.reformat.sorted /home/udockertest/2ndround/tmp/ple.assembly2/work31.hmmtblout.besthit.sim.filtered.high_abundance_10.0X.reformat.sorted /home/udockertest/2ndround/tmp/ple.assembly2/work91.hmmtblout.besthit.sim.filtered.high_abundance_10.0X.reformat.sorted --multiKmer_seqs_files /home/jheckenhauer/udockertest/2ndround/work71.hmmtblout.besthit.sim.filtered.fa /home/udockertest/2ndround/tmp/ple.assembly2/work31.hmmtblout.besthit.sim.filtered.fa /home/udockertest/2ndround/tmp/ple.assembly2/work91.hmmtblout.besthit.sim.filtered.fa --missing_genes ATP8 --outprefix ple --blastn blastn

When I run annotate and the visualize on the work91.mitogenome.fa the mitogenome is circular but it is still missing the ATP8 gene. I am using the udocker version of mitoZ because I had problems with the database. Is there any way to add a user-specific ATP8 sequence when I use udocker? Where do I find the Arthropoda_CDS_protein.fa? I used this command for annotation: udocker run --rm --volume=$PWD --workdir=$PWD guanliangmeng/mitoz:2.3 python3 /app/release_MitoZ_v2.3/MitoZ.py annotate --genetic_code 5 --clade Arthropoda --outprefix pl --thread_number 8 --fastafile work91.mitogenome.fa

For the middle circle, that reveals the depth distribution. It looks weird (attached). Do you know why? I used this command to visualize the mitogenome: udocker run --rm --volume=$PWD --workdir=$PWD guanliangmeng/mitoz:2.3 python3 /app/release_MitoZ_v2.3/MitoZ.py visualize --circos circos --gb pl_mitoscaf.fa.gbf --gc yes --win 50 --gc_fill 128,177,211 --run_map yes --bwa bwa --thread 20 --depth_fill 190,186,218 --fq1 clean.1.fq.gz

Thank you for your answers and this great program!

circos

linzhi2013 commented 2 years ago

Hi J.,

If "the work91.mitogenome.fa the mitogenome is circular" already, there is no need to try the multiple kmer mode, which is a bit complicated. In this case, the ATP8 is already in the resulting sequence, MitoZ just failed to annotate this gene due to it being too divergent (different) from the genes in MitoZ's annotation database.

The/app/release_MitoZ_v2.3/bin/profiles/MT_database/Arthropoda_CDS_protein.fa. (check https://github.com/linzhi2013/MitoZ/issues/79 for how to do it)

For udocker or docker, you can shell into the container and change this file, and run your analysis. However, keep in mind that, in this way, when the container is shut down, any changes to the container are lost.

The sequencing depth at the beginning of your mitogenome is abnormal, please have a look at your sequence content of this region, if it is normal, then the whole thing is fine.

Cheers

jhcaddisfly commented 2 years ago

Thank you for your quick answer!!!! How can I look at the sequence content of this region? My data looks like this for all my mitogenomes.

All the best, Jacqueline

linzhi2013 commented 2 years ago

the work91.mitogenome.fa is just a plain text file, you can use the less command to check the sequence. or open it with a program like MEGA X, or Ugene, or any text editor.