marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License
169 stars 28 forks source link

'parsed_blast.txt' empty due to hifiasm empty result files #24

Open athenasyarifa opened 2 years ago

athenasyarifa commented 2 years ago

Hi Marcela and fellow contributors, Thank you for developing such useful tool! I have been trying to assemble mitogenome of a bird species using raw reads with this command:

 -r $reads/m64077_211118_103836.hifi_reads.fasta.gz $reads/m64204e_211125_011116.hifi_reads.fasta.gz \
 -f $ref/KP137624.1.fasta \
 -g $ref/KP137624.1.gb \
 -t 16 -a animal

But then, I get the following error:

2022-08-11 16:51:06 [INFO] Welcome to MitoHifi v2. Starting pipeline...
2022-08-11 16:51:06 [INFO] Length of related mitogenome is: 16776 bp
2022-08-11 16:51:06 [INFO] Number of genes on related mitogenome: 37
2022-08-11 16:51:06 [INFO] Running MitoHifi pipeline in reads mode...
2022-08-11 16:51:06 [INFO] 1. First we map your Pacbio HiFi reads to the close-related mitogenome
2022-08-11 16:51:06 [INFO] minimap2 -t 16 --secondary=no -ax map-pb /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022__de_novo_genome/assembly/mitochondria/KP137624.1.fasta /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022__de_novo_genome/data/DNA_PacBio/m64204e_211125_011116.hifi_reads.fasta.gz | samtools view -@ 16 -S -b -F4 -F 0x800 > reads.HiFiMapped.bam
2022-08-11 16:54:53 [INFO] 2. Now we filter out any mapped reads that are larger than the reference mitogenome to avoid NUMTS
2022-08-11 16:54:53 [INFO] 2.1 First we convert the mapped reads from BAM to FASTA format:
2022-08-11 16:54:53 [INFO] samtools fasta reads.HiFiMapped.bam > gbk.HiFiMapped.bam.fasta
2022-08-11 16:54:54 [INFO] Total number of mapped reads: 90
2022-08-11 16:54:54 [INFO] 2.2 Then we filter reads that are larger than 16776 bp
2022-08-11 16:54:54 [INFO] Number of filtered reads: 5
2022-08-11 16:54:54 [INFO] 3. Now let's run hifiasm to assemble the mapped and filtered reads!
2022-08-11 16:54:54 [INFO] hifiasm --primary -t 16 -f 0 -o gbk.HiFiMapped.bam.filtered.assembled gbk.HiFiMapped.bam.filtered.fasta
2022-08-11 16:54:54 [INFO] 4. Let's run the blast of the contigs versus the close-related mitogenome
2022-08-11 16:54:54 [INFO] 4.1. Creating BLAST database:
2022-08-11 16:54:54 [INFO] makeblastdb -in /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022__de_novo_genome/assembly/mitochondria/KP137624.1.fasta -dbtype nucl
2022-08-11 16:54:54 [INFO] Makeblastdb done.
2022-08-11 16:54:54 [INFO] 4.2. Running blast of contigs against close-related mitogenome:
2022-08-11 16:54:54 [INFO] blastn -query hifiasm.contigs.fasta -db /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022__de_novo_genome/assembly/mitochondria/KP137624.1.fasta -num_threads 16 -out contigs.blastn -outfmt 6 std qlen slen
Warning: [blastn] Query is Empty!
2022-08-11 16:54:54 [INFO] Blast done.
2022-08-11 16:54:54 [INFO] 5. Filtering BLAST output to select target sequences
2022-08-11 16:54:54 [INFO] Filtering thresholds applied:
2022-08-11 16:54:54 [INFO] Minimum query percentage = 50
2022-08-11 16:54:54 [INFO] Minimum query length = 80% subject length
2022-08-11 16:54:54 [INFO] Maximum query length = 5 times subject length
Attention!
'parsed_blast.txt' and 'parsed_blast_all.txt' files are empty.
The pipeline has stopped !! You need to run further scripts to check if you have mito reads pulled to a large NUMT!

I thought it was because of the same problem as issue #10 , but the command for hifiasm is correct. I have the filtered reads (although only 5) but then the contigs.blastn file and all of the hifiasm output files are empty (including hifiasm.contigs.fasta). I am not sure why this happened. Can someone help me?

This is how my directory looks:

-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 contigs.blastn
-rw-rw---- 1 ra35naf pr53da-dss-0026 1.8M Aug 11 16:54 gbk.HiFiMapped.bam.fasta
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.fa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.lowQ.bed
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.noseq.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026  28K Jul 24 13:47 gbk.HiFiMapped.bam.filtered.assembled.ec.bin
-rw-rw---- 1 ra35naf pr53da-dss-0026   38 Jul 24 13:47 gbk.HiFiMapped.bam.filtered.assembled.ovlp.reverse.bin
-rw-rw---- 1 ra35naf pr53da-dss-0026  122 Jul 24 13:47 gbk.HiFiMapped.bam.filtered.assembled.ovlp.source.bin
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.fa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.lowQ.bed
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.noseq.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_utg.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_utg.lowQ.bed
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_utg.noseq.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.r_utg.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.r_utg.lowQ.bed
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.r_utg.noseq.gfa
-rw-rw---- 1 ra35naf pr53da-dss-0026  80K Aug 11 16:54 gbk.HiFiMapped.bam.filtered.fasta
-rw-rw---- 1 ra35naf pr53da-dss-0026    0 Aug 11 16:54 hifiasm.contigs.fasta
-rw-rw---- 1 ra35naf pr53da-dss-0026  989 Aug 11 16:54 hifiasm.log
-rw-rw---- 1 ra35naf pr53da-dss-0026  906 Jul 24 13:43 mito.sh
-rw-rw---- 1 ra35naf pr53da-dss-0026   44 Aug 11 16:54 parsed_blast_all.txt
-rw-rw---- 1 ra35naf pr53da-dss-0026   44 Aug 11 16:54 parsed_blast.txt
-rw-rw---- 1 ra35naf pr53da-dss-0026 541K Aug 11 16:54 reads.HiFiMapped.bam

Many thanks, and have a good day! Best, Rifa

marcelauliano commented 2 years ago

Hi Rifa, Seems like you don't have enough reads for hifiasm. 1-) Can you generate the statistics of your filtered reads? 2-) Can you do the same for the unfiltered? samtools fasta gbk.HiFiMapped.bam.fasta

Depending on these statistics, it's possible you could use the filtered reads as -c

Em qui., 11 de ago. de 2022 às 16:57, Athena Syarifa < @.***> escreveu:

Hi Marcela and fellow contributors, Thank you for developing such useful tool! I have been trying to assemble mitogenome of a bird species using raw reads with this command:

-r $reads/m64077_211118_103836.hifi_reads.fasta.gz $reads/m64204e_211125_011116.hifi_reads.fasta.gz \ -f $ref/KP137624.1.fasta \ -g $ref/KP137624.1.gb \ -t 16 -a animal

But then, I get the following error:

2022-08-11 16:51:06 [INFO] Welcome to MitoHifi v2. Starting pipeline... 2022-08-11 16:51:06 [INFO] Length of related mitogenome is: 16776 bp 2022-08-11 16:51:06 [INFO] Number of genes on related mitogenome: 37 2022-08-11 16:51:06 [INFO] Running MitoHifi pipeline in reads mode... 2022-08-11 16:51:06 [INFO] 1. First we map your Pacbio HiFi reads to the close-related mitogenome 2022-08-11 16:51:06 [INFO] minimap2 -t 16 --secondary=no -ax map-pb /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022de_novo_genome/assembly/mitochondria/KP137624.1.fasta /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022de_novo_genome/data/DNA_PacBio/m64204e_211125_011116.hifi_reads.fasta.gz | samtools view -@ 16 -S -b -F4 -F 0x800 > reads.HiFiMapped.bam 2022-08-11 16:54:53 [INFO] 2. Now we filter out any mapped reads that are larger than the reference mitogenome to avoid NUMTS 2022-08-11 16:54:53 [INFO] 2.1 First we convert the mapped reads from BAM to FASTA format: 2022-08-11 16:54:53 [INFO] samtools fasta reads.HiFiMapped.bam > gbk.HiFiMapped.bam.fasta 2022-08-11 16:54:54 [INFO] Total number of mapped reads: 90 2022-08-11 16:54:54 [INFO] 2.2 Then we filter reads that are larger than 16776 bp 2022-08-11 16:54:54 [INFO] Number of filtered reads: 5 2022-08-11 16:54:54 [INFO] 3. Now let's run hifiasm to assemble the mapped and filtered reads! 2022-08-11 16:54:54 [INFO] hifiasm --primary -t 16 -f 0 -o gbk.HiFiMapped.bam.filtered.assembled gbk.HiFiMapped.bam.filtered.fasta 2022-08-11 16:54:54 [INFO] 4. Let's run the blast of the contigs versus the close-related mitogenome 2022-08-11 16:54:54 [INFO] 4.1. Creating BLAST database: 2022-08-11 16:54:54 [INFO] makeblastdb -in /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022de_novo_genome/assembly/mitochondria/KP137624.1.fasta -dbtype nucl 2022-08-11 16:54:54 [INFO] Makeblastdb done. 2022-08-11 16:54:54 [INFO] 4.2. Running blast of contigs against close-related mitogenome: 2022-08-11 16:54:54 [INFO] blastn -query hifiasm.contigs.fasta -db /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022de_novo_genome/assembly/mitochondria/KP137624.1.fasta -num_threads 16 -out contigs.blastn -outfmt 6 std qlen slen Warning: [blastn] Query is Empty! 2022-08-11 16:54:54 [INFO] Blast done. 2022-08-11 16:54:54 [INFO] 5. Filtering BLAST output to select target sequences 2022-08-11 16:54:54 [INFO] Filtering thresholds applied: 2022-08-11 16:54:54 [INFO] Minimum query percentage = 50 2022-08-11 16:54:54 [INFO] Minimum query length = 80% subject length 2022-08-11 16:54:54 [INFO] Maximum query length = 5 times subject length Attention! 'parsed_blast.txt' and 'parsed_blast_all.txt' files are empty. The pipeline has stopped !! You need to run further scripts to check if you have mito reads pulled to a large NUMT!

I thought it was because of the same problem as issue #10 https://github.com/marcelauliano/MitoHiFi/issues/10 , but the command for hifiasm is correct. I have the filtered reads (although only 5) but then the contigs.blastn file and all of the hifiasm output files are empty (including hifiasm.contigs.fasta). I am not sure why this happened. Can someone help me?

This is how my directory looks:

-rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 contigs.blastn -rw-rw---- 1 ra35naf pr53da-dss-0026 1.8M Aug 11 16:54 gbk.HiFiMapped.bam.fasta -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.fa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.lowQ.bed -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.a_ctg.noseq.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 28K Jul 24 13:47 gbk.HiFiMapped.bam.filtered.assembled.ec.bin -rw-rw---- 1 ra35naf pr53da-dss-0026 38 Jul 24 13:47 gbk.HiFiMapped.bam.filtered.assembled.ovlp.reverse.bin -rw-rw---- 1 ra35naf pr53da-dss-0026 122 Jul 24 13:47 gbk.HiFiMapped.bam.filtered.assembled.ovlp.source.bin -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.fa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.lowQ.bed -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_ctg.noseq.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_utg.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_utg.lowQ.bed -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.p_utg.noseq.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.r_utg.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.r_utg.lowQ.bed -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 gbk.HiFiMapped.bam.filtered.assembled.r_utg.noseq.gfa -rw-rw---- 1 ra35naf pr53da-dss-0026 80K Aug 11 16:54 gbk.HiFiMapped.bam.filtered.fasta -rw-rw---- 1 ra35naf pr53da-dss-0026 0 Aug 11 16:54 hifiasm.contigs.fasta -rw-rw---- 1 ra35naf pr53da-dss-0026 989 Aug 11 16:54 hifiasm.log -rw-rw---- 1 ra35naf pr53da-dss-0026 906 Jul 24 13:43 mito.sh -rw-rw---- 1 ra35naf pr53da-dss-0026 44 Aug 11 16:54 parsed_blast_all.txt -rw-rw---- 1 ra35naf pr53da-dss-0026 44 Aug 11 16:54 parsed_blast.txt -rw-rw---- 1 ra35naf pr53da-dss-0026 541K Aug 11 16:54 reads.HiFiMapped.bam

Many thanks, and have a good day! Best, Rifa

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/24, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5RFEIWD2XOHY3SOKYALVYUPFVANCNFSM56IWFE2Q . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

athenasyarifa commented 2 years ago

Hi Marcela, Thanks for the prompt response. Yes, you are right. I only have 5 reads as an input to hifiasm because it seems that MitoHiFi filtered the reads that are longer than the reference mitogenome. Below is the statistics for the filtered reads:

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.3033  0.2109  0.1986  0.2872  0.0000  0.0000  0.0000  0.4095  0.0427

Main genome scaffold total:             5
Main genome contig total:               5
Main genome scaffold sequence total:    0.080 MB
Main genome contig sequence total:      0.080 MB        0.000% gap
Main genome scaffold N/L50:             3/15.962 KB
Main genome contig N/L50:               3/15.962 KB
Main genome scaffold N/L90:             5/15.214 KB
Main genome contig N/L90:               5/15.214 KB
Max scaffold length:                    16.507 KB
Max contig length:                      16.507 KB
Number of scaffolds > 50 KB:            0
% main genome in scaffolds > 50 KB:     0.00%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                      5               5          79,540          79,540   100.00%
  10 KB                      5               5          79,540          79,540   100.00%

And below is the statistics for the unfiltered reads:

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2893  0.2068  0.2098  0.2941  0.0000  0.0000  0.0000  0.4166  0.0440

Main genome scaffold total:             90
Main genome contig total:               90
Main genome scaffold sequence total:    1.883 MB
Main genome contig sequence total:      1.883 MB        0.000% gap
Main genome scaffold N/L50:             41/21.041 KB
Main genome contig N/L50:               41/21.041 KB
Main genome scaffold N/L90:             79/17.708 KB
Main genome contig N/L90:               79/17.708 KB
Max scaffold length:                    27.951 KB
Max contig length:                      27.951 KB
Number of scaffolds > 50 KB:            0
% main genome in scaffolds > 50 KB:     0.00%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                     90              90       1,883,185       1,883,185   100.00%
  10 KB                     90              90       1,883,185       1,883,185   100.00%
  25 KB                      5               5         131,208         131,208   100.00%

Do you mean I can use the filtered reads as input for MitoHiFi contigs mode using the flag -c ? Can you confirm? Many thanks!

Cheers, Rifa

athenasyarifa commented 2 years ago

Is there a reason MitoHiFi filtered the reads that are more than 5 times the subject length? I also quickly tried the contigs mode using my draft assembly yesterday and received the same error (that the 'parsed_blast.txt' and 'parsed_blast_all.txt' files are empty). I think it is because MitoHiFi filtered all of the sequences after the blast step due to the length of my contigs/scaffolds which are much longer than the reference mitogenome. I am not sure if it is something I have done wrong or if MitoHiFi is not the appropriate tool for my case.

marcelauliano commented 2 years ago

Contigs that are more than 5x larger will be a numt. I think your problem is your sequenced sample, not the tool =)

You can try using the filtered reads and contigs, yes. (-c) Let;s see if this 15kb read is your mito

Em sex., 12 de ago. de 2022 às 11:49, Athena Syarifa < @.***> escreveu:

Is there a reason MitoHiFi filtered the reads that are more than 5 times the subject length? I also quickly tried the contigs mode using my draft assembly yesterday and received the same error (that the 'parsed_blast.txt' and 'parsed_blast_all.txt' files are empty). I think it is because MitoHiFi filtered all of the sequences after the blast step due to the length of my contigs/scaffolds which are much longer than the reference mitogenome. I am not sure if it is something I have done wrong or if MitoHiFi is not the appropriate tool for my case.

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/24#issuecomment-1212979157, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5RHRD6R6D32IZ7KG64DVYYTZ5ANCNFSM56IWFE2Q . You are receiving this because you commented.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

athenasyarifa commented 2 years ago

Ah I see, of course. I tried using the filtered reads in contig mode and encountered another error (I think in the annotation step). I suspect there might be an error when running the MitoFinder?

2022-08-12 14:04:25 [INFO] Welcome to MitoHifi v2. Starting pipeline...
2022-08-12 14:04:26 [INFO] Length of related mitogenome is: 16776 bp
2022-08-12 14:04:26 [INFO] Number of genes on related mitogenome: 37
2022-08-12 14:04:26 [INFO] Running MitoHifi pipeline in contigs mode...
2022-08-12 14:04:26 [INFO] 1. Fixing potentially conflicting FASTA headers
2022-08-12 14:04:26 [INFO] 2. Let's run the blast of the contigs versus the close-related mitogenome
2022-08-12 14:04:26 [INFO] 2.1. Creating BLAST database:
2022-08-12 14:04:26 [INFO] makeblastdb -in /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022__de_novo_genome/assembly/mitochondria/KP137624.1.fasta -dbtype nucl
2022-08-12 14:04:26 [INFO] Makeblastdb done.
2022-08-12 14:04:26 [INFO] 2.2. Running blast of contigs against close-related mitogenome:
2022-08-12 14:04:26 [INFO] blastn -query gbk.HiFiMapped.bam.filtered.fasta -db /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022__de_novo_genome/assembly/mitochondria/KP137624.1.fasta -num_threads 16 -out contigs.blastn -outfmt 6 std qlen slen
2022-08-12 14:04:26 [INFO] Blast done.
2022-08-12 14:04:26 [INFO] 3. Filtering BLAST output to select target sequences
2022-08-12 14:04:26 [INFO] Filtering thresholds applied:
2022-08-12 14:04:26 [INFO] Minimum query percentage = 50
2022-08-12 14:04:26 [INFO] Minimum query length = 80% subject length
2022-08-12 14:04:26 [INFO] Maximum query length = 5 times subject length
2022-08-12 14:04:27 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file
2022-08-12 14:04:27 [INFO] 4. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/163972531/ccs
2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/166725359/ccs
2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/24707997/ccs
2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/64488503/ccs
2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/75366945/ccs
Traceback (most recent call last):
  File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 376, in <module>
    main()
  File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 262, in main
    tRNA_ref = fetch.get_ref_tRNA()
  File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/fetch.py", line 40, in get_ref_tRNA
    reference_tRNA = max(tRNAs, key=tRNAs.get)
ValueError: max() arg is an empty sequence

Do you have any suggestions on what I should do? Thank you ^^

marcelauliano commented 2 years ago

Ah! Try editing your reads ids not to have any | or / or \

On Fri, 12 Aug 2022 at 13:11, Athena Syarifa @.***> wrote:

Ah I see, of course. I tried using the filtered reads in contig mode and encountered another error (I think in the annotation step). I suspect there might be an error when running the MitoFinder?

2022-08-12 14:04:25 [INFO] Welcome to MitoHifi v2. Starting pipeline... 2022-08-12 14:04:26 [INFO] Length of related mitogenome is: 16776 bp 2022-08-12 14:04:26 [INFO] Number of genes on related mitogenome: 37 2022-08-12 14:04:26 [INFO] Running MitoHifi pipeline in contigs mode... 2022-08-12 14:04:26 [INFO] 1. Fixing potentially conflicting FASTA headers 2022-08-12 14:04:26 [INFO] 2. Let's run the blast of the contigs versus the close-related mitogenome 2022-08-12 14:04:26 [INFO] 2.1. Creating BLAST database: 2022-08-12 14:04:26 [INFO] makeblastdb -in /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022de_novo_genome/assembly/mitochondria/KP137624.1.fasta -dbtype nucl 2022-08-12 14:04:26 [INFO] Makeblastdb done. 2022-08-12 14:04:26 [INFO] 2.2. Running blast of contigs against close-related mitogenome: 2022-08-12 14:04:26 [INFO] blastn -query gbk.HiFiMapped.bam.filtered.fasta -db /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2022de_novo_genome/assembly/mitochondria/KP137624.1.fasta -num_threads 16 -out contigs.blastn -outfmt 6 std qlen slen 2022-08-12 14:04:26 [INFO] Blast done. 2022-08-12 14:04:26 [INFO] 3. Filtering BLAST output to select target sequences 2022-08-12 14:04:26 [INFO] Filtering thresholds applied: 2022-08-12 14:04:26 [INFO] Minimum query percentage = 50 2022-08-12 14:04:26 [INFO] Minimum query length = 80% subject length 2022-08-12 14:04:26 [INFO] Maximum query length = 5 times subject length 2022-08-12 14:04:27 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file 2022-08-12 14:04:27 [INFO] 4. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s). 2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/163972531/ccs 2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/166725359/ccs 2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/24707997/ccs 2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/64488503/ccs 2022-08-12 14:04:27 [INFO] Working with contig m64204e_211125_011116/75366945/ccs Traceback (most recent call last): File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 376, in main() File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 262, in main tRNA_ref = fetch.get_ref_tRNA() File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/fetch.py", line 40, in get_ref_tRNA reference_tRNA = max(tRNAs, key=tRNAs.get) ValueError: max() arg is an empty sequence

Do you have any suggestions on what I should do? Thank you ^^

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/24#issuecomment-1213046577, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5RHHFU7DW26UQ37E3LLVYY5QFANCNFSM56IWFE2Q . You are receiving this because you commented.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

athenasyarifa commented 2 years ago

Ah, it worked! Up until circularization everything worked fine, but the same error appeared again when annotating even though I have deleted / or \ from my seq IDs.

2022-08-12 15:41:53 [INFO] 3. Filtering BLAST output to select target sequences
2022-08-12 15:41:53 [INFO] Filtering thresholds applied:
2022-08-12 15:41:53 [INFO] Minimum query percentage = 50
2022-08-12 15:41:53 [INFO] Minimum query length = 80% subject length
2022-08-12 15:41:53 [INFO] Maximum query length = 5 times subject length
2022-08-12 15:41:53 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file
2022-08-12 15:41:53 [INFO] 4. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-75366945-ccs
2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-24707997-ccs
2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-64488503-ccs
2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-163972531-ccs
2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-166725359-ccs
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-75366945-ccs circularization
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-24707997-ccs circularization
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-64488503-ccs circularization
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-163972531-ccs circularization
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-166725359-ccs circularization
2022-08-12 15:41:54 [INFO] m64204e_211125_011116-64488503-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-64488503-ccs/m64204e_211125_011116-64488503-ccs.circularisationCheck.txt
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-64488503-ccs (MitoFinder) annotation
2022-08-12 15:41:54 [INFO] m64204e_211125_011116-163972531-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-163972531-ccs/m64204e_211125_011116-163972531-ccs.circularisationCheck.txt
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-163972531-ccs (MitoFinder) annotation
2022-08-12 15:41:54 [INFO] m64204e_211125_011116-24707997-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-24707997-ccs/m64204e_211125_011116-24707997-ccs.circularisationCheck.txt
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-24707997-ccs (MitoFinder) annotation
2022-08-12 15:41:54 [INFO] m64204e_211125_011116-166725359-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-166725359-ccs/m64204e_211125_011116-166725359-ccs.circularisationCheck.txt
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-166725359-ccs (MitoFinder) annotation
2022-08-12 15:41:54 [INFO] m64204e_211125_011116-75366945-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-75366945-ccs/m64204e_211125_011116-75366945-ccs.circularisationCheck.txt
2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-75366945-ccs (MitoFinder) annotation
Traceback (most recent call last):
  File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 376, in <module>
    main()
  File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 262, in main
    tRNA_ref = fetch.get_ref_tRNA()
  File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/fetch.py", line 40, in get_ref_tRNA
    reference_tRNA = max(tRNAs, key=tRNAs.get)
ValueError: max() arg is an empty sequence
marcelauliano commented 2 years ago

Have a ding on the outputs. My guess is that you don’t have tRNA-Phe. So you probably only have a partial mito! 🤔

On Fri, 12 Aug 2022 at 14:56, Athena Syarifa @.***> wrote:

Ah, it worked! Up until circularization everything worked fine, but the same error appeared again when annotating even though I have deleted / or \ from my seq IDs.

2022-08-12 15:41:53 [INFO] 3. Filtering BLAST output to select target sequences 2022-08-12 15:41:53 [INFO] Filtering thresholds applied: 2022-08-12 15:41:53 [INFO] Minimum query percentage = 50 2022-08-12 15:41:53 [INFO] Minimum query length = 80% subject length 2022-08-12 15:41:53 [INFO] Maximum query length = 5 times subject length 2022-08-12 15:41:53 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file 2022-08-12 15:41:53 [INFO] 4. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s). 2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-75366945-ccs 2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-24707997-ccs 2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-64488503-ccs 2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-163972531-ccs 2022-08-12 15:41:54 [INFO] Working with contig m64204e_211125_011116-166725359-ccs 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-75366945-ccs circularization 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-24707997-ccs circularization 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-64488503-ccs circularization 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-163972531-ccs circularization 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-166725359-ccs circularization 2022-08-12 15:41:54 [INFO] m64204e_211125_011116-64488503-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-64488503-ccs/m64204e_211125_011116-64488503-ccs.circularisationCheck.txt 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-64488503-ccs (MitoFinder) annotation 2022-08-12 15:41:54 [INFO] m64204e_211125_011116-163972531-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-163972531-ccs/m64204e_211125_011116-163972531-ccs.circularisationCheck.txt 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-163972531-ccs (MitoFinder) annotation 2022-08-12 15:41:54 [INFO] m64204e_211125_011116-24707997-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-24707997-ccs/m64204e_211125_011116-24707997-ccs.circularisationCheck.txt 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-24707997-ccs (MitoFinder) annotation 2022-08-12 15:41:54 [INFO] m64204e_211125_011116-166725359-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-166725359-ccs/m64204e_211125_011116-166725359-ccs.circularisationCheck.txt 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-166725359-ccs (MitoFinder) annotation 2022-08-12 15:41:54 [INFO] m64204e_211125_011116-75366945-ccs circularization done. Circularization info saved on ./potential_contigs/m64204e_211125_011116-75366945-ccs/m64204e_211125_011116-75366945-ccs.circularisationCheck.txt 2022-08-12 15:41:54 [INFO] Started m64204e_211125_011116-75366945-ccs (MitoFinder) annotation Traceback (most recent call last): File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 376, in main() File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/mitohifi.py", line 262, in main tRNA_ref = fetch.get_ref_tRNA() File "/dss/dsshome1/lxc0C/ra35naf/miniconda3/envs/mito/bin/MitoHiFi/fetch.py", line 40, in get_ref_tRNA reference_tRNA = max(tRNAs, key=tRNAs.get) ValueError: max() arg is an empty sequence

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/24#issuecomment-1213138761, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5RF2JNVTCUCQ7JA6MHLVYZJYRANCNFSM56IWFE2Q . You are receiving this because you commented.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

athenasyarifa commented 2 years ago

Hmmm that is strange indeed. I will take a closer look into the matter. Many thanks for the help :)

athenasyarifa commented 2 years ago

Hi Marcela, I have succeeded in generating the final mitogenome. I reinstalled MitoHiFi and all its dependencies, and it worked. However, in the MitoFinder log, it says Evidences of circularization could not be found, but everyother step was successful, and it did not find any genes in the mtDNA aside from 8 tRNAs.

final_mitogenome.annotation_MitoFinder.log

And below is the circularisation check results (all false) all_contigs.circularisationCheck.txt

It seems that MitoFinder cannot find the circularisation, do you have any suggestions about this problem?

Many thanks for your help!

marcelauliano commented 2 years ago

Hi Athena! Your problem now is not the pipeline anymore. It’s possible you don’t have mitoreads in your sample. You need to investigate the intermediate files such as the filtered reads. Best of luck. If we can help with anymore more specific, let us know! On Wed, 14 Sep 2022 at 10:16, Athena Syarifa @.***> wrote:

Hi Marcela, I have succeeded in generating the final mitogenome. I reinstalled MitoHiFi and all its dependencies, and it worked. However, in the MitoFinder log, it says Evidences of circularization could not be found, but everyother step was successful, and it did not find any genes in the mtDNA aside from 8 tRNAs.

final_mitogenome.annotation_MitoFinder.log https://github.com/marcelauliano/MitoHiFi/files/9566571/final_mitogenome.annotation_MitoFinder.log

And below is the circularisation check results (all false) all_contigs.circularisationCheck.txt https://github.com/marcelauliano/MitoHiFi/files/9566589/all_contigs.circularisationCheck.txt

It seems that MitoFinder cannot find the circularisation, do you have any suggestions about this problem?

Many thanks for your help!

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/24#issuecomment-1246750912, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5REZLCCLMZ4UJXHGM5TV6HF3PANCNFSM56IWFE2Q . You are receiving this because you commented.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

marcelauliano commented 2 years ago

Athena, did you input your reads and -r ? This is what you should do. From your intermediate files it seems to me you input them as -c

On Wed, 14 Sep 2022 at 13:37, Marcela Uliano da Silva < @.***> wrote:

Hi Athena! Your problem now is not the pipeline anymore. It’s possible you don’t have mitoreads in your sample. You need to investigate the intermediate files such as the filtered reads. Best of luck. If we can help with anymore more specific, let us know!

On Wed, 14 Sep 2022 at 10:16, Athena Syarifa @.***> wrote:

Hi Marcela, I have succeeded in generating the final mitogenome. I reinstalled MitoHiFi and all its dependencies, and it worked. However, in the MitoFinder log, it says Evidences of circularization could not be found, but everyother step was successful, and it did not find any genes in the mtDNA aside from 8 tRNAs.

final_mitogenome.annotation_MitoFinder.log https://github.com/marcelauliano/MitoHiFi/files/9566571/final_mitogenome.annotation_MitoFinder.log

And below is the circularisation check results (all false) all_contigs.circularisationCheck.txt https://github.com/marcelauliano/MitoHiFi/files/9566589/all_contigs.circularisationCheck.txt

It seems that MitoFinder cannot find the circularisation, do you have any suggestions about this problem?

Many thanks for your help!

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/24#issuecomment-1246750912, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5REZLCCLMZ4UJXHGM5TV6HF3PANCNFSM56IWFE2Q . You are receiving this because you commented.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK