marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License
169 stars 29 forks source link

No output after successfully running the pipeline #91

Open Ruiqi-CUB opened 6 months ago

Ruiqi-CUB commented 6 months ago

Hi Marcela! I am trying to run MitHiFi with the testing data, the log file shows the pipeline is done, but no files were written to my working directory except for output files (.nhr, .nin, .nsq) from makeblastdb. Would you mind helping me have a look? Thank you very much!

Ruiqi

I ran the following command in this directory: /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI I downloaded the reference fasta and gb manually from GenBank and out them in a sub directory: ./Test

Here is the command I run: nohup docker run --rm -v /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI:/home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI ghcr.io/marcelauliano/mitohifi:master python3 /opt/MitoHiFi/src/mitohifi.py -r /opt/MitoHiFi/tests/ilDeiPorc1.reads.100.fa -f /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.fasta -g /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.gb -t 20 -o 5&

Here are the files after running the pipeline `$ls * nohup.out

SerGro_PacBio: SRR22795536_1.fastq SRR23931733_1.fastq

Test: OQ694980.1.fasta OQ694980.1.fasta.nhr OQ694980.1.fasta.nin OQ694980.1.fasta.nsq OQ694980.1.gb`

Here is the log file: `2024-05-20 06:03:29 [INFO] Welcome to MitoHifi v2. Starting pipeline... 2024-05-20 06:03:29 [INFO] Length of related mitogenome is: 15372 bp 2024-05-20 06:03:29 [INFO] Number of genes on related mitogenome: 37 2024-05-20 06:03:29 [INFO] Running MitoHifi pipeline in reads mode... 2024-05-20 06:03:29 [INFO] 1. First we map your Pacbio HiFi reads to the close-related mitogenome 2024-05-20 06:03:29 [INFO] minimap2 -t 20 --secondary=no -ax map-hifi /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.fasta /opt/MitoHiFi/tests/ilDeiPorc1.reads.100.fa | samtools view -@ 20 -b -F4 -F 0x800 -o reads.HiFiMapped.bam 2024-05-20 06:03:29 [INFO] 2. Now we filter out any mapped reads that are larger than the reference mitogenome to avoid NUMTS 2024-05-20 06:03:29 [INFO] 2.1 First we convert the mapped reads from BAM to FASTA format: 2024-05-20 06:03:29 [INFO] samtools fasta reads.HiFiMapped.bam > gbk.HiFiMapped.bam.fasta 2024-05-20 06:03:29 [INFO] Total number of mapped reads: 97 2024-05-20 06:03:29 [INFO] 2.2 Then we filter reads that are larger than 15372 bp 2024-05-20 06:03:29 [INFO] Number of filtered reads: 97 2024-05-20 06:03:29 [INFO] 3. Now let's run hifiasm to assemble the mapped and filtered reads! 2024-05-20 06:03:29 [INFO] hifiasm --primary -t 20 -f 0 -o gbk.HiFiMapped.bam.filtered.assembled gbk.HiFiMapped.bam.filtered.fasta cat: gbk.HiFiMapped.bam.filtered.assembled.a_ctg.fa: No such file or directory 2024-05-20 06:03:30 [INFO] 4. Let's run the blast of the contigs versus the close-related mitogenome 2024-05-20 06:03:30 [INFO] 4.1. Creating BLAST database: 2024-05-20 06:03:30 [INFO] makeblastdb -in /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.fasta -dbtype nucl 2024-05-20 06:03:30 [INFO] Makeblastdb done. 2024-05-20 06:03:30 [INFO] 4.2. Running blast of contigs against close-related mitogenome: 2024-05-20 06:03:30 [INFO] blastn -query hifiasm.contigs.fasta -db /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.fasta -num_threads 20 -out contigs.blastn -outfmt 6 std qlen slen 2024-05-20 06:03:30 [INFO] Blast done. 2024-05-20 06:03:30 [INFO] 5. Filtering BLAST output to select target sequences 2024-05-20 06:03:30 [INFO] Filtering thresholds applied: 2024-05-20 06:03:30 [INFO] Minimum query percentage = 50 2024-05-20 06:03:30 [INFO] Minimum query length = 80% subject length 2024-05-20 06:03:30 [INFO] Maximum query length = 5 times subject length 2024-05-20 06:03:30 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file 2024-05-20 06:03:30 [INFO] 6. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s). 2024-05-20 06:03:30 [INFO] Annotation will be done using MitoFinder (default) 2024-05-20 06:03:30 [INFO] Working with contig ptg000001l 2024-05-20 06:03:30 [INFO] Started ptg000001l circularization 2024-05-20 06:03:30 [INFO] ptg000001l circularization done. Circularization info saved on ./potential_contigs/ptg000001l/ptg000001l.circularisationCheck.txt 2024-05-20 06:03:30 [INFO] Started ptg000001l (MitoFinder) annotation 2024-05-20 06:05:36 [INFO] ptg000001l annotation done. Annotation log saved on ./potential_contigs/ptg000001l/ptg000001l.annotation_MitoFinder.log 2024-05-20 06:05:36 [INFO] Started ptg000001l rotation. 2024-05-20 06:05:36 [INFO] tRNA-Phe is at reverse complement of ptg000001l.mitogenome.fa 2024-05-20 06:05:36 [INFO] For that reason we'll reverse complement ptg000001l.mitogenome.fa before the rotation 2024-05-20 06:05:36 [INFO] Reverse complementation completed for contig ptg000001l done 2024-05-20 06:05:36 [INFO] Rotation of ptg000001l done. Rotated is at ptg000001l.mitogenome.rotated.fa 2024-05-20 06:05:36 [INFO] Started calculating mitocontig stats... ptg000001l 2024-05-20 06:05:36 [INFO] 7. Now the rotated contigs will be aligned 2024-05-20 06:05:36 [INFO] List of contigs that will be aligned: ['ptg000001l.mitogenome.rotated.fa']

2024-05-20 06:05:36 [INFO] MAFFT alignment will be called with: mafft --quiet --clustalout --thread 20 all_mitogenomes.rotated.fa > all_mitogenomes.rotated.aligned.aln

2024-05-20 06:05:36 [INFO] Alignment done and saved at ./final_mitogenome_choice/all_mitogenomes.rotated.aligned.aln

2024-05-20 06:05:36 [INFO] 8. Now we will choose the most representative contig

2024-05-20 06:05:38 [INFO] Representative contig is ptg000001l that belongs to Cluster 0. This contig will be our final mitogenome. See all contigs and clusters in cdhit.out.clstr 2024-05-20 06:05:38 [INFO] 9. Calculating final stats for final mitogenome and other potential contigs. Stats will be saved on contigs_stats.tsv file. ptg000001l list of genes: ['tRNA-Phe', 'tRNA-Glu', 'tRNA-Ser2', 'tRNA-Asn', 'tRNA-Arg', 'tRNA-Ala', 'ND3', 'tRNA-Gly', 'COX3', 'ATP6', 'tRNA-Asp', 'tRNA-Lys', 'COX2', 'tRNA-Leu2', 'COX1', 'tRNA-Tyr', 'tRNA-Cys', 'tRNA-Trp', 'ND2', 'tRNA-Gln', 'tRNA-Ile', 'tRNA-Met', 'rrnS', 'tRNA-Val', 'rrnL', 'tRNA-Leu', 'ND1', 'tRNA-Ser', 'CYTB', 'ND6', 'tRNA-Pro', 'tRNA-Thr', 'ND4L', 'ND4', 'tRNA-His', 'ND5'] 2024-05-20 06:05:38 [INFO] 10. Building annotation plots for all contigs 2024-05-20 06:05:38 [INFO] 11. Building coverage distribution for each potential contig 2024-05-20 06:05:38 [INFO] contigs_to_map: ['final_mitogenome.fasta', 'ptg000001l.mitogenome.rotated.fa'] 2024-05-20 06:05:38 [INFO] 11.1 Mapping HiFi (filtered) reads against potential contigs: 2024-05-20 06:05:38 [INFO] Reads mapping: 2024-05-20 06:05:38 [INFO] minimap2 -t 20 --secondary=no -ax map-pb all_potential_contigs.fa gbk.HiFiMapped.bam.fasta | samtools view -@ 20 -b -F4 -F 0x800 -q 0 -o HiFi-vs-potential_contigs.bam [W::sam_hdr_parse] Duplicated sequence 'ptg000001l_rc_rotated' 2024-05-20 06:05:38 [INFO] Sorting mapping file: 2024-05-20 06:05:38 [INFO] samtools sort -@ 20 HiFi-vs-potential_contigs.bam -o HiFi-vs-potential_contigs.sorted.bam 2024-05-20 06:05:38 [INFO] Indexing sorted mapping file: 2024-05-20 06:05:38 [INFO] samtools index HiFi-vs-potential_contigs.sorted.bam 2024-05-20 06:05:38 [INFO] HiFi reads mapping done. Output file: HiFi-vs-potential_contigs.sorted.bam 2024-05-20 06:05:38 [INFO] Retrieve BAM for contig ptg000001l: 2024-05-20 06:05:38 [INFO] samtools view -@ 20 -b -o ptg000001l.bam HiFi-vs-potential_contigs.sorted.bam ptg000001l_rc_rotated 2024-05-20 06:05:38 [INFO] Sorting mapping file: 2024-05-20 06:05:38 [INFO] samtools sort -@ 20 ptg000001l.bam -o ptg000001l.sorted.bam 2024-05-20 06:05:38 [INFO] Indexing sorted mapping file: 2024-05-20 06:05:38 [INFO] samtools index ptg000001l.sorted.bam 2024-05-20 06:05:38 [INFO] Splitting mapping file done. Individual mapped contigs: ['ptg000001l'] 2024-05-20 06:05:38 [INFO] 11.2 Creating coverage plot... 2024-05-20 06:05:39 [INFO] Coverage plots created. 2024-05-20 06:05:39 [INFO] 12. Building coverage distribution for final mitogenome 2024-05-20 06:05:39 [INFO] 12.1 Mapping HiFi (filtered) reads against final_mitogenome.fasta: 2024-05-20 06:05:39 [INFO] Reads mapping: 2024-05-20 06:05:39 [INFO] minimap2 -t 20 --secondary=no -ax map-pb final_mitogenome.fasta gbk.HiFiMapped.bam.fasta | samtools view -@ 20 -b -F4 -F 0x800 -q 0 -o HiFi-vs-final_mitogenome.bam 2024-05-20 06:05:39 [INFO] Sorting mapping file: 2024-05-20 06:05:39 [INFO] samtools sort -@ 20 HiFi-vs-final_mitogenome.bam -o HiFi-vs-final_mitogenome.sorted.bam 2024-05-20 06:05:39 [INFO] Indexing sorted mapping file: 2024-05-20 06:05:39 [INFO] samtools index HiFi-vs-final_mitogenome.sorted.bam 2024-05-20 06:05:39 [INFO] HiFi reads mapping done. Output file: HiFi-vs-final_mitogenome.sorted.bam 2024-05-20 06:05:39 [INFO] 12.2 Creating coverage plot... 2024-05-20 06:05:39 [INFO] Coverage plot for final_mitogenome created... 2024-05-20 06:05:39 [INFO] Pipeline finished! 2024-05-20 06:05:39 [INFO] Run time: 130.48 seconds`

Ruiqi-CUB commented 6 months ago

Update: I run it in the interactive docker image and output files show up. Do you happen to know what could be the possible issue:

image
docker run --rm -it -v /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI:/home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI ghcr.io/marcelauliano/mitohifi:master /bin/bash

python3 /opt/MitoHiFi/src/mitohifi.py -r /opt/MitoHiFi/tests/ilDeiPorc1.reads.100.fa -f /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.fasta -g /home/ruiqi/ruiqi_data/mitoGenome/MitoHiFI/Test/OQ694980.1.gb -t 20 -o 5