marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License

HPC and Slurm compatibility - Error: Contig tig00007550_1 does not have an annotation file #27

Open ChristophePatterson opened 2 years ago

ChristophePatterson commented 2 years ago

Hi,

Thanks for writing a great programme. Sadly, I am running into an issue that I have so far been unable to resolve, and I have not been able to track down its precise cause. I am using MitoHiFi on an HPC cluster that uses Slurm (specifically https://www.dur.ac.uk/arc/hamilton/).

MitoHiFi starts running on both the test data and my draft genome, but it fails at step 4 (circularize, annotate and rotate each filtered contig).

The full output from MitoHiFi is as follows.

Looking for mitochondrion for Phalera bucephala
Mito for the same species is not found
Looking among close species
output is written to data/NC_066711.1.[gb,fasta]
2022-10-12 15:50:45 [INFO] Welcome to MitoHifi v2. Starting pipeline...
2022-10-12 15:50:45 [DEBUG] Running MitoHiFi on debug mode.
2022-10-12 15:50:45 [INFO] Length of related mitogenome is: 16668 bp
2022-10-12 15:50:45 [INFO] Number of genes on related mitogenome: 37
2022-10-12 15:50:45 [INFO] Running MitoHifi pipeline in contigs mode...
2022-10-12 15:50:45 [INFO] 1. Fixing potentially conflicting FASTA headers
2022-10-12 15:50:45 [INFO] 2. Let's run the blast of the contigs versus the close-related mitogenome
2022-10-12 15:50:45 [INFO] 2.1. Creating BLAST database:
2022-10-12 15:50:45 [INFO] makeblastdb -in data/NC_066711.1.fasta -dbtype nucl
2022-10-12 15:50:46 [INFO] Makeblastdb done.
2022-10-12 15:50:46 [INFO] 2.2. Running blast of contigs against close-related mitogenome:
2022-10-12 15:50:46 [INFO] blastn -query test.fa -db data/NC_066711.1.fasta -num_threads 1 -out contigs.blastn -outfmt 6 std qlen slen
2022-10-12 15:50:53 [INFO] Blast done.
2022-10-12 15:50:53 [INFO] 3. Filtering BLAST output to select target sequences
2022-10-12 15:50:53 [INFO] Filtering thresholds applied:
2022-10-12 15:50:53 [INFO] Minimum query percentage = 50
2022-10-12 15:50:53 [INFO] Minimum query length = 80% subject length
2022-10-12 15:50:53 [INFO] Maximum query length = 5 times subject length
2022-10-12 15:50:54 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file
2022-10-12 15:50:54 [INFO] 4. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
2022-10-12 15:50:54 [DEBUG] Threads per contig=1
2022-10-12 15:50:54 [DEBUG] Thresholds for circularization: circular size=220 | circular offset=40
2022-10-12 15:50:54 [DEBUG] Thresholds for annotation (MitoFinder): maximum contig size=83340
2022-10-12 15:50:54 [INFO] Working with contig tig00007550_1
2022-10-12 15:50:54 [INFO] Working with contig tig00007572_1
2022-10-12 15:50:54 [INFO] Started tig00007550_1 circularization
2022-10-12 15:50:54 [INFO] Started tig00007572_1 circularization
2022-10-12 15:50:55 [INFO] tig00007572_1 circularization done. Circularization info saved on ./potential_contigs/tig00007572_1/tig00007572_1.circularisationCheck.txt
2022-10-12 15:50:55 [INFO] Started tig00007572_1 (MitoFinder) annotation
2022-10-12 15:50:56 [INFO] tig00007550_1 circularization done. Circularization info saved on ./potential_contigs/tig00007550_1/tig00007550_1.circularisationCheck.txt
2022-10-12 15:50:56 [INFO] Started tig00007550_1 (MitoFinder) annotation
2022-10-12 15:50:56 [INFO] tig00007550_1 annotation done. Annotation log saved on ./potential_contigs/tig00007550_1/tig00007550_1.annotation_MitoFinder.log
/nobackup/dbl0hpc/apps/MitoHiFi/parallel_annotation.py:51: UserWarning: Contig tig00007550_1 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
2022-10-12 15:50:56 [INFO] tig00007572_1 annotation done. Annotation log saved on ./potential_contigs/tig00007572_1/tig00007572_1.annotation_MitoFinder.log
/nobackup/dbl0hpc/apps/MitoHiFi/parallel_annotation.py:51: UserWarning: Contig tig00007572_1 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
Traceback (most recent call last):
  File "/nobackup/dbl0hpc/apps/MitoHiFi/mitohifi.py", line 378, in <module>
    main()
  File "/nobackup/dbl0hpc/apps/MitoHiFi/mitohifi.py", line 264, in main
    tRNA_ref = fetch.get_ref_tRNA()
  File "/nobackup/dbl0hpc/apps/MitoHiFi/fetch.py", line 41, in get_ref_tRNA
    reference_tRNA = max(tRNAs, key=tRNAs.get)
ValueError: max() arg is an empty sequence
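
A note on the final ValueError: the traceback shows fetch.get_ref_tRNA() calling max(tRNAs, key=tRNAs.get), and max() raises when the collection it is given is empty. The minimal Python sketch below reproduces that behaviour and shows one way to guard against it. It is not MitoHiFi's code: pick_reference_trna is a hypothetical name, and treating tRNAs as a name-to-count dictionary is an assumption based only on the .get call in the traceback.

```python
def pick_reference_trna(trna_counts):
    """Return the most frequent tRNA name, or None when nothing was annotated.

    trna_counts is assumed (not confirmed by the source) to map tRNA names
    to how often each one was seen.
    """
    if not trna_counts:
        # No annotation was produced, so there is nothing to choose from.
        return None
    return max(trna_counts, key=trna_counts.get)


if __name__ == "__main__":
    # An empty mapping triggers the same kind of error as in the traceback.
    try:
        max({}, key={}.get)
    except ValueError as err:
        print(f"unguarded call failed: {err}")

    # The guarded helper returns None instead of raising.
    print(pick_reference_trna({}))
    print(pick_reference_trna({"tRNA-Phe": 2, "tRNA-Val": 1}))
```

Seen this way, the ValueError is likely a downstream symptom of the two "does not have an annotation file" warnings rather than the original fault.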

The line "2022-10-12 15:50:55 [INFO] tig00007572_1 circularization done. Circularization info saved on ./potential_contigs/tig00007572_1/tig00007572_1.circularisationCheck.txt" states that the .circularisationCheck.txt file is created in the relative path ./potential_contigs/tig00007572_1/. However, that directory does not exist, and the file is instead found in the current working directory. The contents of the working directory when the programme terminates are as follows.

contigs.blastn                                 tig00007550_1.mito.fa.nsq                      tig00007572_1.mito.fa.nhr
contigs_ids.txt                                tig00007550_1.mito.fa.ntf                      tig00007572_1.mito.fa.nin
data                                           tig00007550_1.mito.fa.nto                      tig00007572_1.mito.fa.njs
NC_016067.1.fasta                              tig00007550_1.mitogenome.fa                    tig00007572_1.mito.fa.not
NC_016067.1.gb                                 tig00007550_1.mitogenome.fa.ndb                tig00007572_1.mito.fa.nsq
parsed_blast_all.txt                           tig00007550_1.mitogenome.fa.nhr                tig00007572_1.mito.fa.ntf
parsed_blast.txt                               tig00007550_1.mitogenome.fa.nin                tig00007572_1.mito.fa.nto
test.fa                                        tig00007550_1.mitogenome.fa.njs                tig00007572_1.mitogenome.fa
tig00007550_1.circularisationCheck.txt         tig00007550_1.mitogenome.fa.not                tig00007572_1.mitogenome.fa.ndb
tig00007550_1.circularization_check.blast.tsv  tig00007550_1.mitogenome.fa.nsq                tig00007572_1.mitogenome.fa.nhr
tig00007550_1.mito.fa                          tig00007550_1.mitogenome.fa.ntf                tig00007572_1.mitogenome.fa.nin
tig00007550_1.mito.fa.ndb                      tig00007550_1.mitogenome.fa.nto                tig00007572_1.mitogenome.fa.njs
tig00007550_1.mito.fa.nhr                      tig00007572_1.circularisationCheck.txt         tig00007572_1.mitogenome.fa.not
tig00007550_1.mito.fa.nin                      tig00007572_1.circularization_check.blast.tsv  tig00007572_1.mitogenome.fa.nsq
tig00007550_1.mito.fa.njs                      tig00007572_1.mito.fa                          tig00007572_1.mitogenome.fa.ntf
tig00007550_1.mito.fa.not                      tig00007572_1.mito.fa.ndb

I also note that contigs_ids.txt is not at the relative path ./contigs_filtering/contigs_ids.txt stated in the console output above; it too is in the working directory. I also cannot locate any log file. I seem to be in the odd situation where MitoHiFi does not save files at the relative paths it reports, and then cannot find those files when it looks for them.
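
One way to confirm this kind of mismatch is to compare the paths named in the log against what is actually on disk. The Python sketch below is a hypothetical diagnostic, not part of MitoHiFi: locate_outputs is an invented helper, and the two paths in the usage section are simply copied from the log above. It reports, for each logged path, whether the file is at that path, sitting flat in the working directory instead, or missing, and it also prints the working directory and $HOME.

```python
import os
from pathlib import Path


def locate_outputs(expected_paths, workdir="."):
    """For each path mentioned in the log, report where the file actually is."""
    workdir = Path(workdir)
    report = {}
    for rel in expected_paths:
        expected = workdir / rel             # where the log says the file is
        flat = workdir / Path(rel).name      # same file name, no subdirectory
        if expected.exists():
            report[rel] = "at logged path"
        elif flat.exists():
            report[rel] = "in working directory instead"
        else:
            report[rel] = "missing"
    return report


if __name__ == "__main__":
    print("cwd :", Path.cwd())
    print("HOME:", os.environ.get("HOME"))
    logged = [
        "contigs_filtering/contigs_ids.txt",
        "potential_contigs/tig00007572_1/tig00007572_1.circularisationCheck.txt",
    ]
    for path, status in locate_outputs(logged).items():
        print(f"{status:30} {path}")
```

Running it from the directory where the job was launched, and again from inside the container, shows whether the two environments agree on the working directory.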

Any advice or guidance would be greatly appreciated.

Many thanks,

Christophe

marcelauliano commented 2 years ago

Hi Christophe,

This looks to me more like a Slurm setup problem than a MitoHiFi problem. Have you run your installation with the test dataset outside Slurm to see whether it runs OK?

Best regards, M

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project

Churchill College Cambridge By-Fellow

Cambridge, UK

ChristophePatterson commented 2 years ago

Hi Marcela,

Thanks. I'm talking to the people who run the HPC (we have a meeting tomorrow), so hopefully, if this is a Slurm issue, we can resolve it. However, I also ran the code directly on the console (outside of Slurm) and hit the same error.

Kind regards, Christophe

marcelauliano commented 2 years ago

Hi Christophe,

Are you running inside a Singularity container or with your own installation? Did you run with the test dataset?

ChristophePatterson commented 1 year ago

Hi Marcela,

Apologies for the delay. I have managed to get MitoHiFi v2.2 working on the HPC I have access to. For others who may be in a similar position, I am attaching my code below. The root cause seems to be that MitoHiFi defaults to writing its output to the $HOME directory, and changing this to another directory proved challenging. Adding an -outfolder option to the mitohifi.py pipeline (similar to the one in findMitoReference.py) might be of benefit. I also struggled to locate the exact cause of the error because stderr=subprocess.DEVNULL is set within the mitohifi.py code. In my case, because MitoHiFi was searching for the FASTA files in $HOME (regardless of anything I did), minimap2 was not finding the specified FASTA file. Normally this would produce an error, but when run through MitoHiFi the pipeline continues with no error and crashes later, in the hifiasm step, because there are no mapped reads.

This may be specific to my HPC setup, but if anyone else is using a Slurm-based system, I have successfully run MitoHiFi using the following code.
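
To illustrate the point about stderr=subprocess.DEVNULL: when an external tool's error stream is discarded and its exit code is not checked, a missing input file goes unnoticed until a later step fails. The Python sketch below shows one alternative under stated assumptions. It is not how mitohifi.py is written; run_checked is a hypothetical helper, and the failing command is a stand-in Python one-liner rather than a real minimap2 call.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd, keep its stderr, and raise with that text if the command fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}:\n{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for an external tool that cannot find its input file.
    failing = [sys.executable, "-c", "import sys; sys.exit('input file not found')"]
    try:
        run_checked(failing)
    except RuntimeError as err:
        # The tool's own message is surfaced at the step that actually failed.
        print(err)
```

With a wrapper like this, the failure would be reported at the mapping step, together with the tool's own message, instead of surfacing later as an unrelated crash.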

# Open an interactive Slurm job
srun -t 8:00:00 -c 24 --mem=50G --gres=tmp:50G --pty bash

# Open an interactive shell in the Singularity image. For me it was essential to pass both --bind and --home, even if they pointed to the same directory.
singularity run --bind /path/to/hifireads/ --home /path/to/output/directory /path/to/singularity/mitohifi_2.2_cv1.sif bash

# Run MitoHiFi inside the Singularity image
mitohifi.py -r hifi_reads.fastq -f fasta_file.fasta -g gb_file.gb -t 24 -o 1

Kind regards,

Christophe