phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Missing Blast database #107

Closed cchauve closed 2 years ago

cchauve commented 2 years ago

When running mob_recon 3.0.3, I obtain the following log:

2022-05-05 12:22:23,279 mob_suite.mob_recon INFO: MOB-recon version 3.0.3 [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:983]
...
2022-05-05 12:22:44,936 mob_suite.mob_recon INFO: Writing cleaned header input fasta file from /home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1105]
2022-05-05 12:22:46,982 root INFO: Blasting replicon sequences /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1141]
2022-05-05 12:22:47,195 root ERROR: blastn on db /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas and query /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/__init__.py:162]
...

I do not understand this error, since the BLAST database for the file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta does exist:

ll /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta*
-rw-r--r-- 1 chauvec ctb-chauvec 4932915 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta
-rw-r--r-- 1 chauvec ctb-chauvec 20480 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.ndb
-rw-r--r-- 1 chauvec ctb-chauvec 9345 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.nhr
-rw-r--r-- 1 chauvec ctb-chauvec 1472 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.nin
-rw-r--r-- 1 chauvec ctb-chauvec 1268 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.not
-rw-r--r-- 1 chauvec ctb-chauvec 1232538 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.nsq
-rw-r--r-- 1 chauvec ctb-chauvec 16384 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.ntf
-rw-r--r-- 1 chauvec ctb-chauvec 424 May 5 12:22 /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055/__tmp/fixed.input.fasta.nto
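
A listing like this can also be checked mechanically. The following is a minimal sketch, assuming a single-volume nucleotide database whose prefix is the FASTA path itself; it only tests for the three core files (.nhr, .nin, .nsq) and says nothing about whether blastn can actually open them. The function name is hypothetical and is not part of BLAST+ or MOB-suite.

```shell
# Hypothetical helper (not part of BLAST+ or MOB-suite).
# Reports which core files of a single-volume nucleotide BLAST database
# are missing or empty for the given database prefix.
missing_blast_nucl_files() {
    local prefix="$1"
    local ext
    local status=0
    for ext in nhr nin nsq; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}"
            status=1
        fi
    done
    # Exit status 0 means all three files are present and non-empty.
    return "$status"
}
```

Called with the path to fixed.input.fasta, it prints nothing and returns 0 when all three files are in place.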

kbessonov1984 commented 2 years ago

Hi Cedric, I think you are trying to run MOB-recon against your custom plasmid database? Can you paste the full command that you are trying to run? The error you are seeing seems to be related to the input parameters. The input should be a FASTA file, not a BLAST database. Does your fixed.input.fasta file contain sequences of various draft plasmids spanning several contigs? Typically, MOB-recon is run as mob-recon -t -c -i fixed.input.fasta -o output_dir. If you want to build a custom plasmid database, you need to use the mob-cluster tool.

cchauve commented 2 years ago

I am not running MOB-recon against my own database. I have run mob_init and want to use MOB-recon against its own database. I do not want to type the plasmids or detect circular contigs, so the full command I run is:

FASTA=/home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta
OUT=/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/e_coli_mob/results/e_coli_055
mob_recon -i ${FASTA} -o ${OUT} -f -k --debug

kbessonov1984 commented 2 years ago

The command mob_recon -i ${FASTA} -o ${OUT} -f -k --debug should work fine. I suspect an issue with the FASTA headers. Can you send us the assembly.fasta file via email so that we can try to run it ourselves? It seems like a makeblastdb issue.

kbessonov1984 commented 2 years ago

Morning Cedric, I had no issues running your assembly on MOB-suite version 3.0.3. Perhaps there is an issue with the database initialization. Try running mob_init again and check that all databases are intact.

I am attaching results from the mob_recon run coupled to plasmid typing available in this archive results.zip.

I successfully ran both mob_recon -i cederic_assembly.fasta -fk -o mob-suite2 and mob_recon -i cederic_assembly.fasta -ut -o mob-suite. Note that the -t flag runs mob_typer and produces a complete output. The -u flag checks circularity based on the Unicycler annotations in the FASTA headers, i.e. circular=true.
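
To see whether an assembly carries such annotations at all, the headers can be inspected directly. This is a minimal sketch, assuming the annotation appears literally as circular=true on the header line; how MOB-recon itself parses the headers is not shown in this thread, and the function name is hypothetical.

```shell
# Hypothetical helper (not part of MOB-suite).
# Counts FASTA header lines that contain the literal text circular=true.
count_circular_contigs() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function's exit status at 0 while still printing the count.
    grep '^>' "$1" | grep -c 'circular=true' || true
}
```

A count of 0 suggests that the -u flag has no circularity annotations to work with in that file.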

Here is my log file

(mob_suite-3.0.3) [kbessono@waffles-j-1 cedric]$ mob_recon -i cederic_assembly.fasta -ut -o mob-suite
2022-05-06 06:21:52,963 mob_suite.mob_recon INFO: MOB-recon version 3.0.3  [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:983]
2022-05-06 06:21:52,965 mob_suite.mob_recon INFO: SUCCESS: Found program blastn at /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/bin/blastn [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:590]
2022-05-06 06:21:52,966 mob_suite.mob_recon INFO: SUCCESS: Found program makeblastdb at /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/bin/makeblastdb [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:590]
2022-05-06 06:21:52,968 mob_suite.mob_recon INFO: SUCCESS: Found program tblastn at /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/bin/tblastn [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:590]
2022-05-06 06:21:52,968 mob_suite.mob_recon INFO: Processing fasta file cederic_assembly.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1004]
2022-05-06 06:21:52,968 mob_suite.mob_recon INFO: Analysis directory mob-suite [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1005]
2022-05-06 06:21:52,977 root INFO: Creating Lock file /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:450]
2022-05-06 06:21:52,980 root INFO: Testing ETE3 taxonomy db /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:459]
2022-05-06 06:21:53,004 root INFO: Lock file removed. [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:466]
2022-05-06 06:21:59,811 mob_suite.mob_recon INFO: Writing cleaned header input fasta file from cederic_assembly.fasta to mob-suite/__tmp/fixed.input.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1105]
2022-05-06 06:22:00,690 root INFO: Blasting replicon sequences /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/rep.dna.fas against mob-suite/__tmp/fixed.input.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1141]
2022-05-06 06:22:01,832 root INFO: Filtering replicon blast results mob-suite/__tmp/replicon_blast_results.txt  [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1146]
2022-05-06 06:22:01,894 root INFO: Blasting relaxase sequences /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/mob.proteins.faa against mob-suite/__tmp/fixed.input.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1164]
2022-05-06 06:22:58,744 root INFO: Filtering relaxase blast results mob-suite/__tmp/mob_blast_results.txt  [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1169]
2022-05-06 06:22:58,799 root INFO: Blasting MPF sequences /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/mpf.proteins.faa against mob-suite/__tmp/fixed.input.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1188]
2022-05-06 06:23:43,173 root INFO: Filtering MPF blast results mob-suite/__tmp/mpf_blast_results.txt  [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1203]
2022-05-06 06:23:43,175 root INFO: Blasting orit sequences /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/orit.fas against mob-suite/__tmp/fixed.input.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1212]
2022-05-06 06:23:43,922 root INFO: Filtering orit blast results mob-suite/__tmp/orit_blast_results.txt  [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1217]
2022-05-06 06:23:44,032 root INFO: Blasting contigs against repetitive sequences db: /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/repetitive.dna.fas [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1234]
2022-05-06 06:23:45,250 root INFO: Filtering repetitive blast results mob-suite/__tmp/repetitive_blast_results.txt  [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1238]
2022-05-06 06:23:45,277 root INFO: Filtering contig: 35_length=1178_depth=9.29x due to repetitive sequence [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:1253]
2022-05-06 06:23:45,278 root INFO: Blasting contigs against reference sequence db: /Drives/K/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/databases/ncbi_plasmid_full_seqs.fas [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1261]
2022-05-06 06:24:18,679 root INFO: Filtering contig blast results: mob-suite/__tmp/contig_blast_results.txt [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1266]
2022-05-06 06:24:18,707 root INFO: Assigning contigs to plasmid groups [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1281]
2022-05-06 06:24:23,212 root INFO: Writting contig results to mob-suite/contig_report.txt [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1353]
2022-05-06 06:24:27,760 root INFO: Writting plasmid sequences to mob-suite/plasmid_AB448.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:948]
2022-05-06 06:24:32,433 root INFO: Writting plasmid sequences to mob-suite/plasmid_AB239.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/utils.py:948]
2022-05-06 06:24:33,602 root INFO: Writting chromosome sequences to mob-suite/chromosome.fasta [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1368]
2022-05-06 06:24:33,631 root INFO: Cleaning up temporary files mob-suite/__tmp [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1376]
2022-05-06 06:24:33,678 mob_suite.mob_recon INFO: MOB-recon completed and results written to mob-suite [in /home/CSCScience.ca/kbessono/.conda/envs/mob_suite-3.0.3/lib/python3.8/site-packages/mob_suite/mob_recon.py:1378]
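
The first lines of the log report where blastn, makeblastdb and tblastn were found. When comparing two environments, the same information can be collected outside MOB-suite. This is a minimal sketch using the shell's command -v; the function name is hypothetical, and the tool list in the usage note is taken from the log above.

```shell
# Hypothetical helper (not part of MOB-suite).
# Prints the resolved path of each named program, or a NOT FOUND line.
report_tools() {
    local tool
    local path
    local status=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found $tool at $path"
        else
            echo "NOT FOUND: $tool"
            status=1
        fi
    done
    # Exit status 0 means every requested program was found on PATH.
    return "$status"
}
```

For example, report_tools blastn makeblastdb tblastn lists the three programs named in the log.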

Here is my databases folder structure at .../lib/python3.8/site-packages/mob_suite/databases/. Pay particular attention to the size of the rep.dna.fas file: it contains the replicon sequences, and based on your log that is the step at which mob_recon fails.

-rwxrwxr-x. 1 kbessono grp_kbessono 3.8M Dec  8 22:06 clusters.txt
-rwxrwxr-x. 1 kbessono grp_kbessono  37K Dec  8 22:07 host_range_literature_plasmidDB.txt
-rwxrwxr-x. 2 kbessono grp_kbessono    0 Aug  4  2021 __init__.py
-rwxrwxr-x. 1 kbessono grp_kbessono 1.4M Dec  8 22:07 mob.proteins.faa
-rwxrwxr-x. 1 kbessono grp_kbessono 997K Dec  8 22:07 mpf.proteins.faa
-rwxrwxr-x. 1 kbessono grp_kbessono 1.6G Dec  8 22:07 ncbi_plasmid_full_seqs.fas
-rwxrwxr-x. 1 kbessono grp_kbessono 184M Dec  8 22:07 ncbi_plasmid_full_seqs.fas.msh
-rwxrwxr-x. 1 kbessono grp_kbessono 556K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.ndb
-rwxrwxr-x. 1 kbessono grp_kbessono 1.2M Dec  8 22:07 ncbi_plasmid_full_seqs.fas.nhr
-rwxrwxr-x. 1 kbessono grp_kbessono 278K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.nin
-rwxrwxr-x. 1 kbessono grp_kbessono  93K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.nog
-rwxrwxr-x. 1 kbessono grp_kbessono 397K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.nos
-rwxrwxr-x. 1 kbessono grp_kbessono 278K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.not
-rwxrwxr-x. 1 kbessono grp_kbessono 392M Dec  8 22:07 ncbi_plasmid_full_seqs.fas.nsq
-rwxrwxr-x. 1 kbessono grp_kbessono  16K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.ntf
-rwxrwxr-x. 1 kbessono grp_kbessono  93K Dec  8 22:07 ncbi_plasmid_full_seqs.fas.nto
-rwxrwxr-x. 1 kbessono grp_kbessono 133K Dec  8 22:07 orit.fas
-rwxrwxr-x. 1 kbessono grp_kbessono 2.6M Dec  8 22:06 rep.dna.fas
-rwxrwxr-x. 1 kbessono grp_kbessono 4.7M Dec  8 22:07 repetitive.dna.fas
-rwxrwxr-x. 1 kbessono grp_kbessono  20K Dec  8 22:07 repetitive.dna.fas.ndb
-rwxrwxr-x. 1 kbessono grp_kbessono  78K Dec  8 22:07 repetitive.dna.fas.nhr
-rwxrwxr-x. 1 kbessono grp_kbessono  11K Dec  8 22:07 repetitive.dna.fas.nin
-rwxrwxr-x. 1 kbessono grp_kbessono  11K Dec  8 22:07 repetitive.dna.fas.not
-rwxrwxr-x. 1 kbessono grp_kbessono 1.2M Dec  8 22:07 repetitive.dna.fas.nsq
-rwxrwxr-x. 1 kbessono grp_kbessono  16K Dec  8 22:07 repetitive.dna.fas.ntf
-rwxrwxr-x. 1 kbessono grp_kbessono 3.5K Dec  8 22:07 repetitive.dna.fas.nto
-rwxrwxr-x. 1 kbessono grp_kbessono   46 Dec  8 22:09 status.txt
-rwxrwxr-x. 1 kbessono grp_kbessono 548M Dec  8 22:09 taxa.sqlite
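
Checking that the databases are intact can be partly automated. The following is a minimal sketch, assuming that the non-index files in the listing above are the ones to look for; it only tests that each file exists and is non-empty, not that its content is valid. The function name is hypothetical and is not part of MOB-suite.

```shell
# Hypothetical helper (not part of MOB-suite).
# Lists files from the directory listing above that are missing or empty
# in the given databases folder.
check_mob_databases() {
    local db_dir="$1"
    local name
    local status=0
    for name in clusters.txt mob.proteins.faa mpf.proteins.faa \
                ncbi_plasmid_full_seqs.fas orit.fas rep.dna.fas \
                repetitive.dna.fas taxa.sqlite; do
        if [ ! -s "${db_dir}/${name}" ]; then
            echo "missing or empty: ${name}"
            status=1
        fi
    done
    # Exit status 0 means every listed file is present and non-empty.
    return "$status"
}
```

It takes the databases directory as its only argument and prints one line per problem file.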

cchauve commented 2 years ago

Hello Kirill

I tried to process my sample on the Cedar Compute Canada cluster, running mob_init and then mob_recon. As before, it failed; I attach a file containing both the SLURM script and the output.

At this point, unfortunately, I feel that I am not able to run MOB-suite on Cedar.

If you have an idea of what is going on on Cedar, that would be great; otherwise I will have to leave MOB-suite results out of the analysis I am running.

Regards

cedric



SLURM SCRIPT

#!/bin/bash
#SBATCH --time=8:00:00
#SBATCH --mem=8G
#SBATCH --account=def-chauvec

source /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/hyasp_env/bin/activate
module load StdEnv/2020 gcc/9.3.0 blast+/2.12.0 mash
mob_init
makeblastdb -in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas -dbtype nucl
makeblastdb -in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/orit.fas -dbtype nucl

# Sample contigs sequences
FASTA=/home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta

# Output directory
OUT=/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055
mkdir -p ${OUT}
cd ${OUT}
mob_recon -i ${FASTA} -o ${OUT} -fkut

RUNNING MOB_INIT

2022-05-06 10:15:45,390 mob_suite.utils INFO: Database directory folder already exists at /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:131]
2022-05-06 10:15:45,413 mob_suite.utils INFO: Placed lock file at /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/.lock [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:142]
2022-05-06 10:15:45,413 mob_suite.utils INFO: Initializing databases...this will take some time [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:165]
2022-05-06 10:15:45,413 mob_suite.utils INFO: Downloading databases...this will take some time [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:178]
2022-05-06 10:15:45,413 mob_suite.utils INFO: Trying mirror https://zenodo.org/record/3786915/files/data.tar.gz?download=1 [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:182]
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

2022-05-06 10:16:50,726 mob_suite.utils INFO: Downloading databases successful, now building databases at /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:191]
2022-05-06 10:16:50,727 mob_suite.utils INFO: Decompressing /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/data.tar.gz [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:107]
2022-05-06 10:17:12,937 mob_suite.utils INFO: Building repetitive mask database [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:204]
2022-05-06 10:17:15,979 mob_suite.utils INFO: Building complete plasmid database [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:208]
2022-05-06 10:17:39,957 mob_suite.utils INFO: Sketching complete plasmid database [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:212]
2022-05-06 10:19:02,871 mob_suite.utils INFO: Init ete3 library ... [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:224]
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...

2022-05-06 10:28:46,414 mob_suite.utils INFO: MOB init completed successfully [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_init.py:250]

CREATING BLAST DB FOR rep.dna.fas AND orit.fas (NECESSARY?)

Loading node names...
2418466 names loaded.
275167 synonyms loaded.
Loading nodes...
2418466 nodes loaded.
Linking nodes...
Tree is loaded.
Updating database: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/taxa.sqlite ...

Uploading to /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/taxa.sqlite

Building a new DB, current time: 05/06/2022 10:28:58
New DB name: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas
New DB title: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 2686 sequences in 0.422154 seconds.

Building a new DB, current time: 05/06/2022 10:29:06 New DB name: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/orit.fas New DB title: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/orit.fas Sequence type: Nucleotide Deleted existing Nucleotide BLAST database named /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/orit.fas Keep MBits: T Maximum file size: 1000000000B Adding sequences from FASTA; added 502 sequences in 0.0604968 seconds.

RUNNING MOB_RECON

2022-05-06 10:29:36,424 mob_suite.mob_recon INFO: MOB-recon version 3.0.3 [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:983] 2022-05-06 10:29:36,425 mob_suite.mob_recon INFO: SUCCESS: Found program blastn at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/blast+/2.12.0/bin/blastn [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:590] 2022-05-06 10:29:36,426 mob_suite.mob_recon INFO: SUCCESS: Found program makeblastdb at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/blast+/2.12.0/bin/makeblastdb [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:590] 2022-05-06 10:29:36,427 mob_suite.mob_recon INFO: SUCCESS: Found program tblastn at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/blast+/2.12.0/bin/tblastn [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:590] 2022-05-06 10:29:36,492 mob_suite.mob_recon INFO: Processing fasta file /home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1004] 2022-05-06 10:29:36,492 mob_suite.mob_recon INFO: Analysis directory /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055 [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1005] 2022-05-06 10:29:36,498 root INFO: Creating Lock file /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/ETE3_DB.lock [in 
/project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:450] 2022-05-06 10:29:36,500 root INFO: Testing ETE3 taxonomy db /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/taxa.sqlite [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:459] 2022-05-06 10:29:36,583 root INFO: Lock file removed. [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:466] 2022-05-06 10:29:50,794 mob_suite.mob_recon INFO: Writing cleaned header input fasta file from /home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1105] 2022-05-06 10:30:01,079 root INFO: Blasting replicon sequences /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1141] 2022-05-06 10:30:01,535 root ERROR: blastn on db /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/rep.dna.fas and query /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta STDERR: b'BLAST Database error: No alias or index file found for nucleotide database 
[/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:162] 2022-05-06 10:30:01,538 root INFO: Filtering replicon blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/replicon_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1146] 2022-05-06 10:30:01,538 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/replicon_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:183] 2022-05-06 10:30:01,555 root INFO: Blasting relaxase sequences /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/mob.proteins.faa against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1164] 2022-05-06 10:30:01,962 root INFO: Filtering relaxase blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/mob_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1169] 2022-05-06 10:30:01,963 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/mob_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:183] 2022-05-06 10:30:01,965 root INFO: 
Blasting MPF sequences /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/mpf.proteins.faa against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1188] 2022-05-06 10:30:02,056 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/mpf_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:183] 2022-05-06 10:30:02,057 root INFO: Blasting orit sequences /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/orit.fas against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1212] 2022-05-06 10:30:02,115 root ERROR: blastn on db /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/orit.fas and query /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/ blast/init.py:162] 2022-05-06 10:30:02,186 root INFO: Filtering orit blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/orit_blast_results.txt [in 
/project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1217] 2022-05-06 10:30:02,187 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/orit_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:183] 2022-05-06 10:30:02,189 root INFO: Blasting contigs against repetitive sequences db: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/repetitive.dna.fas [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1234] 2022-05-06 10:30:02,262 root ERROR: blastn on db /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta and query /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/repetitive.dna.fas STDERR:b'BLAST Database error: No alias or index file found for nucleotide database [/project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/repetitive.dna.fas] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:162] 2022-05-06 10:30:02,266 root INFO: Filtering repetitive blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/repetitive_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/utils.py:1 238] 2022-05-06 10:30:02,266 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/repetitive_blast_results.txt [in 
/project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:183] 2022-05-06 10:30:02,268 root INFO: Blasting contigs against reference sequence db: /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/ncbi_plasmid_full_seqs.fas [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1261] 2022-05-06 10:30:02,325 root ERROR: blastn on db /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta and query /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/ncbi_plasmid_full_seqs.fas STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/databases/ncbi_plasmid_full_seqs.fas] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:162] 2022-05-06 10:30:02,334 root INFO: Filtering contig blast results: /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/contig_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1266] 2022-05-06 10:30:02,335 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/contig_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/blast/init.py:183] 2022-05-06 10:30:02,336 root INFO: Writting contig results to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/contig_report.txt [in 
/project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1353] 2022-05-06 10:30:02,374 root INFO: Writting chromosome sequences to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/chromosome.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1368] 2022-05-06 10:30:02,386 mob_suite.mob_recon INFO: MOB-recon completed and results written to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055 [in /project/6069942/PLASMIDS/ARETE_MAY22/hyasp_env/lib/python3.6/site-packages/mob_suite-3.0.3-py3.6.egg/mob_suite/mob_recon.py:1378]
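The ERROR entries in a log like the one above are easy to miss among the INFO lines, since the run still ends with a "completed" message. Below is a minimal sketch for pulling them out, assuming the `timestamp logger LEVEL: message` layout seen in this log. The function names are illustrative and are not part of MOB-suite.

```python
import re

# Zero-width split point in front of every entry, e.g.
# "2022-05-06 10:30:01,535 root ERROR: ..."
ENTRY_START = re.compile(
    r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \S+ (?:INFO|WARNING|ERROR): )"
)
# Captures timestamp, logger name, level and the rest of the entry.
ENTRY_HEAD = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\S+) (INFO|WARNING|ERROR): (.*)",
    re.DOTALL,
)


def split_entries(log_text):
    """Return (timestamp, logger, level, message) tuples, one per log entry.

    Entries may run together on one line or wrap across several lines,
    so the text is split on timestamps rather than on newlines.
    """
    entries = []
    for chunk in ENTRY_START.split(log_text):
        match = ENTRY_HEAD.match(chunk.strip())
        if match:
            timestamp, logger, level, message = match.groups()
            # Collapse wrapped lines and repeated spaces inside the message.
            entries.append((timestamp, logger, level, " ".join(message.split())))
    return entries


def error_entries(log_text):
    """Keep only the entries logged at ERROR level."""
    return [entry for entry in split_entries(log_text) if entry[2] == "ERROR"]
```

Calling `error_entries(open("slurm-33132702.out").read())` on a saved log would list each failing BLAST step with its timestamp.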

kbessonov1984 commented 2 years ago

Hi Cedric,

Judging by the log message INFO: MOB-recon completed and results written to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055, it seems that you were able to run mob_recon successfully.

We always recommend installing with conda, but I ran the following options successfully myself on the cedar cluster, taking its policies into account.

Option 1: Install MOB-suite inside a python virtual environment

To troubleshoot, run under srun, which gives you an interactive environment on a node, rather than in a non-interactive sbatch script.

$ srun -c 1 --mem=8G --pty bash
$ module load StdEnv/2020  gcc/9.3.0 blast+/2.12.0 mash
$ module load python/3.8.2
$ cd $HOME
$ python3 -m venv mobsuite
$ source  $HOME/mobsuite/bin/activate
$ pip3 install mob-suite==3.0.3 && mob_init
2022-05-06 19:34:24,067 mob_suite.utils INFO: MOB init completed successfully [in /home/kbessono/mobsuite/lib/python3.8/site-packages/mob_suite/mob_init.py:250]
$ mob_recon -i ${FASTA} -o ${OUT} -fkut

Interestingly, I was previously getting a Segmentation fault at the mob_init step, during the pycurl database download, when running the install and mob_init in the projects directory. I did not see this error when running in $HOME with the above commands.

Results are available at /project/def-kbessono/kbessono/mob-suite-venv
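Before launching mob_recon in a fresh session, it can help to confirm that the external programs are still on the PATH after the module load step. This is a minimal sketch, assuming the program list below: blastn, makeblastdb and tblastn are the ones the mob_recon log reports finding, and mash comes from the module load line above. The helper names are illustrative only.

```python
import shutil

# Assumed list of external programs; adjust to your installation.
REQUIRED_PROGRAMS = ("blastn", "makeblastdb", "tblastn", "mash")


def locate_programs(programs=REQUIRED_PROGRAMS, which=shutil.which):
    """Map each program name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in programs}


def missing_programs(programs=REQUIRED_PROGRAMS, which=shutil.which):
    """Return the names of programs that could not be resolved."""
    located = locate_programs(programs, which)
    return [name for name, path in located.items() if path is None]
```

An empty list from `missing_programs()` means every program was found. The `which` parameter exists only so the lookup can be replaced in tests.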

Option 2: Build a Singularity container from a Docker image

Unfortunately, the BioContainers resource is not working right now; hopefully they will make the service available again soon. As an alternative, I will show here how to run MOB-suite inside a Singularity container on cedar. I tested this procedure on cedar myself and it worked. For more info, refer to this documentation: https://docs.computecanada.ca/wiki/Singularity#Creating_an_image_using_Docker_Hub_and_Dockerfile

Q: Regarding "Is it necessary to make a database of rep.dna.fas AND orit.fas?"

A: No, mob_init performs all the necessary steps and creates all the necessary databases. The issue you were facing seems to be related to an incomplete database. I discovered that the pycurl Python library is not working properly and gives Segmentation fault errors, probably linked to SSL certificates.
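One way to see which FASTA file lacks a BLAST database is to look for the index or alias file next to it. This is a minimal sketch, assuming the database was built with the FASTA path as its name, so that makeblastdb wrote `<fasta>.nin` (or `<fasta>.nal` for a multi-volume database). The function names are illustrative and not part of MOB-suite.

```python
import os

# Index (.nin) and alias (.nal) files of a nucleotide BLAST database.
# These are the two kinds of file named in the
# "No alias or index file found" error.
NUCL_INDEX_SUFFIXES = (".nin", ".nal")


def has_nucl_blast_db(fasta_path, exists=os.path.exists):
    """True if a nucleotide index or alias file sits next to the FASTA file."""
    return any(exists(str(fasta_path) + suffix) for suffix in NUCL_INDEX_SUFFIXES)


def unindexed(fasta_paths, exists=os.path.exists):
    """Return the FASTA paths that have no nucleotide BLAST database."""
    return [str(path) for path in fasta_paths if not has_nucl_blast_db(path, exists)]
```

In the logs in this thread, the path inside the square brackets of each error is the one to check: the cleaned input `fixed.input.fasta` in some steps, and `repetitive.dna.fas` or `ncbi_plasmid_full_seqs.fas` in others.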

Here is the list of commands that you need to run. The image pulled from DockerHub has all databases initialized, so everything should work flawlessly.

Notes

# Request a node on the HPC cluster, for example for 4 hours with 1 core. Building this image needs a large amount of memory; it took around 1 hour to build on the cluster
$ srun -c 1 --time 4:00:00 --mem=16G --pty bash
$ singularity build --remote mobsuite.sif docker://kbessonov/mob_suite:3.0.3
INFO:    Starting build...
Getting image source signatures
Writing manifest to image destination
Storing signatures
INFO:    Creating SIF file...
INFO:    Build complete: mobsuite.sif
$ ls -allh
drwx--S--- 3 kbessono def-kbessono 4.0K May  6 13:54 .
drwxrws--- 3 kbessono def-kbessono 4.0K Jun 10  2021 ..
drwxr-s--- 7 kbessono def-kbessono 4.0K May  6 12:38 mobsuite
-rwxr-x--- 1 kbessono def-kbessono 1.5G May  6 13:54 mobsuite.sif
# Option 1: Run the MOB-suite tools non-interactively inside the container and exit. Bind/mount a host directory inside the container via the -B parameter. For example, mount the current directory as /mnt inside the container
$ singularity run -B .:/mnt mobsuite.sif bash -c "cd  /mnt && mob_recon -i assembly.fasta -o mob-suite -fkut"
# Option 2: Alternatively, log in to the container and run commands interactively. This lets you explore the file structure and the container's behaviour
$ singularity shell -B .:/mnt mobsuite.sif
Singularity> mob_recon --version
mob_recon 3.0.3

Results are available at /project/def-kbessono/kbessono/mob-suite-singularity and are identical to option 1, as expected.
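When the non-interactive run has to be repeated for many assemblies, the command line can be assembled programmatically. This is a minimal sketch that only rebuilds the `singularity run -B ... bash -c "..."` invocation shown above. The function name and its parameters are illustrative, and the file names are quoted so that paths with spaces survive the inner `bash -c`.

```python
import shlex


def singularity_recon_argv(image, host_dir, fasta_name, out_name, mount_point="/mnt"):
    """Build the argument list for a non-interactive mob_recon run in a container.

    host_dir is bound to mount_point inside the container; fasta_name and
    out_name are interpreted relative to mount_point.
    """
    inner_command = "cd {} && mob_recon -i {} -o {} -fkut".format(
        shlex.quote(mount_point),
        shlex.quote(fasta_name),
        shlex.quote(out_name),
    )
    return [
        "singularity", "run",
        "-B", "{}:{}".format(host_dir, mount_point),
        image,
        "bash", "-c", inner_command,
    ]
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a node where Singularity is available.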

Option 3: Use a Galaxy web-interface

Simply run MOB-recon on a public Galaxy instance at https://usegalaxy.eu/root?tool_id=mob_recon

cchauve commented 2 years ago

Hello Kirill.

The virtual environment installation went through without any problem, but I still get the same error (see attached log).

Regards

cedric chauve



@. ARETE_MAY22 > module load python/3.8.2 @. ARETE_MAY22 > python3 -m venv mob_env @. ARETE_MAY22 > source mob_env/bin/activate (mob_env) @. ARETE_MAY22 > pip3 install mob-suite==3.0.3 && mob_init Ignoring pip: markers 'python_version < "3"' don't match your environment Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic Collecting mob-suite==3.0.3 Downloading https://files.pythonhosted.org/packages/4c/08/c509a168fcadb04f5124a50827ec58a8c598b2c24032fc38cf488e0488c1/mob_suite-3.0.3.tar.gz (203kB) || 204kB 22.6MB/s Collecting numpy<2,>=1.11.1 (from mob-suite==3.0.3) Collecting tables<4,>=3.3.0 (from mob-suite==3.0.3) Collecting pandas<=1.0.5,>=0.22.0 (from mob-suite==3.0.3) Collecting biopython<2,>=1.70 (from mob-suite==3.0.3) Collecting pycurl<8,>=7.43.0 (from mob-suite==3.0.3) Using cached https://files.pythonhosted.org/packages/09/ca/0b6da1d0f391acb8991ac6fdf8823ed9cf4c19680d4f378ab1727f90bd5c/pycurl-7.45.1.tar.gz Collecting scipy<2,>=1.1.0 (from mob-suite==3.0.3) Collecting ete3<4,>=3.0 (from mob-suite==3.0.3) Collecting six<2,>=1.10 (from mob-suite==3.0.3) Collecting pyqt5<6,>=5.0 (from mob-suite==3.0.3) Collecting packaging (from tables<4,>=3.3.0->mob-suite==3.0.3) Collecting numexpr>=2.6.2 (from tables<4,>=3.3.0->mob-suite==3.0.3) Collecting pytz>=2017.2 (from pandas<=1.0.5,>=0.22.0->mob-suite==3.0.3) Collecting python-dateutil>=2.6.1 (from pandas<=1.0.5,>=0.22.0->mob-suite==3.0.3) Collecting PyQt5-sip<13,>=12.8 (from pyqt5<6,>=5.0->mob-suite==3.0.3) Collecting pyparsing!=3.0.5,>=2.0.2 (from packaging->tables<4,>=3.3.0->mob-suite==3.0.3) Installing collected packages: numpy, pyparsing, packaging, numexpr, tables, pytz, six, python-dateutil, pandas, biopython, pycurl, scipy, ete3, PyQt5-sip, pyqt5, mob-suite Running setup.py install for pycurl ... done Running setup.py install for mob-suite ... 
done Successfully installed PyQt5-sip-12.8.1+computecanada biopython-1.79+computecanada ete3-3.1.2+computecanada mob-suite-3.0.3 numexpr-2.8.1+computecanada numpy-1.22.2+computecanada packaging-21.3+computecanada pandas-1.0.3+computecanada pycurl-7.45.1 pyparsing-3.0.8+computecanada pyqt5-5.15.1+computecanada python-dateutil-2.8.2+computecanada pytz-2022.1+computecanada scipy-1.8.0+computecanada six-1.16.0+computecanada tables-3.7.0+computecanada 2022-05-09 09:24:22,906 mob_suite.utils INFO: Database directory folder already exists at /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:131] 2022-05-09 09:24:22,908 mob_suite.utils INFO: Placed lock file at /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/.lock [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:142] 2022-05-09 09:24:22,908 mob_suite.utils INFO: Initializing databases...this will take some time [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:165] 2022-05-09 09:24:22,908 mob_suite.utils INFO: Downloading databases...this will take some time [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:178] 2022-05-09 09:24:22,908 mob_suite.utils INFO: Trying mirror https://zenodo.org/record/3786915/files/data.tar.gz?download=1 [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:182] % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 450M 100 450M 0 0 3253k 0 0:02:21 0:02:21 --:--:-- 2462k 2022-05-09 09:26:44,743 mob_suite.utils INFO: Downloading databases successful, now building databases at 
/project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:191] 2022-05-09 09:26:44,747 mob_suite.utils INFO: Decompressing /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/data.tar.gz [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:107] 2022-05-09 09:27:05,161 mob_suite.utils INFO: Building repetitive mask database [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:204] 2022-05-09 09:27:05,945 mob_suite.utils INFO: Building complete plasmid database [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:208] 2022-05-09 09:27:19,577 mob_suite.utils INFO: Sketching complete plasmid database [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:212] 2022-05-09 09:27:31,867 mob_suite.utils INFO: Init ete3 library ... [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:224] Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)... Done. Parsing... Loading node names... 2418964 names loaded. 275547 synonyms loaded. Loading nodes... 2418964 nodes loaded. Linking nodes... Tree is loaded. Updating database: /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite ... 2418000 generating entries... Uploading to /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite

Inserting synonyms: 275000 Inserting taxid merges: 65000 Inserting taxids: 2415000 2022-05-09 09:38:14,236 mob_suite.utils INFO: Removed residual taxdump.tar.gz as ete3 is not doing proper cleaning job. [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:236] 2022-05-09 09:38:14,239 mob_suite.utils INFO: MOB init completed successfully [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_init.py:250] @.*** test_run > cat run.sh

#!/bin/bash
#SBATCH --time=8:00:00
#SBATCH --mem=8G
#SBATCH --account=def-chauvec

source /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/mob_env/bin/activate

# Sample contigs sequences
FASTA=/home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta

# Output directory
OUT=/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055

mkdir -p ${OUT}
cd ${OUT}

mob_recon -i ${FASTA} -o ${OUT} -fkut @. test_run > sbatch run.sh Submitted batch job 33132702 @. test_run > cat slurm-33132702.out 2022-05-09 09:47:40,782 mob_suite.mob_recon INFO: MOB-recon version 3.0.3 [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.p y:983] 2022-05-09 09:47:40,784 mob_suite.mob_recon INFO: SUCCESS: Found program blastn at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/blast+/2.12.0/bin/b lastn [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:590] 2022-05-09 09:47:40,785 mob_suite.mob_recon INFO: SUCCESS: Found program makeblastdb at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/blast+/2.12.0/ bin/makeblastdb [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:590] 2022-05-09 09:47:40,787 mob_suite.mob_recon INFO: SUCCESS: Found program tblastn at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/blast+/2.12.0/bin/ tblastn [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:590] 2022-05-09 09:47:40,820 mob_suite.mob_recon INFO: Processing fasta file /home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assem bly.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1004] 2022-05-09 09:47:40,820 mob_suite.mob_recon INFO: Analysis directory /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055 [in /project/6069942/PLAS MIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1005] 2022-05-09 09:47:40,896 root INFO: Creating Lock file /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/ETE3_DB.lock [in /project/ 6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:450] 2022-05-09 
09:47:40,898 root INFO: Testing ETE3 taxonomy db /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/taxa.sqlite [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:459]
2022-05-09 09:47:41,124 root INFO: Lock file removed. [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:466]
2022-05-09 09:47:56,113 mob_suite.mob_recon INFO: Writing cleaned header input fasta file from /home/chauvec/projects/ctb-chauvec/jsiele/results/e_coli/short_read_assembly/e_coli_055_assembly/assembly.fasta to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1105]
2022-05-09 09:48:05,384 root INFO: Blasting replicon sequences /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/rep.dna.fas against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1141]
2022-05-09 09:48:05,797 root ERROR: blastn on db /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/rep.dna.fas and query /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/__tmp/fixed.input.fasta] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:162]
2022-05-09 09:48:05,801 root INFO: Filtering replicon blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/replicon_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1146]
2022-05-09 09:48:05,801 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/replicon_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:183]
2022-05-09 09:48:05,817 root INFO: Blasting relaxase sequences /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/mob.proteins.faa against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1164]
2022-05-09 09:48:05,892 root INFO: Filtering relaxase blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/mob_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1169]
2022-05-09 09:48:05,892 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/__tmp/mob_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:183]
2022-05-09 09:48:05,894 root INFO: Blasting MPF sequences /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/mpf.proteins.faa against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/__tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1188]
2022-05-09 09:48:05,960 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/mpf_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:183]
2022-05-09 09:48:05,962 root INFO: Blasting orit sequences /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/orit.fas against /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1212]
2022-05-09 09:48:06,023 root ERROR: blastn on db /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/orit.fas and query /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:162]
2022-05-09 09:48:06,028 root INFO: Filtering orit blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/__tmp/orit_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1217]
2022-05-09 09:48:06,029 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/orit_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:183]
2022-05-09 09:48:06,030 root INFO: Blasting contigs against repetitive sequences db: /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/repetitive.dna.fas [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1234]
2022-05-09 09:48:06,091 root ERROR: blastn on db /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta and query /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/repetitive.dna.fas STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/repetitive.dna.fas] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:162]
2022-05-09 09:48:06,094 root INFO: Filtering repetitive blast results /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/__tmp/repetitive_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/utils.py:1238]
2022-05-09 09:48:06,094 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/repetitive_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:183]
2022-05-09 09:48:06,095 root INFO: Blasting contigs against reference sequence db: /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/ncbi_plasmid_full_seqs.fas [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1261]
2022-05-09 09:48:06,156 root ERROR: blastn on db /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/fixed.input.fasta and query /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/ncbi_plasmid_full_seqs.fas STDERR: b'BLAST Database error: No alias or index file found for nucleotide database [/project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/databases/ncbi_plasmid_full_seqs.fas] in search path [::]\n' [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:162]
2022-05-09 09:48:06,158 root INFO: Filtering contig blast results: /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/contig_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1266]
2022-05-09 09:48:06,159 root WARNING: No BLASTN results to parse from file /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/tmp/contig_blast_results.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/blast/init.py:183]
2022-05-09 09:48:06,160 root INFO: Writting contig results to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/contig_report.txt [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1353]
2022-05-09 09:48:06,165 root INFO: Writting chromosome sequences to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055/chromosome.fasta [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1368]
2022-05-09 09:48:06,181 mob_suite.mob_recon INFO: MOB-recon completed and results written to /home/chauvec/projects/ctb-chauvec/PLASMIDS/ARETE_MAY22/exp/test_run/e_coli_055 [in /project/6069942/PLASMIDS/ARETE_MAY22/mob_env/lib/python3.8/site-packages/mob_suite/mob_recon.py:1378]

@.*** test_run > ll e_coli_055/tmp/
total 3386
-rw-r--r-- 1 chauvec ctb-chauvec 4932915 May 9 09:47 fixed.input.fasta
-rw-r--r-- 1 chauvec ctb-chauvec 20480 May 9 09:48 fixed.input.fasta.ndb
-rw-r--r-- 1 chauvec ctb-chauvec 9345 May 9 09:47 fixed.input.fasta.nhr
-rw-r--r-- 1 chauvec ctb-chauvec 1456 May 9 09:47 fixed.input.fasta.nin
-rw-r--r-- 1 chauvec ctb-chauvec 1268 May 9 09:48 fixed.input.fasta.not
-rw-r--r-- 1 chauvec ctb-chauvec 1232538 May 9 09:47 fixed.input.fasta.nsq
-rw-r--r-- 1 chauvec ctb-chauvec 16384 May 9 09:48 fixed.input.fasta.ntf
-rw-r--r-- 1 chauvec ctb-chauvec 424 May 9 09:48 fixed.input.fasta.nto
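
A listing like the one above can also be checked programmatically. The following is only a minimal sketch, not part of MOB-suite: the helper name `missing_blast_index_files` is hypothetical, and it assumes a single-volume nucleotide database whose core files carry the `.nhr`, `.nin` and `.nsq` extensions next to the FASTA file.

```python
from pathlib import Path

# Assumption: a single-volume BLAST nucleotide database keeps its header,
# index and sequence data in <fasta>.nhr, <fasta>.nin and <fasta>.nsq.
# Newer database versions add further files (e.g. .ndb, .not, .ntf, .nto),
# which this sketch does not check.
CORE_NUCL_EXTS = (".nhr", ".nin", ".nsq")


def missing_blast_index_files(fasta_path):
    """Return names of core index files that are absent or empty.

    Hypothetical helper for illustration; it only inspects the file system
    and never calls BLAST itself.
    """
    fasta = Path(fasta_path)
    missing = []
    for ext in CORE_NUCL_EXTS:
        candidate = Path(str(fasta) + ext)
        # Treat a zero-byte file as missing, since it cannot hold an index.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(candidate.name)
    return missing


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        gaps = missing_blast_index_files(path)
        status = "ok" if not gaps else "missing: " + ", ".join(gaps)
        print(f"{path}: {status}")
```

An empty result for `fixed.input.fasta` would suggest that the index files themselves are in place and that the cause lies elsewhere, for example in the path that is passed to BLAST.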

kbessonov1984 commented 2 years ago

Hi Cedric, sorry it is not working for you. Unfortunately, I was not able to find the attached log in your previous post, nor could I replicate your error on cedar using the assembly you provided previously. I am not sure what the issue might be.

Try the Singularity option, either by building your own image or by copying mine from /project/def-kbessono/kbessono/mobsuite.sif. Singularity images are reproducible and well supported on the cedar sharcnet server.

As a last resort, try the Galaxy web portal or send us your genomes to analyze (e.g., a Google Drive or Dropbox link would work great).

kbessonov1984 commented 2 years ago

I hope the Singularity container solution worked on the cedar server. If you need more help on this issue, please just re-open it.