tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

Odd delays when running MLST on a slurm compute node. #115

Closed HBrendy closed 2 years ago

HBrendy commented 2 years ago

Dear Torsten and all other readers,

Thank you for this great piece of software. I noticed a puzzling difference in execution time when running mlst (installed via conda) on a SLURM compute node. On the compute node, which has no internet access, a run took 180 s, compared to 20 s on the login node (with connectivity) in the same conda env, and 3 s on my laptop.

Of note: I first observed this behaviour when using mlst within the complex, multi-package conda env of our bacterial assembly and QC workflow AQUAMIS, but I have reproduced it in a minimal conda env created with mamba create -n mlst mlst. Other packages in our workflow, e.g. your Shovill, QUAST, kraken2 and fastp, show no such slowdown on the compute node. Cutting all network connections on my own machine while running mlst does not cause any delay.

I know computing time is usually not a big concern, but for our current study this delay adds up to days. Thanks for giving it a thought!

Cheers, Holger

Here's the log. Note the two odd delays: one before "Found blastn: 2.12.0" and one before the first "Found exact allele":

(mlst) [brendeba@c5-59 test_data_saga_slurm_20220128-213350]$ mlst --json Assembly/SRR2985019/mlst/z_report.json --label SRR2985019  Assembly/SRR2985019/contigs.fa > Assembly/SRR2985019/mlst/z_report.tsv.tmp
[12:20:25] This is mlst 2.19.0 running on linux with Perl 5.026002
[12:20:25] Checking mlst dependencies:
[12:20:25] Found 'blastn' => /cluster/projects/###redacted###/.conda/envs/mlst/bin/blastn
[12:20:25] Found 'any2fasta' => /cluster/projects/###redacted###/.conda/envs/mlst/bin/any2fasta
[12:21:47] Found blastn: 2.12.0+ (002012)
[12:21:47] Excluding 2 schemes: ecoli_2 abaumannii
[12:21:47] Using label 'SRR2985019' for file Assembly/SRR2985019/contigs.fa
[12:23:10] Found exact allele match campylobacter.glnA-1
[12:23:10] Found exact allele match clari.Cla_adk-38
[12:23:10] Found exact allele match clari.Cla_pgi-44
[12:23:10] Found exact allele match campylobacter.tkt-1
[12:23:10] Found exact allele match campylobacter.gltA-1
[12:23:10] Found exact allele match campylobacter.glyA-3
[12:23:10] Found exact allele match campylobacter.pgm-2
[12:23:10] Found exact allele match campylobacter.uncA-6
[12:23:10] Found exact allele match campylobacter.aspA-2
[12:23:10] Writing JSON: Assembly/SRR2985019/mlst/z_report.json
[12:23:10] You can follow me on Twitter at @torstenseemann
[12:23:10] Done.
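
The two pauses can be located directly from the timestamps in a log like the one above. The following is only a minimal sketch, assuming the mlst log (written to stderr) has been saved to a file; log_gaps is a hypothetical helper and not part of mlst, and it does not handle runs that cross midnight.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every log line that is preceded by a pause
# of at least MIN seconds, based on the leading [HH:MM:SS] timestamp.
# Usage: log_gaps LOGFILE [MIN_SECONDS]   (MIN_SECONDS defaults to 10)
log_gaps() {
    awk -v min="${2:-10}" '
        # Only consider lines that start with a [HH:MM:SS] timestamp.
        match($0, /^\[[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\]/) {
            split(substr($0, 2, 8), t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev >= min)
                printf "%ds gap before: %s\n", now - prev, $0
            prev = now
            seen = 1
        }
    ' "$1"
}
```

Saving the log with something like mlst ... 2> mlst.log and then running log_gaps mlst.log would, for the timestamps shown above, flag a gap of 82 s before "Found blastn" and a gap of 83 s before the first "Found exact allele match".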

HBrendy commented 2 years ago

Just tested export BLAST_USAGE_REPORT=false, as suggested in issue #110. This fixed the delay:

(mlst) [user@c5-59 test_data_saga_slurm_20220128-213350]$ mlst --json Assembly/SRR2985019/mlst/n_report.json --label SRR2985019  Assembly/SRR2985019/contigs.fa > Assembly/SRR2985019/mlst/n_report.tsv.tmp
[13:46:32] This is mlst 2.19.0 running on linux with Perl 5.026002
[13:46:32] Checking mlst dependencies:
[13:46:32] Found 'blastn' => /cluster/projects/###redacted###/.conda/envs/mlst/bin/blastn
[13:46:32] Found 'any2fasta' => /cluster/projects/###redacted###/.conda/envs/mlst/bin/any2fasta
[13:46:37] Found blastn: 2.12.0+ (002012)
[13:46:37] Excluding 2 schemes: abaumannii ecoli_2
[13:46:37] Using label 'SRR2985019' for file Assembly/SRR2985019/contigs.fa
[13:46:41] Found exact allele match campylobacter.glnA-1
[13:46:41] Found exact allele match clari.Cla_adk-38
[13:46:41] Found exact allele match clari.Cla_pgi-44
[13:46:41] Found exact allele match campylobacter.tkt-1
[13:46:41] Found exact allele match campylobacter.gltA-1
[13:46:41] Found exact allele match campylobacter.glyA-3
[13:46:41] Found exact allele match campylobacter.pgm-2
[13:46:41] Found exact allele match campylobacter.uncA-6
[13:46:41] Found exact allele match campylobacter.aspA-2
[13:46:41] Writing JSON: Assembly/SRR2985019/mlst/n_report.json
[13:46:41] Please also cite 'Jolley & Maiden 2010, BMC Bioinf, 11:595' if you use mlst.
[13:46:41] Done.
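
For batch jobs, the workaround can be set once at the top of the job script, so that every blastn call started by mlst inherits it. BLAST_USAGE_REPORT is the environment variable NCBI BLAST+ reads to opt out of usage reporting. Presumably the delays came from blastn trying to reach NCBI from a node without connectivity, though this thread does not confirm the mechanism. The sketch below is an assumption-laden example, not the workflow's actual script: run_mlst, the job name and the report file names are hypothetical, and only the input path is taken from the log above.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=mlst

# Opt out of BLAST+ usage reporting before any BLAST program is started.
export BLAST_USAGE_REPORT=false

# Hypothetical wrapper: type one assembly, writing JSON and TSV reports.
# Usage: run_mlst LABEL CONTIGS OUTDIR
run_mlst() {
    label="$1"
    contigs="$2"
    outdir="$3"
    mkdir -p "$outdir"
    mlst --json "$outdir/report.json" --label "$label" "$contigs" \
        > "$outdir/report.tsv"
}

# Run only if mlst is on PATH and the example input exists.
if command -v mlst >/dev/null 2>&1 && [ -f Assembly/SRR2985019/contigs.fa ]; then
    run_mlst SRR2985019 Assembly/SRR2985019/contigs.fa Assembly/SRR2985019/mlst
fi
```

Exporting the variable in the job script rather than in an interactive shell keeps the setting with the workflow, so it also applies when the job lands on a different compute node.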