Open florianecoulmance opened 4 years ago
Hi, did you try to submit it in the long queue/partition with -p long?
If I read this correctly :
[emorice@clust-slurm-client ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
fast* up 1-00:00:00 3 mix cpu-node-[9,13-14]
fast* up 1-00:00:00 54 idle cpu-node-[6-8,10-12,15-62]
long up 30-00:00:0 2 mix cpu-node-[13-14]
long up 30-00:00:0 19 idle cpu-node-[10-12,15-30]
bigmem up 60-00:00:0 1 mix cpu-node-69
training up 30-00:00:0 5 idle cpu-node-[1-5]
maintenance up 30-00:00:0 13 drain cpu-node-[70-74,76-83]
maintenance up 30-00:00:0 1 idle cpu-node-75
The default partition is fast, with a default limit of 1 day, while long has a limit of 30 days (I am not familiar with Slurm nor have I tested it yet; this is just what I understand).
Also, I believe the purpose of --time is to force a shorter time limit than the queue/partition default (i.e. one wants to run a job of unknown length but have it killed if it does not finish in, say, one hour), but it does not allow a longer one.
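As a sketch of how those two settings combine, here is a minimal submission-script header; the job name and the 10-day value are illustrative assumptions, not taken from this thread:

```shell
#!/bin/bash
#SBATCH --partition=long          # submit to the long partition (30-day limit)
#SBATCH --time=10-00:00:00        # kill the job after 10 days (days-hh:mm:ss)
#SBATCH --job-name=psiblast_run   # hypothetical job name

# actual command goes here, e.g. psiblast ...
```

The --time value must stay within the partition's limit; asking for more than the partition allows will get the submission rejected or the job stuck pending.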
It did not work, but I found a solution:
I put this line in the header of my script, so I guess now the limit is up to 30 days :)
Thank you,
Floriane
10 days for a psiblast?! That sounds a bit crazy... is it because you launch all queries one after the other? Alternatively, you could launch N jobs in parallel for N queries...?
Can I run 400 jobs at the same time on the cluster? One query against uniref50 takes 40 min to run with psiblast.
Well, probably the 400 jobs are not going to run all at the same time... But you can submit your 400 jobs independently to the cluster queue, and at least some of them will run in parallel. The interest of using the cluster is to be able to run jobs on several CPUs at the same time! (I'll ask the IFB support team if there's a cleverer way to submit the 400 queries)
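As a sketch of that idea (the directory layout, database name, and time limit below are assumptions, not details from this thread), one independent job per query could be submitted in a loop:

```shell
#!/bin/bash
# Submit one Slurm job per query file. The jobs queue independently,
# and the scheduler runs as many in parallel as nodes allow.
for query in queries/*.fasta; do
  out="${query%.fasta}.out"   # e.g. queries/q1.fasta -> queries/q1.out
  sbatch --partition=fast --time=01:00:00 \
    --wrap "psiblast -query $query -db uniref50 -out $out"
done
```

Each sbatch call returns immediately after queuing, so the loop itself takes seconds even for 400 queries.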
Great, thank you !
I found some advice on the internet, but let me know what the IFB support team advises; I do not want to break the cluster...
Ok, so, in case you have any questions regarding the usage of the cluster, you can post them here: https://community.cluster.france-bioinformatique.fr.
For this particular problem, you should try and use Slurm's job array mode. You can find the full documentation about it here: https://slurm.schedmd.com/job_array.html.
Here is an example of a job array launching 30 fastqc runs on 30 different sequences: https://ifb-elixirfr.gitlab.io/cluster/trainings/slurm/ebai2019.html#56. This should be pretty similar to what you want to do.
When sbatch sees the "array" option, it launches the job once per value in the indicated array (for instance from 0 to 400). In each job, a variable with the list of files to be analyzed is loaded. The treatment is then launched on one of the files using the environment variable $SLURM_ARRAY_TASK_ID, which takes as its value the index of the current job (0, 1, 2, 3, etc., up to 400).
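Following that pattern, a minimal job-array sketch might look like the following (the file paths, database name, and resource values are assumptions; the indexing logic is the standard $SLURM_ARRAY_TASK_ID idiom):

```shell
#!/bin/bash
#SBATCH --array=0-399        # 400 independent tasks, indices 0..399
#SBATCH --partition=fast
#SBATCH --time=01:00:00      # one query fits well under the 1-day fast limit

# Load the list of query files into a bash array (hypothetical paths).
FILES=(queries/*.fasta)

# Each array task picks the single file matching its own index.
QUERY="${FILES[$SLURM_ARRAY_TASK_ID]}"

psiblast -query "$QUERY" -db uniref50 -out "${QUERY%.fasta}.out"
```

A single sbatch of this script queues all 400 tasks at once, and sacct or squeue then shows them as jobid_0, jobid_1, and so on.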
Hello everyone,
I got my psiblast job cancelled due to the time limit, with the following error:
slurmstepd: error: JOB 3162130 ON cpu-node-7 CANCELLED AT 2019-11-08T15:32:30 DUE TO TIME LIMIT
Because I am running PSIBLAST against Uniref50, it will take more than 10 days to run.
What is the time limit?
Can I change it with this, or do I have to specify it in order for it to work:
#SBATCH --time=20-24:00:00 # days-hh:mm:ss
Bon weekend, Floriane