Original Redmine Comment Author Name: Sargis Dallakyan (@dallakyan) Original Date: 2014-04-28T15:14:49Z
I've looked to see where the default memory of 2gb is coming from. It is set in the myami queue on garibaldi:
garibaldi00 align/cl2d7> qstat -f -Q myami
Queue: myami
queue_type = Execution
total_jobs = 2
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:2 Exiting:0
resources_max.cput = 200000:00:00
resources_max.mem = 9588gb
resources_max.ncpus = 1760
resources_max.nodect = 173
resources_max.walltime = 900:00:00
resources_default.cput = 01:00:00
resources_default.mem = 2gb <=====
resources_default.ncpus = 1
resources_default.nodect = 1
resources_default.nodes = 1
resources_default.walltime = 00:00:12
mtime = 1385248723
resources_assigned.mem = 45097156608b
resources_assigned.ncpus = 2
resources_assigned.nodect = 3
max_user_run = 200
keep_completed = 60
enabled = True
I found the 'memorymax' => '47' line in myamiweb/config.php for the garibaldi host. I'll read through the code to see how to use this to set the default requested memory.
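For reference, a queue-level default like resources_default.mem is normally changed on the Torque/PBS server with qmgr. A minimal sketch, assuming admin access on garibaldi; the 46gb value is only illustrative (roughly one node's memory minus headroom), and this would be an alternative to handling the default on the Appion side:

# Raise the default memory for jobs that do not request any (requires PBS manager privileges):
qmgr -c "set queue myami resources_default.mem = 46gb"
# Confirm the new default:
qstat -f -Q myami | grep resources_default.mem

Handling it via memorymax in config.php instead leaves the queue untouched and only changes what Appion requests, which is the direction taken in the comments below.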
Original Redmine Comment Author Name: Sargis Dallakyan (@dallakyan) Original Date: 2014-06-13T18:43:19Z
Added Processing Host Parameters options to Xmipp 3 Clustering 2D Alignment as shown in the image below. !runXmipp3CL2D.py_Launcher.png!
Users can now specify the amount of memory needed (in gb) and it defaults to 47gb for garibaldi.
Original Redmine Comment Author Name: Melody Campbell (Melody Campbell) Original Date: 2014-06-13T21:06:04Z
Hi Sargis,
Thanks so much for coding this. It does make the job file correctly, which is awesome. The only thing is I'm not sure whether the xmipp 3 version of cl2d works properly on garibaldi; I have been using the xmipp 2 cl2d. I have launched a job on garibaldi and we'll see whether it runs (which will take a while -- the garibaldi queue for big jobs might be a couple of days).
I'll keep you posted, Melody
Original Redmine Comment Author Name: Melody Campbell (Melody Campbell) Original Date: 2014-06-16T15:15:25Z
Hi Sargis,
Emily and I just tested Xmipp3. I tried to test it on garibaldi, but it's not installed there. Emily tested it on guppy and it didn't upload properly; however, the PBS script seemed to work correctly. Here is Emily's directory where she tried to run it: /ami/data15/appion/14jun11c/align/cl2d1
I think, however, that since Xmipp2 has been running to completion lately, we would all be really happy if you could add the "Processing Host Parameters" appion module for the xmipp2 version of cl2d, because then we could all start using it immediately on Garibaldi.
Thanks so much, and if you have any questions I'm happy to clarify.
Cheers, Melody
Original Redmine Comment Author Name: David Veesler (David Veesler) Original Date: 2014-08-05T20:09:10Z
Appion now lets us choose the queue we want to submit to and adapts the number of processors requested per node accordingly (e.g. 8 procs/node on garibaldi).
However, it still defaults to 2gb/node for jobs submitted to garibaldi, which is an issue. Ideally, we would like to be able to type in the requested memory.
Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-06T14:24:35Z
I can add in the mem parameter today.
Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-06T15:34:51Z
David, go ahead and give this a try. It should be working on longboard today and everywhere else tomorrow.
Original Redmine Comment Author Name: Melody Campbell (Melody Campbell) Original Date: 2014-08-06T21:00:00Z
Hi,
So I'm pretty sure cl2d2 will not work at all anymore.
On longboard/beta I get this error after the job is submitted:
!!! WARNING: could not create stack average, average.mrc ... Inserting CL2D Run into DB
lines= ['\tlibmpi.so.1 => /usr/lib64/libmpi.so.1 (0x0000003c67000000)\n', '\tlibmpi_cxx.so.1 => /usr/lib64/libmpi_cxx.so.1 (0x00007f4a85cb7000)\n']
/ami/data00/appion/14jul31e/align/cl2d15/alignedStack.hed
Traceback (most recent call last):
File "/opt/myamisnap/bin/runXmippCL2D.py", line 624, in
In this directory: /ami/data00/appion/14jul31e/align/cl2d15
And on cronus3/beta I get this error and it won't even submit:
ERROR in job submission. Check the cluster setup. Ensure the .appion.cfg configuration file is correct (http://emg.nysbc.org/redmine/projects/appion/wiki/Configure_appioncfg) Job type: partalign partalign ['runXmippCL2D.py', '--stack=116', '--lowpass=15', '--highpass=2000', '--num-part=1999', '--num-ref=20', '--bin=2', '--max-iter=15', '--nproc=32', '--fast', '--classical_multiref', '--correlation', '--commit', '--nodes=4', '--ppn=8', '--mem=180', '--walltime=240', '--cput=24000', '--queue=myami', '--description=data00', '--runname=cl2d2', '--rundir=/ami/data00/appion/14jul31e/align/cl2d2', '--projectid=414', '--expid=13758', '--jobtype=partalign', '--jobid=548']
ERROR in job submission. Check the cluster setup. Ensure the .appion.cfg configuration file is correct (http://emg.nysbc.org/redmine/projects/appion/wiki/Configure_appioncfg) Job type: partalign partalign ['/opt/myamisnap/bin/appion', 'runXmippCL2D.py', '--stack=116', '--lowpass=15', '--highpass=2000', '--num-part=1999', '--num-ref=20', '--bin=2', '--max-iter=15', '--nproc=14', '--fast', '--classical_multiref', '--correlation', '--commit', '--nodes=4', '--ppn=4', '--walltime=2', '--cput=200', '--description=test', '--runname=cl2d6', '--rundir=/ami/data15/appion/14jul31e/align/cl2d6', '--projectid=414', '--expid=13758', '--jobtype=partalign', '--jobid=551']
Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-06T22:18:13Z
Looks like the Cronus3 issue was related to data00 access, and the other error related to average.mrc was reported by David prior to my adding the processing host parameters, so let's continue discussion of that in #2880.
Author Name: Melody Campbell (Melody Campbell) Original Redmine Issue: 2744, https://emg.nysbc.org/redmine/issues/2744 Original Date: 2014-04-25 Original Assignee: Amber Herold
Hi,
Whenever a cl2d job is submitted to garibaldi, the default memory is always 2gb. This will always cause the run to fail, as it is not adequate to run the alignments. What we have been doing is simply editing the job file to request the amount of memory we want (each garibaldi node has 47gb of memory, so for n nodes we usually just request 46*n gb of memory for the run). With multiple new users in the lab, it would be great if this could be the default; it's just another thing for them to have to troubleshoot if their job fails, and modifying job files is rather overwhelming for a first-time user.
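For anyone hitting this in the meantime, the manual workaround amounts to editing the Torque resource request lines at the top of the generated job file before submitting it with qsub. A hypothetical example for a 4-node run on garibaldi (the exact header Appion writes may differ; 184gb is 46gb * 4 nodes per the rule above):

# Memory must be requested explicitly, otherwise the 2gb queue default applies.
#PBS -l nodes=4:ppn=8
#PBS -l mem=184gb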
Alternatively, it would also be great if we could just specify how much memory is needed at the same time as the number of processors in the user interface.
Thanks, Melody