
Change the default memory for Cl2d jobs submitted to garibaldi #2744

Closed leginonbot closed 8 months ago

leginonbot commented 8 months ago

Author Name: Melody Campbell (Melody Campbell) Original Redmine Issue: 2744, https://emg.nysbc.org/redmine/issues/2744 Original Date: 2014-04-25 Original Assignee: Amber Herold


Hi,

Whenever a cl2d job is submitted to garibaldi, the default memory is always 2gb. This will always cause the run to fail, as it is not adequate to run the alignments. What we have been doing is simply modifying the job file to the amount of memory we want (each garibaldi node has 47gb of memory, so if n = nodes, we usually just use 46*n gb of memory for the run). With multiple new users in the lab, it would be great if this could be the default; it's just another thing for them to have to troubleshoot if their job fails, and modifying job files is rather overwhelming for a first-time user.
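For illustration, a minimal sketch of that rule of thumb (the helper name is hypothetical, not part of Appion):

    # Hypothetical helper illustrating the arithmetic above: each garibaldi
    # node has 47gb, so request 46gb per node to leave headroom for the system.
    GB_PER_NODE = 47
    HEADROOM_GB = 1

    def requested_mem_gb(nodes):
        """Memory (in gb) to write into the PBS job file for an n-node run."""
        return (GB_PER_NODE - HEADROOM_GB) * nodes

    print(requested_mem_gb(4))  # 184gb, instead of the 2gb queue default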

Alternatively, it would also be great if we could just specify how much memory is needed when specifying the number of processors in the user interface.

Thanks, Melody

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Sargis Dallakyan (@dallakyan) Original Date: 2014-04-28T15:14:49Z


I've looked to see where the default memory of 2gb is coming from. It is set in the myami queue on garibaldi:

garibaldi00 align/cl2d7> qstat -f -Q myami
Queue: myami
    queue_type = Execution
    total_jobs = 2
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:2 Exiting:0 
    resources_max.cput = 200000:00:00
    resources_max.mem = 9588gb
    resources_max.ncpus = 1760
    resources_max.nodect = 173
    resources_max.walltime = 900:00:00
    resources_default.cput = 01:00:00
    resources_default.mem = 2gb <=====
    resources_default.ncpus = 1
    resources_default.nodect = 1
    resources_default.nodes = 1
    resources_default.walltime = 00:00:12
    mtime = 1385248723
    resources_assigned.mem = 45097156608b
    resources_assigned.ncpus = 2
    resources_assigned.nodect = 3
    max_user_run = 200
    keep_completed = 60
    enabled = True

I found the 'memorymax' => '47' line in myamiweb/config.php for the garibaldi host. I'll read through the code to see how to use this to set the default for requested memory.
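Roughly, the idea would be something like this sketch ('memorymax' is the real config.php key; the Python structures are hypothetical stand-ins for the PHP config, not the actual Appion code):

    # Sketch: fall back to the host's 'memorymax' from myamiweb/config.php
    # instead of the queue's 2gb default. PROCESSING_HOSTS and default_mem_gb
    # are assumed names for illustration only.
    PROCESSING_HOSTS = {
        'garibaldi': {'memorymax': 47},  # gb per node, from config.php
    }

    def default_mem_gb(host, user_value=None):
        """Use the user's request if given, else the host's memorymax."""
        if user_value is not None:
            return user_value
        return PROCESSING_HOSTS[host]['memorymax']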

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Sargis Dallakyan (@dallakyan) Original Date: 2014-06-13T18:43:19Z


Added Processing Host Parameters options to Xmipp 3 Clustering 2D Alignment, as shown in the attached screenshot (runXmipp3CL2D.py_Launcher.png).

Users can now specify the amount of memory needed (in gb) and it defaults to 47gb for garibaldi.
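For example, the new memory field would end up in the job file's resource request along these lines (build_pbs_resources is a hypothetical name; the --mem flag itself appears in the submitted command lines later in this thread):

    # Hypothetical sketch of turning the form values into the PBS resource
    # request; this is not the actual Appion job-file generator.
    def build_pbs_resources(nodes, ppn, mem_gb, walltime_hr):
        return "#PBS -l nodes=%d:ppn=%d,mem=%dgb,walltime=%d:00:00" % (
            nodes, ppn, mem_gb, walltime_hr)

    print(build_pbs_resources(1, 8, 47, 240))
    # #PBS -l nodes=1:ppn=8,mem=47gb,walltime=240:00:00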

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Melody Campbell (Melody Campbell) Original Date: 2014-06-13T21:06:04Z


Hi Sargis,

Thanks so much for coding this. It does make the job file correctly, which is awesome. The only thing is I'm not sure the Xmipp 3 version of cl2d works properly on garibaldi; I have been using the Xmipp 2 cl2d. I have launched a job on garibaldi and we'll see if it runs or not (which will take a while -- the garibaldi queue for big jobs might be a couple of days...).

I'll keep you posted, Melody

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Melody Campbell (Melody Campbell) Original Date: 2014-06-16T15:15:25Z


Hi Sargis,

Emily and I just tested Xmipp3. I tried to test it on garibaldi but it's not installed there. Emily tested it on guppy and it didn't upload properly-- however, the PBS script seemed to work correctly. Here is Emily's directory where she tried to run it: /ami/data15/appion/14jun11c/align/cl2d1

I think, however, that since Xmipp 2 has been running to completion lately, we would all be really happy if you could add the "Processing Host Parameters" appion module for the Xmipp 2 version of cl2d, because then we could all start using it immediately on garibaldi.

Thanks so much, and if you have any questions I'm happy to clarify.

Cheers, Melody

leginonbot commented 8 months ago

Original Redmine Comment Author Name: David Veesler (David Veesler) Original Date: 2014-08-05T20:09:10Z


Appion now allows choosing the queue we want to submit to and adapts the number of processors requested per node accordingly (e.g. 8 procs/node on garibaldi).

However, it still defaults to 2gb/node for jobs submitted to garibaldi, which is an issue. Ideally, we would like to be able to type in the requested memory.
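For reference, the per-queue adaptation described above amounts to something like this sketch (PROCS_PER_NODE and nodes_for are assumed names, not the actual Appion code):

    import math

    # garibaldi packs 8 processors per node, so a 32-processor request
    # becomes nodes=4:ppn=8 (matching the --nodes/--ppn flags seen later
    # in this thread).
    PROCS_PER_NODE = {'garibaldi': 8}

    def nodes_for(nproc, host):
        ppn = PROCS_PER_NODE[host]
        return math.ceil(nproc / ppn), ppn

    print(nodes_for(32, 'garibaldi'))  # (4, 8)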

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-06T14:24:35Z


I can add in the mem parameter today.

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-06T15:34:51Z


David, go ahead and give this a try. It should be working on longboard today, and everywhere else tomorrow.

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Melody Campbell (Melody Campbell) Original Date: 2014-08-06T21:00:00Z


Hi,

So I'm pretty sure cl2d2 will not work at all anymore.

On longboard/beta I get this error after the job is submitted:

!!! WARNING: could not create stack average, average.mrc
... Inserting CL2D Run into DB

lines= ['\tlibmpi.so.1 => /usr/lib64/libmpi.so.1 (0x0000003c67000000)\n', '\tlibmpi_cxx.so.1 => /usr/lib64/libmpi_cxx.so.1 (0x00007f4a85cb7000)\n']
/ami/data00/appion/14jul31e/align/cl2d15/alignedStack.hed
Traceback (most recent call last):
  File "/opt/myamisnap/bin/runXmippCL2D.py", line 624, in <module>
    cl2d.start()
  File "/opt/myamisnap/bin/runXmippCL2D.py", line 605, in start
    self.insertAlignStackRunIntoDatabase("alignedStack.hed")
  File "/opt/myamisnap/bin/runXmippCL2D.py", line 386, in insertAlignStackRunIntoDatabase
    apDisplay.printError("could not find average mrc file: "+avgmrcfile)
  File "/opt/myamisnap/lib/appionlib/apDisplay.py", line 65, in printError
    raise Exception, colorString("\n FATAL ERROR \n"+text+"\n\a","red")
Exception: FATAL ERROR could not find average mrc file: /ami/data00/appion/14jul31e/align/cl2d15/average.mrc

In this directory: /ami/data00/appion/14jul31e/align/cl2d15


And on cronus3/beta I get this error and it won't even submit:

ERROR in job submission. Check the cluster setup. Ensure the .appion.cfg configuration file is correct (http://emg.nysbc.org/redmine/projects/appion/wiki/Configure_appioncfg)
Job type: partalign
partalign ['runXmippCL2D.py', '--stack=116', '--lowpass=15', '--highpass=2000', '--num-part=1999', '--num-ref=20', '--bin=2', '--max-iter=15', '--nproc=32', '--fast', '--classical_multiref', '--correlation', '--commit', '--nodes=4', '--ppn=8', '--mem=180', '--walltime=240', '--cput=24000', '--queue=myami', '--description=data00', '--runname=cl2d2', '--rundir=/ami/data00/appion/14jul31e/align/cl2d2', '--projectid=414', '--expid=13758', '--jobtype=partalign', '--jobid=548']

ERROR in job submission. Check the cluster setup. Ensure the .appion.cfg configuration file is correct (http://emg.nysbc.org/redmine/projects/appion/wiki/Configure_appioncfg)
Job type: partalign
partalign ['/opt/myamisnap/bin/appion', 'runXmippCL2D.py', '--stack=116', '--lowpass=15', '--highpass=2000', '--num-part=1999', '--num-ref=20', '--bin=2', '--max-iter=15', '--nproc=14', '--fast', '--classical_multiref', '--correlation', '--commit', '--nodes=4', '--ppn=4', '--walltime=2', '--cput=200', '--description=test', '--runname=cl2d6', '--rundir=/ami/data15/appion/14jul31e/align/cl2d6', '--projectid=414', '--expid=13758', '--jobtype=partalign', '--jobid=551']

leginonbot commented 8 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-06T22:18:13Z


Looks like the cronus3 issue was related to data00 access, and the other error related to average.mrc was reported by David prior to me adding the processing host parameters, so let's continue discussion of that in #2880.