hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Segway Identify crashes on PBS/Torque cluster #86

Open EricR86 opened 7 years ago

EricR86 commented 7 years ago

Original report (BitBucket issue) by Gabriel Pratt (Bitbucket: gpratt).


Hello, I'm just filing this issue for tracking's sake. Don't worry about fixing it; I've already fixed it locally. We can discuss whether it's worth building out a general fix.

On my PBS/Torque cluster, memory is managed by something called "cpuset", which allocates memory to a job based on the number of processors requested for that job. This creates a conflict when Segway tries to specify memory requirements with -l mem=XXGB and -l vmem=XXXGB, causing Segway Identify to crash.

The crash occurs when Segway tries to use too much memory. I'm not quite sure what "too much" means in this case, but this wasn't an issue with segway train, as it wasn't as memory intensive.

I fixed the issue by removing all resource requests from segway/segway/cluster/common.py. Specifically, line 76 was changed to:

#!python

self.res_req = []

This allowed cpuset to manage the memory without conflicts and segway executed successfully.

I'm guessing that those memory requirements are important for resource allocation on other clusters. The best way I can think of to fix this generally is to add a flag that removes resource requests on clusters that don't need them or can't support them.
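A minimal sketch of what such a flag might look like. This is not Segway's actual code: the function names `make_res_req` and `make_qsub_args` and the `use_resource_requests` parameter are hypothetical, made up for illustration. Only the `-l mem=...` and `-l vmem=...` request forms are taken from the description above.

```python
def make_res_req(mem_gb, vmem_gb, use_resource_requests=True):
    """Build the list of resource request strings for a job.

    Hypothetical helper: returns an empty list when requests are
    disabled, so a scheduler-side manager such as cpuset decides.
    """
    if not use_resource_requests:
        return []
    return ["mem=%dGB" % mem_gb, "vmem=%dGB" % vmem_gb]


def make_qsub_args(res_req):
    """Turn resource requests into repeated '-l <request>' arguments."""
    args = []
    for request in res_req:
        args.extend(["-l", request])
    return args


# Default behaviour: memory requests are passed to the scheduler.
default_args = make_qsub_args(make_res_req(4, 8))

# With the hypothetical flag turned off, no '-l' arguments are emitted.
cpuset_args = make_qsub_args(make_res_req(4, 8, use_resource_requests=False))
```

With the flag off, the result is the same as the local fix of setting `self.res_req = []`, but clusters that rely on the memory requests keep the current behaviour by default.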

EricR86 commented 7 years ago

Original comment by Gabriel Pratt (Bitbucket: gpratt).