neurodata / ndgrutedb

Pipeline and tools for estimating human connectomes from Diffusion, Structural, and Functional MRI
http://neurodata.io
Apache License 2.0

-l virtual_free flag not working on bc1 #174

Closed gkiar closed 8 years ago

gkiar commented 8 years ago

Jobs are being launched in excess of the expected RAM usage (i.e., only 8 x 64G jobs should run at a time, but instead as many jobs are launched as LONI is allowed to run).
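The expectation above is simple arithmetic: if memory requests are honored, the number of concurrent jobs is bounded by the schedulable memory divided by the per-job request. A minimal sketch, assuming 512G of schedulable memory (a hypothetical figure chosen only because 8 x 64G fits in it; the thread does not state the real capacity) and a hypothetical helper name:

```python
def max_concurrent_jobs(total_mem_gb: int, per_job_gb: int) -> int:
    """Upper bound on simultaneously running jobs if memory requests are honored."""
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    # Integer division: a job only starts if its full request fits.
    return total_mem_gb // per_job_gb


# 512G is an assumed capacity, not a figure taken from this thread.
ASSUMED_TOTAL_GB = 512
print(max_concurrent_jobs(ASSUMED_TOTAL_GB, 64))  # 8
```

Anything above that bound running at once suggests the memory request is not being treated as a limit on scheduling.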

@alexbaden

alexbaden commented 8 years ago

Try h_vmem instead.

On Mon, Nov 16, 2015 at 09:06 Greg Kiar notifications@github.com wrote:

Assigned #174 https://github.com/openconnectome/m2g/issues/174 to @alexbaden https://github.com/alexbaden.


gkiar commented 8 years ago

Doesn't h_vmem just kill a job if it exceeds the requested amount? What I want, and what virtual_free is supposed to do, is for jobs to be held back whenever the requested amount of RAM is not available, i.e. is already accounted for by other jobs in the queue.

alexbaden commented 8 years ago

We use h_vmem as the consumable memory resource. There is a script that kills jobs that exceed the total h_vmem limit, but that's separate from submitting jobs to the scheduler. The scheduler should queue jobs and respect the memory limits with h_vmem. Try it!
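The distinction drawn above, between scheduling against a consumable memory resource and separately killing jobs that exceed their limit, can be sketched as a toy model. This is not SGE's actual algorithm: `Job`, `dispatch`, `over_limit`, the strict first-in-first-out order, and the 512G capacity are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    requested_gb: int  # memory requested at submission time
    peak_gb: int       # memory the job actually uses at its peak


def dispatch(pending, capacity_gb):
    """Start queued jobs in order while their requests fit in the free capacity.

    Returns (running, still_pending). Jobs that do not fit stay queued;
    nothing is killed in this step.
    """
    running, waiting = [], deque(pending)
    free = capacity_gb
    while waiting and waiting[0].requested_gb <= free:
        job = waiting.popleft()
        free -= job.requested_gb  # the request is "consumed" from the pool
        running.append(job)
    return running, list(waiting)


def over_limit(running):
    """Separate enforcement step: jobs whose actual use exceeds their request."""
    return [job for job in running if job.peak_gb > job.requested_gb]


# Ten 64G requests against an assumed 512G pool.
jobs = [Job(f"job{i}", requested_gb=64, peak_gb=60) for i in range(10)]
running, queued = dispatch(jobs, capacity_gb=512)
print(len(running), len(queued))  # 8 2
```

In this model the behavior asked for earlier in the thread corresponds to `dispatch` (the ninth and tenth jobs wait), while the kill script corresponds to `over_limit`, which only acts on jobs that are already running.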

gkiar commented 8 years ago

Trying now, will report back! :) thanks!!

gkiar commented 8 years ago

@alexbaden neither one works, sadly

gkiar commented 8 years ago

Single-threading Camino, together with @alexbaden putting a kill command directly in SGE, keeps this from being an issue; closing.