radical-cybertools / radical.saga

A Light-Weight Access Layer for Distributed Computing Infrastructure and Reference Implementation of the SAGA Python Language Bindings.
http://radical-cybertools.github.io/saga-python/

comet takes --gres for gpu #814

Closed lee212 closed 3 years ago

lee212 commented 3 years ago

This reflects the Slurm options for GPUs and addresses issue #813, according to https://portal.xsede.org/sdsc-comet#gpu.

mtitov commented 3 years ago

@lee212 please check this branch; it covers all cases (for the gpu queue, but not for the gpu-shared queue). It was mentioned here

lee212 commented 3 years ago

@mtitov, check the code block for GPU counting. On the gpu queue, GPU counts have to go up in increments of 4, the number of GPU devices available per node. For example, requests for 1, 2, or 3 GPUs will fail on the gpu queue but are fine on the gpu-shared queue.
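The counting rule described in this comment can be sketched as follows. This is only an illustration, not the code in the PR: the names `GPUS_PER_NODE`, `is_valid_gpu_count`, and `round_up_gpu_count` are hypothetical, and the value 4 and the queue names are taken from the comment above. Treating any positive count as valid on gpu-shared is an assumption; the comment only confirms 1, 2, and 3.

```python
import math

# Assumption from the comment above: 4 GPU devices per node on Comet.
GPUS_PER_NODE = 4


def is_valid_gpu_count(queue: str, n_gpus: int) -> bool:
    """Return True if the requested GPU count is acceptable for the queue."""
    if n_gpus < 1:
        return False
    if queue == "gpu":
        # Whole nodes only: the count must be a multiple of GPUS_PER_NODE.
        return n_gpus % GPUS_PER_NODE == 0
    if queue == "gpu-shared":
        # Assumption: any positive count is accepted here.
        return True
    raise ValueError(f"not a GPU queue: {queue!r}")


def round_up_gpu_count(n_gpus: int) -> int:
    """Round a request up to the next multiple of GPUS_PER_NODE."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be positive")
    return math.ceil(n_gpus / GPUS_PER_NODE) * GPUS_PER_NODE
```

With these helpers, a request for 3 GPUs is rejected on the gpu queue, accepted on gpu-shared, and rounds up to 4 if the adaptor chooses to adjust the request instead of failing.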

mtitov commented 3 years ago

@lee212 yeah, I just didn't see it explicitly defined in the user guide, but if you checked it then OK. Also, don't forget to handle the case where gpu_arch is not provided.
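One way to handle a missing gpu_arch is to fall back to the untyped form of the Slurm `--gres` option. The sketch below is an assumption about how that could look, not the PR's implementation: the function name `gres_option` and its parameters are hypothetical, and `gpus_per_node` is the per-node count that Slurm expects in `--gres`.

```python
from typing import Optional


def gres_option(gpus_per_node: int, gpu_arch: Optional[str] = None) -> str:
    """Build a Slurm --gres option string for GPUs.

    Slurm accepts both --gres=gpu:<count> and --gres=gpu:<type>:<count>.
    """
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be positive")
    if gpu_arch:
        # Typed request, e.g. a specific GPU model.
        return f"--gres=gpu:{gpu_arch}:{gpus_per_node}"
    # No architecture given: let the scheduler pick any GPU type.
    return f"--gres=gpu:{gpus_per_node}"
```

For example, `gres_option(4, "k80")` yields `--gres=gpu:k80:4`, while `gres_option(4)` yields `--gres=gpu:4`.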