mozilla / snakepit

Machine learning job scheduler
Mozilla Public License 2.0

Preventing allocation fragmentation #116

Open tilmankamp opened 6 years ago

tilmankamp commented 6 years ago

There are "small" jobs that allocate only a fraction of a node's GPUs and "big" jobs that benefit from allocating many or all of the GPUs of one node. For efficiency, the allocation algorithm should try to keep small jobs together by packing them into the remaining GPU slots of partially allocated nodes, so that fully free nodes stay available for big jobs.
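
One way to read the packing idea above is as a best-fit rule: among the nodes that can hold the job, pick the one with the fewest free GPUs. The sketch below is only an illustration of that rule, not Snakepit's actual allocator; `pick_node`, the node names, and the GPU counts are all hypothetical.

```python
from typing import Dict, Optional


def pick_node(free_gpus: Dict[str, int], requested: int) -> Optional[str]:
    """Best fit: return the node with the fewest free GPUs that still fits.

    `free_gpus` maps a (hypothetical) node name to its number of free GPUs.
    Returns None when no single node can satisfy the request.
    """
    candidates = [
        (free, name) for name, free in free_gpus.items() if free >= requested
    ]
    if not candidates:
        return None
    # Smallest remaining capacity wins; ties are broken by node name.
    return min(candidates)[1]


# Hypothetical cluster: one empty 8-GPU node and two partially used nodes.
free = {"node-a": 8, "node-b": 3, "node-c": 1}

small_job_node = pick_node(free, 2)   # lands on the partially used node-b
free[small_job_node] -= 2

big_job_node = pick_node(free, 8)     # node-a is still completely free
```

With a first-fit rule over the same dictionary, the 2-GPU job could land on `node-a` and block the 8-GPU job, which is the fragmentation this issue is about.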

kdavis-mozilla commented 6 years ago

What about other resources, such as memory? For example, a job may require 1 GPU and 128 GB of memory.

tilmankamp commented 6 years ago

If this is really required, we could introduce memory and CPUs as allocatable resources.
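
If memory and CPUs did become allocatable resources, the fit check would have to hold for every resource at once. The following is a rough sketch under that assumption. The `Resources` class, `pick_node_multi`, and all capacities are made up for illustration and do not reflect Snakepit's data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical bundle of allocatable resources on a node or in a request."""
    gpus: int = 0
    cpus: int = 0
    memory_gb: int = 0

    def fits_in(self, free: "Resources") -> bool:
        # A request fits only if every resource dimension is satisfied.
        return (
            self.gpus <= free.gpus
            and self.cpus <= free.cpus
            and self.memory_gb <= free.memory_gb
        )


def pick_node_multi(
    free_by_node: Dict[str, Resources], request: Resources
) -> Optional[str]:
    """Among fitting nodes, prefer the fewest leftover GPUs, then least leftover memory."""
    fitting = [
        (free.gpus - request.gpus, free.memory_gb - request.memory_gb, name)
        for name, free in free_by_node.items()
        if request.fits_in(free)
    ]
    return min(fitting)[2] if fitting else None


# The example from the thread: 1 GPU and 128 GB of memory.
cluster = {
    "node-a": Resources(gpus=8, cpus=32, memory_gb=256),
    "node-b": Resources(gpus=2, cpus=16, memory_gb=64),
}
chosen = pick_node_multi(cluster, Resources(gpus=1, memory_gb=128))
```

Here `node-b` would be the tighter GPU fit, but it lacks the memory, so the job goes to `node-a`. Ranking by leftover GPUs first is just one possible choice; a real scheduler might weight the dimensions differently.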