radical-collaboration / QCArchive


Memory aware scheduling and memory management #9

Open mturilli opened 6 years ago

mturilli commented 6 years ago

Evaluate the development effort for RADICAL to offer a memory-aware scheduler specific to the use case. Compare this to the performance issues we would face without this scheduler.

andre-merzky commented 6 years ago

I expect a memory-aware scheduler to be a clone of the DA scheduler, really: managing node memory is semantically no different from managing local data storage (from the scheduler's perspective, it's just another node-local resource to be shared between CUs). If that assumption holds, the effort to implement this should be fairly small (less than a person-week).
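To make the "just another node-local resource" idea concrete, here is a minimal sketch of first-fit placement that checks memory alongside cores. The `Node` and `Unit` classes and the `schedule`/`unschedule` functions are hypothetical names for illustration only; they are not the RP scheduler API, and the real DA scheduler may use a different placement policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of one compute node as a scheduler might track it."""
    name: str
    free_cores: int
    free_mem_mb: int


@dataclass
class Unit:
    """Hypothetical compute unit (CU) with per-unit resource requests."""
    uid: str
    cores: int
    mem_mb: int


def schedule(unit: Unit, nodes: List[Node]) -> Optional[str]:
    """First-fit: place the unit on the first node with enough cores AND memory."""
    for node in nodes:
        if node.free_cores >= unit.cores and node.free_mem_mb >= unit.mem_mb:
            # Reserve both resources so later units see the reduced capacity.
            node.free_cores -= unit.cores
            node.free_mem_mb -= unit.mem_mb
            return node.name
    return None  # no node can host the unit right now


def unschedule(unit: Unit, node: Node) -> None:
    """Return the unit's cores and memory to the node once the unit finishes."""
    node.free_cores += unit.cores
    node.free_mem_mb += unit.mem_mb
```

Memory is handled exactly like cores here: one extra comparison at placement time and one extra counter to release afterwards, which is consistent with the small effort estimated above.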

mturilli commented 6 years ago

In the general case this would be needed, so we should put it into the SRS.

dgasmith commented 6 years ago

PSUtil: https://psutil.readthedocs.io/en/latest/
PyCPUInfo: https://pypi.org/project/py-cpuinfo/

vivek-bala commented 6 years ago

An important part of this activity will be the detection of available memory. Two high-level options: (i) predefined before runtime, and (ii) detection during runtime.

There might be benefits to using the same detection module that the application uses. I agree psutil is a good candidate.
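The two options could be combined in a single helper, sketched below under some assumptions. The function name `detect_available_memory_mb` and its `predefined_mb` parameter are hypothetical. `psutil.virtual_memory().available` is the real psutil call; the `os.sysconf` fallback is an addition for hosts without psutil and is only expected to work on Linux-like systems.

```python
import os
from typing import Optional


def detect_available_memory_mb(predefined_mb: Optional[int] = None) -> int:
    """Return available node memory in MB.

    Option (i): use a value predefined before runtime (e.g. from a config).
    Option (ii): detect at runtime, preferring psutil when it is installed.
    """
    if predefined_mb is not None:
        if predefined_mb <= 0:
            raise ValueError("predefined_mb must be positive")
        return predefined_mb

    try:
        import psutil  # third-party; optional in this sketch
        return psutil.virtual_memory().available // (1024 * 1024)
    except ImportError:
        pass

    # Fallback (assumption: a POSIX host that exposes these sysconf names).
    try:
        pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError) as exc:
        raise RuntimeError("cannot detect available memory on this host") from exc
    return pages * page_size // (1024 * 1024)
```

A predefined value is reproducible across runs, while runtime detection reflects what is actually free on the node at that moment; letting the predefined value take precedence keeps the behavior explicit when both are possible.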

mturilli commented 6 years ago

We consider this a requirement and will implement it on a timeline that we still have to decide.

mturilli commented 6 years ago

Ongoing

mturilli commented 5 years ago

Implemented a probe to collect information about memory availability on nodes. Implementation in the RP scheduler is ongoing. Prototype planned for Dec 13.

mturilli commented 5 years ago

Ongoing.

mturilli commented 5 years ago

Prototype will be ready for the 21st.

vivek-bala commented 5 years ago

Hi everyone, I have added the memory-aware scheduler in RP (PR https://github.com/radical-cybertools/radical.pilot/pull/1781, pending approval). It would be helpful if someone from MolSSI could push the resource config for the ark(?) machine, along with any pointers to how much memory is available per node on that machine. We can keep that config in the RP repository itself.