kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
https://slm-lab.gitbook.io/slm-lab/
MIT License

How can I reasonably use my memory? #384

Closed lidongke closed 5 years ago

lidongke commented 5 years ago

Hi~ I see this line in your source code, search.py:

num_cpus = min(util.NUM_CPUS, meta_spec['max_session'])

Suppose the hyperparameters in my spec look like this:

"num_envs": 20
"max_trial": 10
"max_session": 3

Then this will run 6 trials at the same time, and 4 trials will be pending, right? The CPU allocation seems fine, but there will be 60 processes. My CPU memory has its limit, so it could run out of memory and crash.

Is there some measure to prevent this, or should I watch the CPU memory myself and manually control the number of processes in my spec?

Is it reasonable to consider only "max_session" and not "num_envs"?

@kengz

kengz commented 5 years ago

pasted from #385

If I have 20 CPUs, 10 envs, and 5 sessions, then it will start up 50 processes on 20 CPUs. I don't think this is reasonable, am I right? How can I fully and reasonably use my CPU resources?
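The arithmetic in this question can be written out as a quick check. This is a minimal sketch that assumes, as the poster does, one worker process per environment per session (it ignores any extra main or bookkeeping processes); the helper names are hypothetical and not part of SLM-Lab.

```python
def count_processes(num_sessions: int, num_envs: int) -> int:
    """Rough process count for one trial, assuming each session runs
    its vectorized envs in num_envs separate worker processes."""
    return num_sessions * num_envs


def oversubscription(num_processes: int, num_cpus: int) -> float:
    """Processes per CPU; a value above 1.0 means CPUs are time-shared."""
    return num_processes / num_cpus


procs = count_processes(num_sessions=5, num_envs=10)
ratio = oversubscription(procs, num_cpus=20)
print(procs, ratio)
```

With 5 sessions and 10 envs this gives 50 processes, or 2.5 processes per CPU on a 20-CPU machine, which is the mismatch the question is pointing at.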

kengz commented 5 years ago

Resource allocation in the search module works like so:

  1. Ray detects how many CPUs you have, say 20 in total. This is the compute budget.
  2. The resource setting you saw in search.py is per trial: a trial with 3 sessions is assigned 3 CPUs. So Ray can run 6 trials at once, using 18 of the 20 CPUs in its budget.
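The two steps above can be sketched as plain arithmetic. This is a simplified model, not Ray's actual scheduler: it assumes trials are packed greedily into the CPU budget and that every trial reserves `min(total_cpus, max_session)` CPUs, mirroring the search.py line; the function name is hypothetical.

```python
from typing import Tuple


def schedule(total_cpus: int, max_session: int, max_trial: int) -> Tuple[int, int]:
    """Return (running, pending) trials at the start of a search.

    Each trial reserves min(total_cpus, max_session) CPUs, as in the
    search.py line; trials that do not fit in the budget wait.
    """
    cpus_per_trial = min(total_cpus, max_session)
    fits = total_cpus // cpus_per_trial  # whole trials that fit in the budget
    running = min(fits, max_trial)
    pending = max_trial - running
    return running, pending


# 20 CPUs, 3 sessions per trial, 10 trials in total
print(schedule(total_cpus=20, max_session=3, max_trial=10))
```

This reproduces the 6 running / 4 pending split from the original question. Note that it only counts CPUs: nothing in this calculation looks at `num_envs` or memory.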

All of this assumes that environments are very light on CPU and RAM (memory). If your environment requires more resources, you can limit the number of trials run in parallel by changing that line to use a multiplier that allocates more CPUs per trial:

# ps is pydash; spec, meta_spec and util come from the surrounding search.py code
# option A: scale by your num_envs if the environments dominate resource usage
multiplier = ps.get(spec, 'env.0.num_envs') * 0.2
# option B: simply set a fixed number that suits your use case
# multiplier = 4
num_cpus = min(util.NUM_CPUS, meta_spec['max_session'] * multiplier)

Then a trial with 3 sessions will use 12 CPUs, so with a 20-CPU budget only 1 trial runs at a time. This allocates the budget more reasonably for your use case.
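To see the effect of the multiplier, the same per-trial formula can be evaluated for a few values. This is a sketch using plain numbers in place of `util.NUM_CPUS` and the spec; the helper names are hypothetical, and it models only CPU reservations, not real memory use.

```python
def trial_cpus(total_cpus: int, max_session: int, multiplier: float = 1.0) -> float:
    """CPUs reserved per trial: the search.py formula with an extra multiplier."""
    return min(total_cpus, max_session * multiplier)


def concurrent_trials(total_cpus: int, per_trial: float) -> int:
    """Whole trials that fit in the CPU budget at once."""
    return int(total_cpus // per_trial)


total_cpus = 20
num_envs = 20
env_multiplier = num_envs * 0.2  # option A: scale with num_envs
fixed_multiplier = 4             # option B: a fixed number

for m in (1, env_multiplier, fixed_multiplier):
    per_trial = trial_cpus(total_cpus, max_session=3, multiplier=m)
    print(m, per_trial, concurrent_trials(total_cpus, per_trial))
```

Without a multiplier, 6 trials run together; with either multiplier of 4, each trial reserves 12 CPUs and only 1 trial runs at a time. A small `num_envs` with option A gives a multiplier below 1, which would pack *more* trials together, so the fixed number may be easier to reason about.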

lidongke commented 5 years ago

I suggest exposing "multiplier" as a user-configurable option, so that we can more reasonably run more environments that are not light.
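One way such an option could look is a meta-spec key read with a default of 1, so existing specs keep their current behavior. This is only a sketch of the suggestion: `cpu_multiplier` is a hypothetical key name, not an existing SLM-Lab setting, and the CPU count is passed in explicitly instead of using `util.NUM_CPUS`.

```python
def trial_num_cpus(meta_spec: dict, num_cpus: int) -> float:
    """CPUs to reserve per trial, scaled by an optional multiplier.

    'cpu_multiplier' is a HYPOTHETICAL meta-spec key used for illustration;
    when it is absent the result matches the original search.py formula.
    """
    multiplier = meta_spec.get('cpu_multiplier', 1)
    if multiplier <= 0:
        raise ValueError('cpu_multiplier must be positive')
    return min(num_cpus, meta_spec['max_session'] * multiplier)


meta_spec = {'max_trial': 10, 'max_session': 3, 'cpu_multiplier': 4}
print(trial_num_cpus(meta_spec, num_cpus=20))
```

Defaulting to 1 keeps light environments packed as before, while heavier ones can opt into a larger per-trial reservation from the spec file.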

kengz commented 5 years ago

Good suggestion. Will do so, hopefully sometime this weekend.