The Unity environment consumes much more memory. Another check you could run is an Atari environment for comparison; that should not exhaust your memory. What's the maximum size the RAM grows to, and how much RAM does your machine have?
I have 16 GB (plus 1 GB of swap) on my machine, and the Python process gradually takes all of it.
The annoying bit is that I can see the Python process increasing in memory, not the Unity environment; initially, I had also thought it was the Unity side of things.
Checking Atari is a good idea. Thank you!
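For reference, here is a minimal sketch of how to watch the Python process's resident memory while stepping an environment, to compare growth between Atari and the Unity wrapper. It assumes psutil and an older Gym release with the classic 4-tuple step API; the env id is only a stand-in.

# Log the current process's RSS while stepping an environment.
# Assumptions: psutil and gym[atari] are installed; the env id below is
# illustrative only. Swap in the Unity wrapper to compare memory growth.
import os
import psutil
import gym

proc = psutil.Process(os.getpid())
env = gym.make('PongNoFrameskip-v4')
env.reset()

for step in range(100_000):
    obs, reward, done, info = env.step(env.action_space.sample())  # classic 4-tuple API
    if done:
        env.reset()
    if step % 1_000 == 0:
        rss_mb = proc.memory_info().rss / 1e6
        print(f'step {step}: RSS = {rss_mb:.1f} MB')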
Filename: /home/batu/Desktop/The_Agency/Libraries/SLM-Lab/slm_lab/agent/algorithm/ppo.py
Line # Mem usage Increment Line Contents
================================================
171 502.4 MiB 502.4 MiB @profile
172 def train(self):
173 502.4 MiB 0.0 MiB if util.in_eval_lab_modes():
174 return np.nan
175 502.4 MiB 0.0 MiB clock = self.body.env.clock
176 502.4 MiB 0.0 MiB if self.to_train == 1:
177 502.4 MiB 0.0 MiB net_util.copy(self.net, self.old_net) # update old net
178 512.4 MiB 10.0 MiB batch = self.sample()
179 512.4 MiB 0.0 MiB clock.set_batch_size(len(batch))
180 609.1 MiB 96.7 MiB _pdparams, v_preds = self.calc_pdparam_v(batch)
181 611.4 MiB 2.3 MiB advs, v_targets = self.calc_advs_v_targets(batch, v_preds)
182 # piggy back on batch, but remember to not pack or unpack
183 611.4 MiB 0.0 MiB batch['advs'], batch['v_targets'] = advs, v_targets
184 611.4 MiB 0.0 MiB if self.body.env.is_venv: # unpack if venv for minibatch sampling
185 611.4 MiB 0.0 MiB for k, v in batch.items():
186 611.4 MiB 0.0 MiB if k not in ('advs', 'v_targets'):
187 611.4 MiB 0.0 MiB batch[k] = math_util.venv_unpack(v)
188 611.4 MiB 0.0 MiB total_loss = torch.tensor(0.0)
189 1458.7 MiB 0.0 MiB for _ in range(self.training_epoch):
190 1402.4 MiB 0.8 MiB minibatches = util.split_minibatch(batch, self.minibatch_size)
191 1458.7 MiB 0.0 MiB for minibatch in minibatches:
192 1444.6 MiB 0.0 MiB if self.body.env.is_venv: # re-pack to restore proper shape
193 1444.6 MiB 0.0 MiB for k, v in minibatch.items():
194 1444.6 MiB 0.0 MiB if k not in ('advs', 'v_targets'):
195 1444.6 MiB 0.0 MiB minibatch[k] = math_util.venv_pack(v, self.body.env.num_envs)
196 1444.6 MiB 0.0 MiB advs, v_targets = minibatch['advs'], minibatch['v_targets']
197 1448.7 MiB 6.3 MiB pdparams, v_preds = self.calc_pdparam_v(minibatch)
198 1457.7 MiB 9.7 MiB policy_loss = self.calc_policy_loss(minibatch, pdparams, advs) # from actor
199 1457.7 MiB 0.0 MiB val_loss = self.calc_val_loss(v_preds, v_targets) # from critic
200 1457.7 MiB 0.0 MiB if self.shared: # shared network
201 loss = policy_loss + val_loss
202 self.net.train_step(loss, self.optim, self.lr_scheduler, clock=clock, global_net=self.global_net)
203 else:
204 1458.7 MiB 23.7 MiB self.net.train_step(policy_loss, self.optim, self.lr_scheduler, clock=clock, global_net=self.global_net)
205 1458.7 MiB 4.2 MiB self.critic_net.train_step(val_loss, self.critic_optim, self.critic_lr_scheduler, clock=clock, global_net=self.global_critic_net)
206 1458.7 MiB 0.0 MiB loss = policy_loss + val_loss
207 1458.7 MiB 0.0 MiB total_loss += loss
208 1458.7 MiB 0.0 MiB loss = total_loss / self.training_epoch / len(minibatches)
209 # reset
210 1458.7 MiB 0.0 MiB self.to_train = 0
211 1458.7 MiB 0.0 MiB logger.debug(f'Trained {self.name} at epi: {clock.epi}, frame: {clock.frame}, t: {clock.t}, total_reward so far: {self.body.env.total_reward}, loss: {loss:g}')
212 1458.7 MiB 0.0 MiB return loss.item()
213 else:
214 return np.nan
Something very weird happens in line 188 (not at every iteration, of course). This is the train function of PPO.
The memory usage spikes, but evidently as a side effect of either the tensor allocation or the loop. Do you have any guesses as to what the side effect might be?
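For context, a line-by-line report like the one above comes from memory_profiler's @profile decorator (here applied to PPO's train()). A minimal sketch of reproducing such a report, using an illustrative stand-in function:

# Produce a per-line "Mem usage / Increment" table with memory_profiler.
# The function body is a hypothetical stand-in for the real train() above.
from memory_profiler import profile

@profile
def train_stub():
    batch = [bytearray(10_000_000) for _ in range(10)]  # ~100 MB stand-in allocation
    total = sum(len(b) for b in batch)
    return total

if __name__ == '__main__':
    train_stub()  # running the script prints the per-line table to stdout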
PPO is on-policy, so the replay buffer size is capped. In fact, you will see RAM consumption rise and fall for PPO before and after training, when it clears the replay buffer. Can you also share the spec you're using? If the time horizon, num_envs, and batch size are big, especially when coupled with state processing (stacked/sequential states), they will take up RAM as well.
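To make that concrete, here is a rough back-of-the-envelope estimate of what the on-policy batch alone can occupy. Every number below is a hypothetical placeholder, not a value from the spec; plug in your own horizon, num_envs, frame stack, and observation shape.

# Rough estimate of on-policy batch memory for PPO-style training.
# All values are hypothetical placeholders.
import numpy as np

num_envs    = 16           # parallel envs (venv)
horizon     = 2048         # steps collected per env before training
frame_stack = 4            # stacked frames per state, if state processing is used
obs_shape   = (84, 84)     # per-frame observation size
dtype_bytes = np.dtype(np.float32).itemsize

state_bytes = num_envs * horizon * frame_stack * np.prod(obs_shape) * dtype_bytes
print(f'~{state_bytes / 1e9:.2f} GB for states alone, before any copies made during training')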
Note that line 188's increment is actually 0 MB. Line 189 shows the memory gain from the entire for-loop, not from the tensor assignment at line 188; since the for-statement line is executed on every iteration, its reported usage reflects the loop's peak. That number matches the usage at the end of the loop at line 207, which is 1458.7 MB. So, nothing abnormal here.
Closing due to inactivity; the memory leak doesn't actually seem to be happening.
Hello,
I am currently using SLM-Lab as the learning component for my custom Unity environments. I use a modified UnityEnv wrapper and run my experiments with a modified version of the starter code here.
When running either PPO or SAC, I noticed that the Unix kernel kills the job after a while due to running out of memory (RAM/swap).
Given the custom nature of this setup, I don't expect you to replicate the bug; rather, I'm asking whether you have ever faced a similar problem on your end.
Some more detail:
1) Initially, I assumed it was due to the size of the replay buffer, but the problem persisted even after the buffer was capped at a small size (1,000) and maxed out. (A rough way to sanity-check the buffer's footprint is sketched after this list.)
2) The memory increase is roughly 1 MB/s, which is relatively high.
3) I managed to trace it to the "train step" in SAC. I can't tell whether the memory is allocated there, but when the training steps aren't taken, there is no problem.
4) I tested with the default Unity envs to make sure my custom env wasn't the cause; it doesn't seem to be.
5) We will be testing with the provided CartPole env to see if the problem persists.
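A rough way to sanity-check the replay-buffer suspicion is to sum the bytes the buffer actually holds once it is full; if that number stays flat at the cap while the process keeps growing, the buffer is not the culprit. The layout and field names below are hypothetical, not SLM-Lab's actual memory class.

# Estimate the memory held by a list-of-dicts replay buffer of numpy arrays.
# Field names and shapes are hypothetical placeholders.
import numpy as np

def buffer_nbytes(buffer):
    total = 0
    for transition in buffer:
        for value in transition.values():
            if isinstance(value, np.ndarray):
                total += value.nbytes
    return total

# Example: a buffer capped at 1,000 transitions with 84x84x4 float32 states
buffer = [{'state': np.zeros((84, 84, 4), dtype=np.float32),
           'action': np.zeros(1, dtype=np.int64),
           'reward': np.zeros(1, dtype=np.float32)} for _ in range(1000)]
print(f'{buffer_nbytes(buffer) / 1e6:.1f} MB held by the buffer')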
Any guidance or tips would be appreciated! And once again thank you for the great library!