Actually, I was just occupied by another project, so I closed the issue without solving it. The problem may well still exist.
Thanks for raising the issue @pandeydeep9 and @Phoveran,
These leaks are worrisome. Could you share more about your setup? Which GPU, CPU, and versions of Python, PyTorch, and learn2learn? It seems to be hardware-dependent, since @nightlessbaron wasn't able to reproduce the bug on Colab. Also, are you running the mini-imagenet script as-is?
I reduced the `meta_batch_size` parameter to 2 and `shots` to 1. That is the only change I made in the example mini-imagenet script.
My CPU is an Intel(R) Core(TM) i7-9750H @ 2.60GHz. My GPU is a TU106M [GeForce RTX 2060 Mobile], version a1, clock 33MHz.
I tried Python 3.8.5, learn2learn 0.1.5, and torch 1.10.0+cu102.
I have the same issue, even when I run `maml_miniimagenet.py` on an A5000 with 24GB. After a few iterations, it gives the error message "CUDA out of memory". I tried versions 0.1.3, 0.1.4, 0.1.5, and 0.1.6. So I think that if I use this library for my project, I will face a lot of trouble in the future. Can you give me some advice on how to fix it? Many thanks
Thanks for the additional feedback. Are you also using PyTorch v.1.10? And does commenting out the validation step also fix the memory leak?
Yes, I use PyTorch 1.10.0, CUDA 11.2, and Python 3.8.12.
My setting:
- Python: 3.9.7
- learn2learn: 0.1.6 (using pip install learn2learn)
- PyTorch: 1.10
- CUDA: 11.4
- GPU: RTX 2080Ti
Thanks for the extra info, I wonder if the issue cropped up with PyTorch 1.10 on CUDA 11+. As a temporary fix, does changing `learner = maml.clone()` to `learner = maml.clone(first_order=True)` on l. 112 solve the leak for you?
Yes, adding `first_order=True` on l. 112 solves the leak problem. Also, I guess this should give the expected results, as I believe we can use first-order MAML during the validation/test phases and get the same results (i.e., we do not need to track gradients for MAML during test/validation).
Thanks
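For readers who want to see the workaround in context, here is a minimal, self-contained sketch of an evaluation episode that clones with `first_order=True`; the toy model, data, and loss below are assumptions for illustration and are not taken from the mini-imagenet example.

```python
import torch
import learn2learn as l2l

model = torch.nn.Linear(10, 5)
maml = l2l.algorithms.MAML(model, lr=0.5)   # defaults to second-order MAML for training
loss_fn = torch.nn.CrossEntropyLoss()

x_support = torch.randn(16, 10)
y_support = torch.randint(0, 5, (16,))
x_query = torch.randn(16, 10)
y_query = torch.randint(0, 5, (16,))

# Validation/test episode: a first-order clone is enough, since the query loss
# is never back-propagated here and second-order terms are not needed.
learner = maml.clone(first_order=True)
for _ in range(3):  # adaptation steps
    learner.adapt(loss_fn(learner(x_support), y_support))
with torch.no_grad():
    eval_loss = loss_fn(learner(x_query), y_query)
print(eval_loss.item())
```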
We hit the same case; our setting is PyTorch 1.10.0, Python 3.9.5, CUDA 11.5, Tesla M40 (24 GB). We are glad to see this issue posted, since we had been debugging our code repeatedly with no idea what was causing the increasing CUDA memory occupation over val or test iterations.
I was facing the same issue, but managed to solve it by downgrading PyTorch from version 1.10 to 1.9. I was using the following setup:
- learn2learn: 0.1.6
- Python: 3.8.6
- PyTorch: 1.10
- GPU: Nvidia V100 (32 GB)
- CUDA: 10.2

Using this setup, memory usage kept increasing over epochs until an out-of-memory error occurred. However, when using PyTorch 1.9, memory usage stabilizes.
> Thanks for the extra info, I wonder if the issue cropped up with PyTorch 1.10 on CUDA 11+. As a temporary fix, does changing `learner = maml.clone()` to `learner = maml.clone(first_order=True)` on l. 112 solve the leak for you?
Honestly, I tried to use MAML for fine-tuning a T5 transformer. Before adding `first_order=True` I could only run 2 tps, and this did not fix my problem. After adding this parameter I could run 4 tps, but I still got the memory leak. I guess there are still some problems, and they are exposed by huge networks such as transformers.
- learn2learn: 0.1.6
- Python: 3.9
- PyTorch: 1.10
- GPU: 3080 (24GB)
- CUDA: 10.2
The memory leak seems to have been introduced in PyTorch 1.10. @sjtugzx do you also see leaks with T5 on PyTorch 1.9?
I haven't had time to investigate it yet, so help is welcome.
I have a suggestion for a potential fix. It is a little bit hacky, though. In my observations, the key problem leading to the memory leak seems to be that the compute graph for the gradient update is being created even when `first_order=True`. During training, I think the memory doesn't accumulate because the compute graph gets flushed when you call `loss.backward()`. However, at evaluation time you never need to call `loss.backward()`, so the memory usage can grow without bound.
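As a quick way to observe this, the following minimal sketch (the toy model and data are assumptions for illustration, not code from the thread) prints the allocated CUDA memory after each evaluation-style episode; on the affected PyTorch 1.10 setups the number reportedly keeps growing, while on a fixed setup it should stay flat.

```python
import torch
import learn2learn as l2l

device = torch.device('cuda')
model = torch.nn.Linear(10, 5).to(device)
maml = l2l.algorithms.MAML(model, lr=0.5)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(16, 10, device=device)
y = torch.randint(0, 5, (16,), device=device)

for episode in range(10):
    learner = maml.clone(first_order=True)
    learner.adapt(loss_fn(learner(x), y))   # inner-loop adaptation step
    eval_loss = loss_fn(learner(x), y)      # never back-propagated at evaluation time
    # If graphs from the update are being retained, this number keeps increasing.
    print(episode, torch.cuda.memory_allocated(device))
```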
In my code, what I've done to get rid of this extra, unneeded memory usage at evaluation time is to add an `eval` flag to the `adapt` function inside MAML and MetaSGD, which causes the gradient update to be wrapped in a `no_grad` context, so
```python
# Update the module
self.module = maml_update(self.module, self.lr, gradients)
```

becomes

```python
# Update the module
if eval:
    with torch.no_grad():
        self.module = maml_update(self.module, self.lr, gradients)
    for p in self.module.parameters():
        p.requires_grad = True
else:
    self.module = maml_update(self.module, self.lr, gradients)
```
I haven't investigated this in detail so I'm not sure if this is the best way to proceed, but let me know if this seems promising and if I should investigate further, and maybe even make a pull request.
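To make the idea above concrete outside of learn2learn, here is a small, self-contained PyTorch sketch (my own illustration, not code from the library) of performing an inner update under `torch.no_grad()` and then re-enabling `requires_grad` on the result, which is the same trick the snippet above applies inside `adapt`:

```python
import torch

# A single "parameter" and some data, purely for illustration.
w = torch.randn(5, requires_grad=True)
x = torch.randn(5)

inner_loss = ((w * x).sum() - 1.0) ** 2
grad, = torch.autograd.grad(inner_loss, w)

# The update itself is done under no_grad, so no graph linking w_adapted back to w
# is recorded; this is what keeps evaluation-time memory flat.
with torch.no_grad():
    w_adapted = w - 0.5 * grad
w_adapted.requires_grad_(True)  # allow further adaptation steps on the new tensor

eval_loss = ((w_adapted * x).sum() - 1.0) ** 2
print(eval_loss.item())
print(w_adapted.grad_fn)  # None: w_adapted is a fresh leaf, detached from w
```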
For people following, @kzhang2 and I have been discussing on Slack and we came up with a fix. Expect a PR + release in the next 2 weeks. Meanwhile, the fix is to update the `update_module` function in `learn2learn/utils/__init__.py` as follows:
```python
def update_module(module, updates=None, memo=None):
    r"""
    [[Source]](https://github.com/learnables/learn2learn/blob/master/learn2learn/utils.py)

    **Description**

    Updates the parameters of a module in-place, in a way that preserves differentiability.

    The parameters of the module are swapped with their update values, according to:

    \[
    p \gets p + u,
    \]

    where \(p\) is the parameter, and \(u\) is its corresponding update.

    **Arguments**

    * **module** (Module) - The module to update.
    * **updates** (list, *optional*, default=None) - A list of gradients for each parameter
        of the model. If None, will use the tensors in .update attributes.

    **Example**

    ~~~python
    error = loss(model(X), y)
    grads = torch.autograd.grad(
        error,
        model.parameters(),
        create_graph=True,
    )
    updates = [-lr * g for g in grads]
    l2l.update_module(model, updates=updates)
    ~~~
    """
    if memo is None:
        memo = {}
    if updates is not None:
        params = list(module.parameters())
        if not len(updates) == len(list(params)):
            msg = 'WARNING:update_module(): Parameters and updates have different length. ('
            msg += str(len(params)) + ' vs ' + str(len(updates)) + ')'
            print(msg)
        for p, g in zip(params, updates):
            p.update = g

    # Update the params
    for param_key in module._parameters:
        p = module._parameters[param_key]
        if p is not None and hasattr(p, 'update') and p.update is not None:
            if p in memo:
                module._parameters[param_key] = memo[p]
            else:
                updated = p + p.update
                p.update = None
                memo[p] = updated
                module._parameters[param_key] = updated

    # Second, handle the buffers if necessary
    for buffer_key in module._buffers:
        buff = module._buffers[buffer_key]
        if buff is not None and hasattr(buff, 'update') and buff.update is not None:
            if buff in memo:
                module._buffers[buffer_key] = memo[buff]
            else:
                updated = buff + buff.update
                buff.update = None
                memo[buff] = updated
                module._buffers[buffer_key] = updated

    # Then, recurse for each submodule
    for module_key in module._modules:
        module._modules[module_key] = update_module(
            module._modules[module_key],
            updates=None,
            memo=memo,
        )

    # Finally, rebuild the flattened parameters for RNNs
    # See this issue for more details:
    # https://github.com/learnables/learn2learn/issues/139
    if hasattr(module, 'flatten_parameters'):
        module._apply(lambda x: x)
    return module
```
Quick update: this is fixed, tested, and available in the new v0.1.7 release.
Hi, I am using learn2learn and getting a memory leak error. This is the code I am using:
```python
# Load model weights
model.load_state_dict(torch.load('mnist_model_weights_450.pth', map_location={'cuda:2': 'cuda:0'}))

# run the test data
meta_test_loss = 0.0
for idx, (context_x, context_y, target_x, target_y) in enumerate(test_loader):
    context_x, context_y, target_x, target_y = context_x.to(device), context_y.to(device), target_x.to(device), target_y.to(device)
    effective_batch_size = context_x.size(0)
    for i in range(effective_batch_size):
        learner = maml.clone(first_order=True)
        x_support, y_support = context_x[i], context_y[i]
        x_query, y_query = target_x[i], target_y[i]
        y_support = y_support.view(-1)
        y_query = y_query.view(-1)
        for _ in range(num_epochs):
            wts, predictions = learner(x_support)
            loss = custom_loss_function(predictions, y_support, wts)
            learner.adapt(loss)
        wts, predictions = learner(x_query)
        loss = custom_loss_function(predictions, y_query, wts)
        meta_test_loss += loss
    meta_test_loss /= effective_batch_size
    if idx % 10 == 0:
        print(f"Iteration: {idx+1}, Meta test loss: {meta_test_loss}")

print(f"Final Meta test loss: {meta_test_loss}")
```
I am getting this error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 3.06 MiB is free. Process 54265 has 14.74 GiB memory in use. Of the allocated memory 14.51 GiB is allocated by PyTorch, and 102.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
- learn2learn: 0.2.0
- Python: 3.9
- PyTorch: 2.4.0+cu121 (using Google Colab)
- GPU: T4 (15GB)
- CUDA: 12.2
Can anyone tell me how to fix it? I wrote the training loop similarly, and that one runs.
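One plausible culprit in the snippet above (an assumption on my part, not a confirmed diagnosis): `meta_test_loss += loss` accumulates a live tensor, which keeps every episode's autograd graph reachable until the sum is freed. At test time the summed loss is never back-propagated, so accumulating a detached value avoids this. A tiny self-contained illustration of the difference:

```python
import torch

model = torch.nn.Linear(4, 1)
batches = [torch.randn(8, 4) for _ in range(3)]

# Accumulating `loss` directly keeps each batch's graph alive until `total` is freed.
# Accumulating a detached scalar does not.
total = 0.0
for x in batches:
    loss = (model(x) ** 2).mean()
    total += loss.item()    # or: total += loss.detach()
print(total)
```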
I installed learn2learn using `pip install learn2learn`. When I try to run maml_miniimagenet.py (from learn2learn/examples/vision/maml_miniimagenet.py) with a batch size of 2 and shot = 1, I get the same error after 63 iterations. When I change to shot = 5, I get the error after 3 iterations.
When I look at nvidia-smi, the memory usage gradually increases with each iteration. However, if I comment out the meta-validation loss part (line 114-112 in this script), then I don't get the memory leak problem. I think the issue is similar to (Potential Memory Leak #278). I wonder why this issue occurs and how it can be solved?