huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Schedulers cause memory accumulation across folds in cross-validation? #1134

Closed JohnGiorgi closed 4 years ago

JohnGiorgi commented 5 years ago

❓ Questions & Help

I am facing a strange issue when using the schedulers available in this library within a cross-validation loop. Basically, in each fold I initialize a new model, optimizer, and scheduler. GPU memory accumulates across folds until I eventually get a CUDA out-of-memory error.

The simplest example I could come up with to reproduce the error is:

import torch
from pytorch_transformers import WarmupConstantSchedule, WarmupCosineSchedule, WarmupLinearSchedule, WarmupCosineWithHardRestartsSchedule

# In my actual project, this is a for loop over the k-folds of k-fold cross-validation.
# In this example I use a while just to demonstrate the OOM error.
while True:
    net = torch.nn.Linear(10000, 10000)
    net = net.cuda()

    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
    scheduler = WarmupCosineWithHardRestartsSchedule(optimizer, 1, 1000)

    # I also tried all the other schedulers. Same issue.
    # scheduler = WarmupConstantSchedule(optimizer, 1)
    # scheduler = WarmupCosineSchedule(optimizer, 1, 1000)
    # scheduler = WarmupLinearSchedule(optimizer, 1, 1000)

    del net, optimizer, scheduler

This will run until it (very quickly) uses up all 12 GB on my Titan XP GPU. To make sure the scheduler initialization was truly the cause, I also tested:

import torch
from pytorch_transformers import WarmupCosineWithHardRestartsSchedule

while True:
    net = torch.nn.Linear(10000, 10000)
    net = net.cuda()

    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

    del net, optimizer

This time I did not see the memory accumulation or the OOM error.

My question is: why does creating a scheduler cause GPU memory to accumulate across folds, and how can I fully release the model, optimizer, and scheduler between folds?

Thanks a lot.

TIANRENK commented 5 years ago

I am facing the same issue. When I use WarmupLinearSchedule, I get a CUDA out-of-memory error at the 7th training epoch.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rlouf commented 4 years ago

Running import gc, then gc.collect(), and emptying the GPU's cache should solve the issue temporarily. See #1742