mjkwon2021 / CAT-Net

Official code for CAT-Net: Compression Artifact Tracing Network. Image manipulation detection and localization.

Memory leak when training #24

Closed (zamling closed this issue 2 years ago)

zamling commented 2 years ago

Hi, when I tried to train this model, I found a memory leak (CPU RAM) in the image-loading part. Memory usage grows almost linearly while the script runs. I used memory_profiler to locate the leak, and it shows the following:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    99   4743.2 MiB   4743.2 MiB           1   @profile
   100                                         def train_cls(config, epoch, num_epoch, epoch_iters, base_lr, num_iters,
   101                                                   trainloader, optimizer, model, writer_dict, final_output_dir):
   102                                             # Training
   103   4743.2 MiB      0.0 MiB           1       model.train()
   104   4743.2 MiB      0.0 MiB           1       batch_time = AverageMeter()
   105   4743.2 MiB      0.0 MiB           1       ave_loss = AverageMeter()
   106   4743.2 MiB      0.0 MiB           1       tic = time.time()
   107   4743.2 MiB      0.0 MiB           1       cur_iters = epoch * epoch_iters
   108   4743.2 MiB      0.0 MiB           1       writer = writer_dict['writer']
   109   4743.2 MiB      0.0 MiB           1       global_steps = writer_dict['train_global_steps']
   110   4743.2 MiB      0.0 MiB           1       world_size = get_world_size()
   111                                         
   112   5984.3 MiB -53718.8 MiB          41       for i_iter, (images, labels, qtable) in enumerate(trainloader):
   113                                                 # images, labels, _, _ = batch
   114   7328.2 MiB  55016.1 MiB          41           images = images.cuda()
   115   7328.3 MiB    -43.6 MiB          41           labels = labels.long().cuda()
   116                                         
   117   7328.3 MiB   1126.4 MiB          41           losses, _ = model(images, labels, qtable)  # _ : output of the model (see utils.py)
   118   7328.3 MiB     -0.4 MiB          41           loss = losses.mean()

It seems that the memory leak happens at images = images.cuda(). When I set the number of workers > 0 (4 workers), the memory of every worker process also increases (as seen with the top command).

Do you have any idea why there is a memory leak during data loading?

Thanks a lot! : )
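
In the report above, the two large increments inside the loop nearly cancel (-53718.8 MiB at line 112 and +55016.1 MiB at line 114), so the line responsible is not obvious from the profile alone. One way to narrow it down is to iterate over the loader without the model or any .cuda() call and watch the process memory. The following is a minimal sketch, not code from this repository: the helper names peak_rss_mib and profile_iteration are hypothetical, it uses only the standard library, and it works on Unix only.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def profile_iteration(iterable, every=10):
    """Consume `iterable` and sample peak RSS every `every` items.

    Returns a list of (index, peak_rss_mib) tuples. Batches are discarded
    immediately, so steady growth here points at the loading code rather
    than at the training step.
    """
    samples = []
    for i, _batch in enumerate(iterable):
        if i % every == 0:
            samples.append((i, peak_rss_mib()))
    return samples


if __name__ == "__main__":
    # Stand-in iterable; in the training script this would be `trainloader`.
    for index, mib in profile_iteration(range(50), every=10):
        print(f"iter {index:4d}: peak RSS {mib:.1f} MiB")
```

With the number of workers set to 0, loading runs in the main process, so the numbers cover the dataset code directly. With workers > 0, loading happens in separate processes, and this measurement only sees the main process.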

CauchyComplete commented 2 years ago

Sorry I didn't notice this problem before. Since I just used a PyTorch loader, I don't think I can fix this.

wennyHou commented 1 year ago

jpegio may have a memory leak problem.
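
A suspicion like this can be checked by calling the suspect function many times on the same input and comparing process memory before and after. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: rss_growth_over_calls is a hypothetical helper, the callable passed in is a placeholder for whatever JPEG-reading call the dataset makes, and the measurement relies on peak RSS from the Unix-only resource module.

```python
import gc
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def rss_growth_over_calls(fn, calls=1000):
    """Call `fn` repeatedly and return the growth in peak RSS in MiB.

    Peak RSS is a high-water mark, so run this in a fresh process:
    if memory peaked higher earlier, later growth would be hidden.
    """
    gc.collect()  # drop Python-level garbage so it is not counted as a leak
    before = peak_rss_mib()
    for _ in range(calls):
        fn()  # the result is discarded right away
    gc.collect()
    return peak_rss_mib() - before


if __name__ == "__main__":
    # Placeholder callable; replace with a function that reads one JPEG file.
    growth = rss_growth_over_calls(lambda: bytearray(1024), calls=1000)
    print(f"peak RSS grew by {growth:.1f} MiB over 1000 calls")
```

Growth that scales with the number of calls even though every result is discarded suggests the memory is held outside Python's garbage collector, for example in a C extension.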