Closed zamling closed 2 years ago
Hi, when I tried to train this model I found a memory leak (in CPU RAM) in the image-loading part: memory usage grows almost linearly while the script runs. I used memory_profiler to locate the leak, and it shows:
```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    99   4743.2 MiB   4743.2 MiB           1   @profile
   100                                         def train_cls(config, epoch, num_epoch, epoch_iters, base_lr, num_iters,
   101                                                       trainloader, optimizer, model, writer_dict, final_output_dir):
   102                                             # Training
   103   4743.2 MiB      0.0 MiB           1       model.train()
   104   4743.2 MiB      0.0 MiB           1       batch_time = AverageMeter()
   105   4743.2 MiB      0.0 MiB           1       ave_loss = AverageMeter()
   106   4743.2 MiB      0.0 MiB           1       tic = time.time()
   107   4743.2 MiB      0.0 MiB           1       cur_iters = epoch * epoch_iters
   108   4743.2 MiB      0.0 MiB           1       writer = writer_dict['writer']
   109   4743.2 MiB      0.0 MiB           1       global_steps = writer_dict['train_global_steps']
   110   4743.2 MiB      0.0 MiB           1       world_size = get_world_size()
   111
   112   5984.3 MiB -53718.8 MiB          41       for i_iter, (images, labels, qtable) in enumerate(trainloader):
   113                                                 # images, labels, _, _ = batch
   114   7328.2 MiB  55016.1 MiB          41           images = images.cuda()
   115   7328.3 MiB    -43.6 MiB          41           labels = labels.long().cuda()
   116
   117   7328.3 MiB   1126.4 MiB          41           losses, _ = model(images, labels, qtable)  # _ : output of the model (see utils.py)
   118   7328.3 MiB     -0.4 MiB          41           loss = losses.mean()
```
It seems that the memory leak happens at `images = images.cuda()`. When I set workers > 0 (workers = 4), the memory of all the workers increases (observed with the `top` command).
Do you have any idea why there is a memory leak during data loading?
Thanks a lot! : )
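(Editor's note: a common cause of this exact symptom, discussed at length in pytorch/pytorch#13246, is not a true leak but copy-on-write. When the Dataset holds a large collection of Python objects, such as a list of file paths, every read from a forked worker bumps refcounts and dirties memory pages, so each worker's RSS slowly grows toward a full copy of the parent. A hedged sketch of the usual mitigation, storing such metadata in a single numpy array instead of a Python list; the names are illustrative, not from this repository:)

```python
import numpy as np

# A Python list of str: every element is a separate refcounted object, so a
# forked DataLoader worker that merely reads it dirties the pages it touches.
paths = ["img_%05d.jpg" % i for i in range(100_000)]

# The same data as one fixed-width numpy byte array lives in a single
# refcount-free buffer that workers can read without triggering copies.
paths_arr = np.array(paths, dtype="S")

# Inside the Dataset's __getitem__, the lookup then becomes:
path = paths_arr[12345].decode()
```

If memory still grows after this change, the leak is likely in the decoding library rather than in the loader itself.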
Sorry, I didn't notice this problem before. Since I just use the standard PyTorch loader, I don't think I can fix this on my side.
jpegio may have a memory-leak problem.