chmxu opened 2 years ago
Also, when I use the default script for episodic training, RAM usage increases dramatically during training: the process uses about 100 GB of RAM after roughly 300 iterations. I don't know if this is reasonable.
Hi,
Thanks for raising this issue. Let me investigate both problems and get back to you ASAP.
Update: 1) Could you try again and let me know if the RAM problem is solved? 2) As for the ResNet structure, there is indeed some discrepancy in the literature between ResNet-18 (implemented in my code) and the custom ResNet-12 used in several few-shot works. I will add the latter architecture soon.
Hi, thank you for your reply! I have tried to modify the training script based on your new version to skip the model forward and backward passes, only iterate the dataloader, and print the memory usage, as follows:

```python
import psutil

for i, data in enumerate(tqdm_bar):
    if i >= args.num_updates:
        break
    print("PERCENTAGE RAM USED", psutil.virtual_memory().percent)
    continue  # skip the rest of the training step
```
In my trial, the percentage of used memory keeps increasing. I think there may be a memory leak when reading the tfrecord files, but I cannot figure it out. My PyTorch version is 1.9.0, with CUDA 11.1. Maybe you can try my code to see if you can reproduce the problem.
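One way to narrow down whether such growth comes from Python-side allocations or from native code (e.g. the tfrecord reader) is the standard-library `tracemalloc` module, which only tracks Python-heap allocations. A minimal sketch, assuming `loader` is any iterable of batches (the function name and parameters here are hypothetical, not part of the repo):

```python
import tracemalloc

def check_loader_growth(loader, num_iters=100, report_every=25):
    """Iterate a dataloader without any model work and report Python-heap growth.

    If tracemalloc stays flat while the process RSS keeps climbing,
    the leak is likely in native code rather than in Python objects.
    """
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    for i, _batch in enumerate(loader):
        if i >= num_iters:
            break
        if i % report_every == 0:
            current, peak = tracemalloc.get_traced_memory()
            print(f"iter {i}: +{(current - baseline) / 1e6:.1f} MB Python heap "
                  f"(peak {peak / 1e6:.1f} MB)")
    tracemalloc.stop()
```

Running it on the episodic loader alongside the `psutil` print above would show whether the two curves diverge.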
I tried my new code before pushing, and I had no memory leak. When choosing my PyTorch loader, the RAM usage capped at 16.5 GB. Can you please confirm that by running my original code:

```bash
bash scripts/train.sh protonet resnet18 ilsvrc_2012
```

you don't have any leak? Thanks.
I re-cloned the repo and trained the ProtoNet with your original code. After 1400 iterations, 23 GB of RAM is used. When I train the model with 4 GPUs (by modifying the GPU configuration in base.yaml), about 80 GB of RAM is used at 1100 iterations. The usage keeps increasing slowly in both cases.
I assume the RAM used is correlated with the number of GPUs (since DDP is used) and the size of an episode. So when the episode is large, which is exactly the case in meta-dataset where the largest support set can contain 500 images, and multiple GPUs are used, the code may consume an extremely large amount of RAM. I wonder if there is any solution to this problem.
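To see why episode size and GPU count compound, here is a rough back-of-the-envelope sketch of host RAM held by decoded episodes; the worker and prefetch counts are hypothetical illustrative values, not the repo's actual configuration:

```python
# Back-of-the-envelope estimate of host RAM per in-flight episode
# (illustrative; 500 support images is the meta-dataset maximum cited above)
images = 500                # largest support set
c, h, w = 3, 224, 224       # decoded float32 image tensor
bytes_per_float = 4

episode_bytes = images * c * h * w * bytes_per_float
print(f"{episode_bytes / 1e9:.2f} GB per episode copy")   # 0.30 GB

# With DDP, each GPU runs its own process with its own dataloader, and each
# loader worker prefetches episodes, so the copies multiply quickly
# (gpus/workers/prefetch below are assumed values for illustration):
gpus, workers, prefetch = 4, 4, 2
total = episode_bytes * gpus * workers * prefetch
print(f"{total / 1e9:.1f} GB held in loader buffers")     # 9.6 GB
```

This only accounts for buffered episodes; any per-iteration leak would grow on top of this baseline.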
Hi,
The original TensorFlow implementation uses the standard structure for the first convolutional layer, i.e., a 7x7 kernel, stride 2, padding 3, followed by a 3x3 max-pooling layer (link), while in your implementation this layer uses a 3x3 kernel and there is no max pooling (link). As a result, the feature map is much larger and consumes far more memory. I also notice that in the PAMI version of TIM, the authors claim that the PyTorch versions of the baselines are much better than the original ones; I wonder if the performance boost comes from this modification. The 'larger' version of ResNet does not seem practical for meta-dataset, since it leads to OOM when trained with ProtoNet or other episodic methods. I don't know if I have any misunderstanding of the code.
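The size difference between the two stems can be checked with plain arithmetic, using the standard floor convention for conv/pool output sizes (a sketch; the helper function below is mine, not from either repo):

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer (floor convention, as in PyTorch)."""
    return math.floor((size + 2 * padding - kernel) / stride) + 1

# Standard ResNet-18 stem: 7x7 conv, stride 2, pad 3, then 3x3 max-pool, stride 2, pad 1
std = conv_out(conv_out(224, 7, 2, 3), 3, 2, 1)

# Modified stem: 3x3 conv, stride 1, pad 1, no max-pool
mod = conv_out(224, 3, 1, 1)

print(std, mod)           # 56 224
print((mod / std) ** 2)   # 16.0 -> 16x more activations per channel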
Thanks.