jialuli-luka / VLN-SIG

MIT License
26 stars 2 forks

CPU memory increases after every step #5

Open ashutoshpndy opened 1 year ago

ashutoshpndy commented 1 year ago

Thank you for your significant contributions to Vision and Language Navigation.

I've been utilizing the bash pretrain_src/scripts/pretrain_r2r.bash script to pre-train the given 9 tasks. However, I've noticed a consistent rise in CPU memory consumption with each training iteration. By the time it reaches around 70,000 steps, it depletes my CPU's 128 GB memory, resulting in an Out Of Memory (OOM) error. It's worth noting that while I'm training on GPU devices, the OOM issue is occurring with my CPU memory.

Could you provide any insights or potential solutions to this problem? I eagerly await your guidance.

Thank you.

jialuli-luka commented 1 year ago

Hi,

To make pre-training faster, we store image features in CPU memory. Besides, pre-fetching images also takes CPU memory. The CPU memory usage is related to the num_workers you use. Using a smaller num_workers will reduce CPU memory usage, but pre-training will be much slower. Alternatively, you could avoid storing image features in CPU memory to save memory.
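A minimal sketch of the second option, keeping only a bounded number of features in RAM instead of preloading all of them. The class and the `loader` callback here are hypothetical illustrations, not the repo's actual API; the same idea applies regardless of how features are read from disk:

```python
from collections import OrderedDict

class BoundedFeatureCache:
    """Hold at most `capacity` image features in CPU memory,
    falling back to a disk load on a miss (slower, but RAM stays bounded).
    Used instead of preloading every feature into memory."""

    def __init__(self, loader, capacity=1000):
        self.loader = loader          # hypothetical: reads one feature from disk
        self.capacity = capacity
        self._cache = OrderedDict()   # insertion order tracks recency

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as recently used
            return self._cache[key]
        feat = self.loader(key)               # cache miss: hit the disk
        self._cache[key] = feat
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return feat

# Hypothetical usage: memory is capped at `capacity` features,
# at the cost of occasional disk reads during pre-training.
cache = BoundedFeatureCache(loader=lambda k: [0.0] * 4, capacity=2)
```

Reducing `num_workers` in the DataLoader attacks the other source of CPU memory (per-worker prefetch buffers); the two knobs can be combined.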

Best, Jialu

ashutoshpndy commented 1 year ago

Hi Jialu, thank you so much for the reply and for sharing insights regarding pre-training optimization.