I am trying to reproduce your results, but when I run the default configuration autoencoder/base.yaml, memory usage is much higher than expected.
Specifically, each batch consumes 27591 MiB with the default configuration. The paper reports training with a batch size of 2 on a 24GB GPU, so I would like to know how to modify the configuration or the training procedure to reach a similar level of efficiency.
Could you provide guidance on how to achieve this? Or, if it is not possible, could you explain why that is the case and suggest alternative approaches to reproducing the reported results?
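For context, these are the kinds of overrides I was expecting to need. The key names below are my guesses and are not taken from the actual autoencoder/base.yaml, so please correct them if the config uses different options:

```yaml
# Hypothetical overrides -- key names are guesses, not from the repo's config
data:
  batch_size: 2               # paper reports batch size 2 on a 24GB GPU
trainer:
  precision: 16               # mixed precision roughly halves activation memory
  accumulate_grad_batches: 4  # trade extra steps for lower per-step memory
model:
  gradient_checkpointing: true  # recompute activations during the backward pass
```

If the released code does not expose options like these, pointers to where mixed precision or gradient checkpointing could be enabled in the training script would also be appreciated.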
Thank you in advance for your help.