anslex closed this issue 4 months ago.
Hi,
Just to make sure, did you start training with the following command?
./build/bin/train --config configs/train_normalnet.ini --TrainParams.scene_names tt_train --TrainParams.batch_size 1 --TrainParams.inner_batch_size 1 --TrainParams.train_crop_size 256
(I'm only asking because the error output is missing the argument names.)
And are you using the tt_train scene from our supplemental material?
Hi Linus,
Yes, exactly like that. I have also tried not saving images at every checkpoint and/or reducing the crop size further, but without success.
I'm going to dig into the code to understand why an additional ~508.00 MiB allocation is attempted when the checkpoint is saved.
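A rough plausibility check on the ~508 MiB figure, offered as a sketch under stated assumptions rather than a diagnosis: suppose the eval/test pass feeds full-resolution images to the VGG loss instead of train_crop_size crops, and that the first VGG block outputs 64 float32 channels (as in the standard VGG-16/19 architectures). Assuming 1920x1080 frames (an assumption about the tt_train images, not confirmed in this thread), a single such activation is about 506 MiB, close to the reported allocation, whereas at a 256x256 crop it is only 16 MiB. If that holds, it would be consistent with reducing the crop size not helping. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: size in MiB of a dense float32 tensor with the given shape.
// Assumes 4 bytes per element and ignores allocator padding or caching overhead.
double float32_tensor_mib(std::initializer_list<std::int64_t> shape) {
    std::int64_t elements = 1;
    for (std::int64_t d : shape) {
        assert(d > 0);  // every dimension must be positive
        elements *= d;
    }
    const double bytes = static_cast<double>(elements) * 4.0;
    return bytes / (1024.0 * 1024.0);
}
```

For example, float32_tensor_mib({1, 64, 1080, 1920}) gives 506.25 and float32_tensor_mib({1, 64, 256, 256}) gives 16.0. The real peak during the VGG forward pass would include several such activations plus allocator overhead, so this only shows the order of magnitude.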
Hello,
I found that it is caused by
lt_vgg = loss_vgg->forward(x, target);
so I forced
want_eval = false;
to skip the eval and test epochs.
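Rather than hard-coding want_eval = false, the same work-around could be expressed as a switch that is checked once per epoch, optionally evaluating only every N epochs. This is a minimal sketch with hypothetical names (EvalOptions, should_run_eval, eval_every_n_epochs); it does not reflect how the TRIPS training loop is actually organized.

```cpp
#include <cassert>

// Hypothetical options; names are illustrative, not taken from the TRIPS code.
struct EvalOptions {
    bool want_eval = true;        // master switch; the work-around corresponds to false
    int eval_every_n_epochs = 1;  // evaluate only on epochs divisible by this value
};

// Returns true when the eval/test pass (and its VGG loss forward) should run.
bool should_run_eval(const EvalOptions& opt, int epoch) {
    assert(epoch >= 0 && opt.eval_every_n_epochs > 0);
    if (!opt.want_eval) return false;
    return epoch % opt.eval_every_n_epochs == 0;
}
```

Evaluating less often does not lower the peak of a single eval pass, so on an 8 GB card only want_eval = false (or a smaller eval footprint) avoids the spike entirely.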
Hi, good workaround :) This might also apply to you: https://github.com/lfranke/TRIPS/issues/26#issuecomment-1956377052
Not using the VGG loss reduces VRAM usage; however, it will also impact quality.
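The trade-off above (dropping VGG saves VRAM but costs quality) can be sketched as a loss with an optional perceptual term. This is a minimal, framework-free illustration with hypothetical names (total_loss, vgg_weight, Image as a flat float vector); in the actual LibTorch code the perceptual term would be the loss_vgg->forward(x, target) call. The design point is that the term is skipped rather than multiplied by zero, because a zero weight would still run the forward pass and allocate its activations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Image = std::vector<float>;  // hypothetical stand-in for an image tensor
using LossFn = std::function<float(const Image&, const Image&)>;

// Mean absolute error between two images of equal size.
float l1_loss(const Image& x, const Image& target) {
    assert(x.size() == target.size() && !x.empty());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) sum += std::fabs(x[i] - target[i]);
    return sum / static_cast<float>(x.size());
}

// Combined loss. When vgg_weight is 0 the perceptual term is skipped entirely,
// so its forward pass (and the memory it would need) never runs.
float total_loss(const Image& x, const Image& target, float vgg_weight,
                 const LossFn& vgg_loss) {
    assert(vgg_weight >= 0.0f);
    float loss = l1_loss(x, target);
    if (vgg_weight > 0.0f) loss += vgg_weight * vgg_loss(x, target);
    return loss;
}
```

Passing a counting stand-in for vgg_loss shows that it is never called when vgg_weight is 0.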
Hello,
Thank you for your project. I have an 8 GB RTX 4070. Do you by any chance know how to limit memory usage during training when a checkpoint is saved?