liuqk3 / PUT

Papers: 'Transformer based Pluralistic Image Completion with Reduced Information Loss' (TPAMI 2024) and 'Reduce Information Loss in Transformers for Pluralistic Image Inpainting' (CVPR 2022)
MIT License

How to improve the speed for evaluation #4

Closed LonglongaaaGo closed 2 years ago

LonglongaaaGo commented 2 years ago

Hi @liuqk3 ,

Thank you so much for your excellent work. Recently, I tried to evaluate PUT, and it took about 37 seconds per image. For the Places2 dataset with 36,500 validation images, I would have to wait around 20 days for the evaluation. Could you give some advice on improving the speed?
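For scale, a quick back-of-the-envelope check of that wall-clock estimate (this assumes purely sequential inference on a single GPU, which is an assumption rather than something stated above):

# Rough wall-clock estimate from the numbers quoted above
# (assumes purely sequential, single-GPU inference with no overhead).
seconds_per_image = 37
num_images = 36_500                         # Places2 validation set size mentioned above
total_seconds = seconds_per_image * num_images
print(f"~{total_seconds / 86_400:.1f} days")  # about two weeks, the same order as "around 20 days"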

Thanks!

liuqk3 commented 2 years ago

Hi @LonglongaaaGo ,

Thanks for your interest in our work. The inference speed is indeed one of the main drawbacks. I am also trying to improve the inference speed (and the quality). In our work, we follow ICT and only use a subset of Places2 (called Naturalscene in this repo) for training, and we only keep about 800 images for evaluation. Please refer to naturalscenetrain.txt and naturalscenevalidation.txt for more details. All the file lists are provided by the authors of ICT.
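For anyone who wants to run a similar subset evaluation, a minimal sketch of the idea (the file names come from the comment above; the loading and subsampling code itself is a hypothetical illustration, not the repo's actual data pipeline):

import random

def load_file_list(list_file, max_images=None, seed=0):
    """Read an ICT-style file list (one image path per line) and optionally subsample it.

    Hypothetical helper for illustration; the repo's own dataloader may differ.
    """
    with open(list_file) as f:
        names = [line.strip() for line in f if line.strip()]
    if max_images is not None and len(names) > max_images:
        random.Random(seed).shuffle(names)
        names = names[:max_images]
    return names

# Evaluate on the ~800-image list used in the paper instead of all 36,500 images.
val_images = load_file_list("naturalscenevalidation.txt", max_images=800)
print(f"Evaluating on {len(val_images)} images")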

Once I have improved the inference speed and quality, I will push the code to this repo.

Thanks.

LonglongaaaGo commented 2 years ago

OK, thank you so much! By the way, did you encounter this problem during training?

getting : /home/longlong/longlong/run_dir/PUT/20220817134604/OUTPUT/Experiment/checkpoint/last.pth
Start training
Get lr 0.0 from base lr 0.0 with none
warning: Unused key step_iteration while instantiating image_synthesis.engine.lr_scheduler.CosineAnnealingLRWithWarmup
warning: Unused key step_iteration while instantiating image_synthesis.engine.lr_scheduler.CosineAnnealingLRWithWarmup
Traceback (most recent call last):
  File "/home/longlong/longlong/run_dir/PUT/20220817134604/train_net.py", line 183, in <module>
    main()
  File "/root/PUT/20220817134604/train_net.py", line 131, in main
    launch(main_worker, args.ngpus_per_node, args.num_node, args.node_rank, args.dist_url, args.backend, args=(args,))
  File "/root/PUT/20220817134604/image_synthesis/distributed/launch.py", line 52, in launch
    fn(local_rank, *args)
  File "/root/PUT/20220817134604/train_net.py", line 178, in main_worker
    solver.close()
  File "/root/PUT/20220817134604/image_synthesis/engine/solver.py", line 745, in close
    self.logger.close()
  File "/root/PUT/20220817134604/image_synthesis/engine/logger.py", line 88, in close
    self.tb_writer.close()
AttributeError: 'NoneType' object has no attribute 'close'
{'overall': {'trainable': '9.694Mb; 10.165M', 'non_trainable': '38.193Mb; 40.049M', 'total': '48.404Mb; 50.755M', 'buffer': '528.987Kb; 541.683K'},
 'encoder': {'trainable': '627.5Kb; 642.56K', 'non_trainable': '0b; 0', 'buffer': '0b; 0', 'total': '627.5Kb; 642.56K'},
 'decoder': {'trainable': '6.32Mb; 6.627M', 'non_trainable': '0b; 0', 'buffer': '0b; 0', 'total': '6.32Mb; 6.627M'},
 'quantize': {'trainable': '0b; 0', 'non_trainable': '0b; 0', 'buffer': '513.0Kb; 525.312K', 'total': '513.0Kb; 525.312K'},
 'quant_conv': {'trainable': '64.25Kb; 65.792K', 'non_trainable': '0b; 0', 'buffer': '0b; 0', 'total': '64.25Kb; 65.792K'},
 'post_quant_conv': {'trainable': '64.25Kb; 65.792K', 'non_trainable': '0b; 0', 'buffer': '0b; 0', 'total': '64.25Kb; 65.792K'},
 'loss': {'trainable': '2.636Mb; 2.764M', 'non_trainable': '38.193Mb; 40.049M', 'buffer': '15.987Kb; 16.371K', 'total': '40.845Mb; 42.829M'},
 'total': '48.407Mb; 50.759M'}
Experiment: global rank 0: prepare solver done!
load the best loss tensor(0.4870, device='cuda:0')
Resume from OUTPUT/Experiment/checkpoint/last.pth
load the best loss tensor(0.4870, device='cuda:0')
Resume from /root/PUT/20220817134604/OUTPUT/Experiment/checkpoint/last.pth
Experiment: global rank 0: start training...

The log indicates some problems when I reload the checkpoint to continue training.

liuqk3 commented 2 years ago

I found that the logger tries to close the TensorBoard writer even when TensorBoard is not used. I have pushed an update to fix this bug.
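A minimal sketch of the kind of guard that fix implies (the class and attribute names follow the traceback above; this is an illustration, not the exact commit in the repo):

class Logger:
    def __init__(self, use_tensorboard=False, log_dir="OUTPUT/log"):
        # tb_writer stays None unless TensorBoard logging is enabled.
        self.tb_writer = None
        if use_tensorboard:
            from torch.utils.tensorboard import SummaryWriter
            self.tb_writer = SummaryWriter(log_dir=log_dir)

    def close(self):
        # Guard the close call: without this check, tb_writer is None when
        # TensorBoard is disabled and .close() raises
        # AttributeError: 'NoneType' object has no attribute 'close'.
        if self.tb_writer is not None:
            self.tb_writer.close()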

LonglongaaaGo commented 2 years ago

OK, thank you so much!