yxgeee / FD-GAN

[NeurIPS-2018] FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification.
https://yxgeee.github.io/projects/fdgan.html

Run baseline test failed with GPU out of memory error #24

Closed alfredcs closed 5 years ago

alfredcs commented 5 years ago

It happened on 2x V100 GPUs, each with 32 GB of memory.

python baseline.py -b 256 -d market1501 -a resnet50 --evaluate --resume checkpoints/model_best.pth.tar

....

Traceback (most recent call last):
  File "baseline.py", line 200, in <module>
    main(parser.parse_args())
  File "baseline.py", line 117, in main
    top1, mAP = evaluator.evaluate(test_loader, dataset.query, dataset.gallery, rerank_topk=100, dataset=args.dataset)
  File "/FD-GAN/reid/evaluators.py", line 213, in evaluate
    query=query, topk_gallery=topk_gallery, rerank_topk=rerank_topk)
  File "/FD-GAN/reid/evaluators.py", line 31, in extract_embeddings
    Variable(gallery_feature.cuda(), volatile=True))
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/FD-GAN/reid/models/embedding.py", line 27, in forward
    x = x1 - x2
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 31.72 GiB total capacity; 24.00 GiB already allocated; 1.62 MiB free; 6.66 GiB cached)
Option is Test

heerduo commented 5 years ago

> It happened on 2xV100 GPUs each has 32G memory. […] RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 31.72 GiB total capacity; 24.00 GiB already allocated; 1.62 MiB free; 6.66 GiB cached) […]

Hello, I am encountering the same problem. Have you solved it?

alfredcs commented 5 years ago

Adding one more GPU seems to help. No root-cause analysis yet.

yxgeee commented 5 years ago

Which version of PyTorch do you use?

TekiLi commented 5 years ago

Hello, I am encountering the same problem. Have you solved it? The PyTorch version I use is 0.4.1, thanks!

yxgeee commented 5 years ago

> Hello, I am encountering the same problem. Have you solved it? The PyTorch version I use is 0.4.1, thanks!

If you use PyTorch 0.4.1, please use with torch.no_grad(): in the inference stage.

zhangqi2188 commented 5 years ago

> It happened on 2xV100 GPUs each has 32G memory. […] RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 31.72 GiB total capacity; 24.00 GiB already allocated; 1.62 MiB free; 6.66 GiB cached) […]
>
> Hello, I encounter the same problem. Have you solved it?

Hello, I am encountering the same problem too, and I need help very much.

yxgeee commented 5 years ago

When using PyTorch >= 0.4.0, please use with torch.no_grad(): in the inference stage before the for loop.
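The advice above can be sketched as follows. This is a minimal illustration, not FD-GAN's actual evaluation code: the linear layer, tensor shapes, and variable names are stand-ins for the embedding model and gallery features. The point is that in PyTorch >= 0.4.0 the deprecated Variable(..., volatile=True) no longer disables autograd, so activations accumulate across batches unless the loop is wrapped in torch.no_grad():

```python
import torch

# Stand-in for the embedding model (hypothetical; FD-GAN's model differs)
model = torch.nn.Linear(128, 64)
model.eval()

# Stand-in for batches of gallery features
batches = [torch.randn(32, 128) for _ in range(4)]

features = []
# no_grad() is placed before the for loop, as the author advises:
# autograd records no graph, so intermediate activations are not
# retained across batches and GPU memory stays flat.
with torch.no_grad():
    for batch in batches:
        features.append(model(batch))

# Outputs carry no gradient history
assert all(not f.requires_grad for f in features)
```

Without the no_grad() context, each forward pass would keep its computation graph alive for as long as the outputs are stored, which is consistent with the "24.00 GiB already allocated" figure in the traceback above.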

yunhyuck commented 5 years ago

When using PyTorch >= 0.4.0, use with torch.no_grad(): in the inference stage, before the for loop.

Modify main.py and embedding.py accordingly.

Then you can also reduce the batch size, e.g. from 256 to 32 or 64.