Closed by alec3010 5 years ago
When using PyTorch >= 0.4.0, please wrap the inference stage in "with torch.no_grad():", placed before the for loop.
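For anyone unsure what that looks like in practice, here is a minimal sketch. The model and the list of batches are hypothetical stand-ins, not the code from this repository; the only point is where the "with torch.no_grad():" block sits relative to the loop.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real network; any nn.Module behaves the same way.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model.eval()  # put BatchNorm/Dropout layers into inference behaviour

# Hypothetical stand-in for a data loader: three batches of 16 samples each.
batches = [torch.randn(16, 8) for _ in range(3)]

features = []
with torch.no_grad():  # no autograd graph is recorded inside this block
    for inputs in batches:
        outputs = model(inputs)
        features.append(outputs)

# Outputs produced under no_grad carry no gradient history,
# so intermediate activations are not kept alive for a backward pass.
print(features[0].requires_grad)
```

Because no graph is stored, memory use during evaluation should drop noticeably compared with running the same loop with gradients enabled.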
Hi y'all
Thanks a lot. To make it work on PyTorch 1.1.0, I also converted the loss tensor to a Python scalar with "tensor.item()", i.e. "losses.update(loss.data.item(), targets.size(0))", because in PyTorch 1.1.0 the loss is a 0-dim tensor, which can no longer be indexed with [0]. (Previous formulation: "losses.update(loss.data[0], targets.size(0))")
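A small sketch of that change, in case it helps. The AverageMeter class here is a hypothetical minimal stand-in for the meter used in the repository, and the loss value is made up; only "item()" and "size(0)" are real PyTorch calls.

```python
import torch


class AverageMeter:
    """Minimal running average (hypothetical stand-in for the repo's meter)."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        # val is expected to be a plain Python number, weighted by batch size n
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)


losses = AverageMeter()
loss = torch.tensor(0.5)                    # 0-dim tensor, like a loss function's output
targets = torch.zeros(4, dtype=torch.long)  # made-up batch of 4 labels

# loss.data[0] fails on a 0-dim tensor; item() returns the value as a Python float.
losses.update(loss.item(), targets.size(0))
print(losses.avg)
```

As far as I know, "loss.item()" alone is enough here; going through ".data" first is not required.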
Furthermore, after wrapping the train function in "with torch.no_grad()", the loss tensor has to be re-created with 'loss = Variable(loss, requires_grad=True)' before loss.backward() is called, as loss.backward() needs a tensor that requires gradients.
Maybe these insights are self-evident for AI developers who are more experienced than I am, but I thought this info might make things easier for others like me, so I decided to share it.
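A word of caution on this workaround, with a sketch to check it: as far as I can tell, re-wrapping the loss only makes loss.backward() run without an error. The forward pass was never recorded under no_grad, so no gradients reach the model parameters and the optimizer has nothing to apply. The model, data and loss below are hypothetical stand-ins, and "detach().requires_grad_(True)" is used as a rough modern equivalent of the Variable call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)              # hypothetical stand-in for the real model
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(8, 4)           # made-up batch
targets = torch.randint(0, 2, (8,))  # made-up labels

# Case 1: forward pass under no_grad, loss re-wrapped afterwards.
with torch.no_grad():
    loss = criterion(model(inputs), targets)
loss = loss.detach().requires_grad_(True)  # roughly Variable(loss, requires_grad=True)
loss.backward()                            # runs, but the loss is an isolated leaf
grad_after_rewrap = model.weight.grad      # still None: no graph links loss to the weights

# Case 2: ordinary training step with gradients enabled.
loss = criterion(model(inputs), targets)
loss.backward()
grad_after_normal = model.weight.grad      # a tensor with the same shape as the weights

print(grad_after_rewrap is None, grad_after_normal.shape)
```

So it seems safer to keep "with torch.no_grad()" to the inference/evaluation code only and leave the train function as it was.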
Best Regards
Hi yxgeee,
I am running this with PyTorch 1.1 using the Python 3.6 interpreter on Ubuntu 16.04. The machine I'm using has a 1080 Ti with 11 GB of memory, so I believe it should work hardware-wise. The dataset is loaded correctly, but I get the following error when I try to train the baseline model:
Traceback (most recent call last):
  File "baseline.py", line 201, in <module>
    main(parser.parse_args())
  File "baseline.py", line 143, in main
    trainer.train(epoch, train_loader, optimizer, base_lr=args.lr)
  File "/home/qbiik/Alex/Algorithmen/FD-GAN/reid/trainers.py", line 32, in train
    loss, prec1 = self._forward(inputs, targets)
  File "/home/qbiik/Alex/Algorithmen/FD-GAN/reid/trainers.py", line 70, in _forward
    _, _, outputs = self.model(inputs)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qbiik/Alex/Algorithmen/FD-GAN/reid/models/multi_branch.py", line 13, in forward
    x1, x2 = self.base_model(x1), self.base_model(x2)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qbiik/Alex/Algorithmen/FD-GAN/reid/models/resnet.py", line 69, in forward
    x = module(x)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torchvision/models/resnet.py", line 88, in forward
    out = self.bn3(out)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/home/qbiik/Alex/venv/FD-GAN/lib/python3.6/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.91 GiB total capacity; 10.03 GiB already allocated; 256.94 MiB free; 20.18 MiB cached)
If you find the time, I'd greatly appreciate your help. :)
Best Regards