Closed uyoung-jeong closed 1 year ago
For those who experience the 2nd issue:
The primary cause is that the model is wrapped in DataParallel, but the code is not designed for a multi-GPU setting.
It seems that the RGB channels of the image are treated as the mini-batch dimension inside PyTorch DataParallel, which splits its inputs along the first dimension.
Therefore, you should modify 2 files:
In tools/valid.py, replace
    image_resized = transforms(image_resized)
with
    image_resized = transforms(image_resized).unsqueeze(0)
In lib/models/cid.py, replace
    images = [x['image'].unsqueeze(0).to(self.device) for x in batch_inputs]
with the following:
    if self.training:
        images = [x['image'].unsqueeze(0).to(self.device) for x in batch_inputs]
    else:
        images = [x['image'].to(self.device) for x in batch_inputs]
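To illustrate why adding the batch dimension before the DataParallel call helps, here is a minimal sketch in plain Python (no PyTorch needed). It assumes DataParallel splits each input tensor along dim 0 into evenly sized chunks, the way torch.chunk does. The helper names chunk_sizes and scatter_shapes are made up for this illustration and are not part of PyTorch or this repository.

```python
import math


def chunk_sizes(dim0, num_chunks):
    """Chunk lengths when a dimension of length dim0 is split the way
    torch.chunk does it: ceil-sized pieces until the data runs out."""
    if dim0 == 0:
        return []
    step = math.ceil(dim0 / num_chunks)
    sizes = []
    remaining = dim0
    while remaining > 0:
        take = min(step, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def scatter_shapes(shape, num_gpus):
    """Shapes each replica would receive if a tensor of `shape`
    were split along dim 0 across num_gpus devices (hypothetical helper)."""
    return [(n,) + tuple(shape[1:]) for n in chunk_sizes(shape[0], num_gpus)]


# Unbatched image (C, H, W): the channel axis itself gets split across 2 GPUs.
print(scatter_shapes((3, 512, 832), 2))

# Batched image (1, C, H, W): the batch axis is split, all 3 channels stay together.
print(scatter_shapes((1, 3, 512, 832), 2))
```

Under these assumptions, the unbatched image ends up as a 2-channel piece and a 1-channel piece, which would be consistent with the "got 1 channels instead" message. With the batch dimension added first, the single image goes to one replica with all 3 channels intact.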
For the 2nd issue, it may work if you change DDP from False to True in the configuration files.
Yes. Setting DDP to True works fine. Running with the DP setting requires a bit of work.
Hi, I tried to run the validation script on the COCO dataset with the following command, but it raises errors.
python tools/valid.py --cfg experiments/coco.yaml --gpus 0,1 TEST.MODEL_FILE model/coco/model_best.pth.tar
I'm using Ubuntu 20.04 with PyTorch 1.12 and CUDA 11.3.
1. RuntimeError: Cannot re-initialize CUDA in forked subprocess. I fixed this error by adding
torch.multiprocessing.set_start_method('spawn')
before calling make_test_dataloader.
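For anyone applying the same workaround, here is a rough sketch of where the call could go. It uses the standard-library multiprocessing module as a stand-in, since torch.multiprocessing wraps it and exposes the same set_start_method. The function name use_spawn is hypothetical, and the comment marks where the script's own dataloader setup would follow.

```python
import multiprocessing as mp


def use_spawn():
    """Select the 'spawn' start method so worker processes start fresh
    instead of inheriting an already-initialized CUDA context via fork."""
    # force=True avoids a RuntimeError if a start method was already chosen.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Call this once, before any dataloader with worker processes is created
    # (in the script above, that would be before make_test_dataloader).
    print(use_spawn())
```

Keeping the call under the `if __name__ == "__main__":` guard matters with spawn, because each child process re-imports the main module.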
2. RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[2, 1, 512, 832] to have 3 channels, but got 1 channels instead
This error occurs at lib/models/backbone.py, line 362, in forward: x = self.conv1(x)
It seems that the network architecture is different from what it should be.