Issues with validation process

uyoung-jeong commented 2 years ago

Hi, I tried to run the validation script on the COCO dataset with the following command, but it raises errors:

    python tools/valid.py --cfg experiments/coco.yaml --gpus 0,1 TEST.MODEL_FILE model/coco/model_best.pth.tar

I'm using Ubuntu 20.04 with PyTorch 1.12 and CUDA 11.3.

1. RuntimeError: Cannot re-initialize CUDA in forked subprocess. I fixed this error by calling torch.multiprocessing.set_start_method('spawn') before make_test_dataloader.
2. RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[2, 1, 512, 832] to have 3 channels, but got 1 channels instead. This error occurs at lib/models/backbone.py, line 362, in forward, at x = self.conv1(x). It seems that the network architecture is different from what it should be.
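The workaround for the first error can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name use_spawn_start_method is hypothetical, and it uses the standard-library multiprocessing module, whose set_start_method is also available as torch.multiprocessing.set_start_method. The point is only that the start method has to be chosen before any worker processes are created, which in this thread means before make_test_dataloader is called.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same call


def use_spawn_start_method() -> str:
    """Select the 'spawn' start method and return the active method name.

    Hypothetical helper: call it once, before any DataLoader workers exist.
    force=True overrides a start method that was already chosen earlier
    in the same process.
    """
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Worker processes created after this point start from a fresh
    # interpreter instead of being forked from the current process.
    print(use_spawn_start_method())
```

With 'spawn', worker processes start from a fresh interpreter rather than inheriting the parent's state through fork, which is what the "Cannot re-initialize CUDA in forked subprocess" message objects to.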

uyoung-jeong commented 2 years ago

For those who hit the 2nd issue: the primary cause is that the model is wrapped in DataParallel, but the code is not designed for a multi-GPU setting. It seems that the RGB channel dimension of the image is treated as the mini-batch dimension inside PyTorch DataParallel. To work around this, modify 2 files:

In tools/valid.py, replace image_resized = transforms(image_resized) with image_resized = transforms(image_resized).unsqueeze(0)

In lib/models/cid.py, replace images = [x['image'].unsqueeze(0).to(self.device) for x in batch_inputs] with the following:

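        if self.training:
            # Training path is unchanged: add the batch dimension here.
            images = [x['image'].unsqueeze(0).to(self.device) for x in batch_inputs]
        else:
            # Validation: tools/valid.py already added the batch dimension via unsqueeze(0).
            images = [x['image'].to(self.device) for x in batch_inputs]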
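To see why adding the batch dimension in tools/valid.py helps, the sketch below models only the shape arithmetic of splitting an input along dimension 0 across devices. It assumes the split behaves like torch.chunk along dim 0 (chunk size is the ceiling of size divided by the number of devices); the function name scatter_shapes is hypothetical, and no PyTorch call is made, so it does not reproduce the exact tensor shape from the error message.

```python
import math
from typing import List, Tuple


def scatter_shapes(shape: Tuple[int, ...], num_devices: int) -> List[Tuple[int, ...]]:
    """Return the per-device shapes after chunking `shape` along dim 0.

    Assumption: chunks are formed like torch.chunk, i.e. each chunk holds
    ceil(shape[0] / num_devices) entries and the last one may be smaller.
    """
    n = shape[0]
    if n == 0:
        return []
    step = math.ceil(n / num_devices)
    shapes = []
    for start in range(0, n, step):
        size = min(step, n - start)
        shapes.append((size,) + tuple(shape[1:]))
    return shapes


if __name__ == "__main__":
    # A (C, H, W) image without a batch dimension: the channels get split.
    print(scatter_shapes((3, 512, 832), 2))
    # The same image after unsqueeze(0): one chunk, all 3 channels intact.
    print(scatter_shapes((1, 3, 512, 832), 2))
```

Under this assumption, a 3-channel image without a batch dimension is divided into a 2-channel and a 1-channel piece on two GPUs, while the unsqueezed image stays whole on a single replica, which is consistent with the "expected 3 channels, but got 1" message above.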

mlyangd commented 1 year ago

For the 2nd issue, it may work if you change DDP from False to True in the configuration files.

uyoung-jeong commented 1 year ago

Yes, setting DDP to True works fine. Running with the DP (DataParallel) setting requires a bit more work.

kennethwdk / CID, issue #2