@cmiller2air let's see. Rectangular training sorts all of your training images by aspect ratio and then groups images with similar aspect ratios into the same batch. It then pads each image in the batch only as much as necessary, under the constraint that the image dimensions must be multiples of 32.
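For illustration, here is a rough sketch of that per-batch shape computation (a simplified approximation, not the repo's actual code): sort by aspect ratio, group neighbouring images into batches, and give each batch the smallest padded shape whose sides are multiples of 32.

```python
import numpy as np

def rect_batch_shapes(shapes, batch_size=16, img_size=416, stride=32):
    """shapes: (N, 2) array of (width, height) per training image."""
    shapes = np.asarray(shapes, dtype=np.float64)
    aspect = shapes[:, 1] / shapes[:, 0]          # height / width
    order = aspect.argsort()                      # group similar aspect ratios together
    aspect = aspect[order]

    batch_shapes = []
    for b in range(int(np.ceil(len(aspect) / batch_size))):
        ar = aspect[b * batch_size:(b + 1) * batch_size]
        lo, hi = ar.min(), ar.max()
        if hi < 1:                                # every image wider than tall
            shape = [hi, 1]                       # (h, w) relative to img_size
        elif lo > 1:                              # every image taller than wide
            shape = [1, 1 / lo]
        else:
            shape = [1, 1]
        # round up so both sides are multiples of the network stride (32)
        batch_shapes.append(np.ceil(np.array(shape) * img_size / stride).astype(int) * stride)
    return order, batch_shapes
```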
I'm not sure whether rectangular training and multi-scale training are mutually exclusive, though one important caveat with rectangular training is that you need to make sure the images are not shuffled by the data loader.
test.py uses rectangular images by default in this repo, so the rectangular functionality operates properly on its own. Can you post your command and its screen output?
I took a look at this. It seems the interpolation operation in train.py resizes the long side to a new multiple of 32, but the resized image no longer satisfies the multiple-of-32 constraint on the shorter side. This is what is causing the errors.
This will need some significant additional logic to correct, so we will leave this issue open. I can't give you a timeline for a fix, but an immediate workaround is to use rectangular training with --single-scale.
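To make the failure mode concrete, here is a hedged sketch (hypothetical function names, not the repo's code): scaling the whole batch by a single factor keeps the long side a multiple of 32 but not the short side, whereas interpolating to explicitly rounded dimensions keeps both.

```python
import math
import torch
import torch.nn.functional as F

def multiscale_resize_buggy(imgs, new_long_side):
    # scale the batch by one factor derived from the long side only
    h, w = imgs.shape[-2:]
    sf = new_long_side / max(h, w)
    return F.interpolate(imgs, scale_factor=sf, mode='bilinear', align_corners=False)
    # e.g. a 416x256 batch with new_long_side=352 becomes 352x216, and 216 % 32 != 0

def multiscale_resize_fixed(imgs, new_long_side, stride=32):
    # scale both sides, then round each up to a multiple of the stride
    h, w = imgs.shape[-2:]
    sf = new_long_side / max(h, w)
    new_hw = [math.ceil(d * sf / stride) * stride for d in (h, w)]
    return F.interpolate(imgs, size=new_hw, mode='bilinear', align_corners=False)

imgs = torch.zeros(16, 3, 416, 256)
print(multiscale_resize_buggy(imgs, 352).shape)   # torch.Size([16, 3, 352, 216])
print(multiscale_resize_fixed(imgs, 352).shape)   # torch.Size([16, 3, 352, 224])
```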
I have experienced the same problem. I cannot use the --rect flag together with --multi-scale. When I try to do this, I get the following error:
Traceback (most recent call last):
File "train.py", line 334, in <module>
accumulate=opt.accumulate)
File "train.py", line 207, in train
pred = model(imgs)
File "/home/tomekb/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/tomekb/pytorch_yolov3/models.py", line 189, in forward
x = torch.cat([layer_outputs[i] for i in layer_i], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 27 and 28 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71
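For reference, the 27-vs-28 mismatch is what you get when a route layer concatenates an upsampled stride-32 feature map with a stride-16 skip connection and the input side is not a multiple of 32. A rough arithmetic sketch, assuming each Darknet downsample is a 3x3, stride-2, padding-1 convolution (output side = ceil(input / 2)); the example input height of 430 is an assumption for illustration:

```python
import math

def downsampled(side, n):
    # n consecutive 3x3 / stride-2 / padding-1 convolutions
    for _ in range(n):
        side = math.ceil(side / 2)
    return side

h = 430                          # e.g. a short side left non-multiple-of-32 by the resize
p4 = downsampled(h, 4)           # stride-16 feature map height -> 27
p5 = downsampled(h, 5) * 2       # stride-32 height after 2x upsample -> 28
print(p4, p5)                    # 27 28: torch.cat in the route layer fails
```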
@Bienqq @cmiller2air since we have now had multiple requests, we elevated the issue's priority and have implemented a fix in https://github.com/ultralytics/yolov3/commit/44b340321fef16ee21e9fba43caa7ddebef03c2f. Multi-scale training should now be compatible with rectangular training.
Please git pull and try again. Thank you!
Everything appears to be operating correctly in our COCO tests:
python3 train.py --data data/coco.data --img-size 320 --rect --multi-scale
Namespace(accumulate=4, batch_size=16, bucket='', cfg='cfg/yolov3-spp.cfg', data='data/coco.data', epochs=100, evolve=False, img_size=320, multi_scale=True, nosave=False, notest=False, num_workers=4, rect=True, resume=False, transfer=False, var=0, xywh=False)
Using CUDA with Apex device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)
Reading image shapes: 100% 117263/117263 [03:24<00:00, 572.30it/s]
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Epoch gpu_mem GIoU/xy wh obj cls total targets img_size
0/99 3.7G 0.565 0 10.6 13.2 24.4 93 352: 4% 297/7329 [01:55<33:54, 3.46it/s]
Closing, as the issue should be resolved.
Describe the bug
I used 1024×288 images as training images on four 1080 Ti GPUs. I tried to enable both rectangular training and multi-scale learning, but a tensor size mismatch occurred in the route modules. The error seems to occur because the input image dimensions are not multiples of 32. The images are first resized and padded so that their dimensions are multiples of 32, but those dimensions are then changed by the resizing step of the multi-scale module. Switching the order of the two operations may fix the problem, or scaling the image to an estimated target width and height instead of applying a resize factor would help too (a rough sketch of the reordering idea follows).
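For what it's worth, a minimal sketch of that reordering suggestion is below; the helper names, the 320-640 size range, and the grey padding value are assumptions, not the repo's implementation. The idea is to pick the multi-scale target size first, resize, and only then pad, so the padding can never be undone by a later resize.

```python
import math
import random
import numpy as np

def resize_long_side(img, long_side):
    # nearest-neighbour resize via index sampling, to keep the sketch dependency-free
    h, w = img.shape[:2]
    sf = long_side / max(h, w)
    ys = (np.arange(int(round(h * sf))) / sf).astype(int).clip(0, h - 1)
    xs = (np.arange(int(round(w * sf))) / sf).astype(int).clip(0, w - 1)
    return img[ys][:, xs]

def load_multiscale_rect_batch(images, stride=32):
    # images: a list of HxWx3 uint8 arrays already grouped by aspect ratio
    long_side = random.choice(range(320, 641, stride))      # multi-scale pick first
    resized = [resize_long_side(im, long_side) for im in images]
    # pad the whole batch to one shared multiple-of-32 shape
    max_h = max(im.shape[0] for im in resized)
    max_w = max(im.shape[1] for im in resized)
    th = math.ceil(max_h / stride) * stride
    tw = math.ceil(max_w / stride) * stride
    padded = [np.pad(im, ((0, th - im.shape[0]), (0, tw - im.shape[1]), (0, 0)),
                     constant_values=114) for im in resized]
    return np.stack(padded)
```

This assumes the incoming list has already been grouped by aspect ratio, as rectangular training does, so the shared padded shape stays close to each image's own size.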