Confused about the network's input size support, img_size in the code, and "rectangular training"?

ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite

GNU Affero General Public License v3.0


First, thanks for the great repo with its valuable commented code and support. I'd like to train a network that matches the aspect ratio of 1080p. A 1056x608 input is close to the 1080p aspect ratio, and I want to train a network from scratch at that resolution. I have a few questions:

1) Does this repo support training with non-square (rectangular) network sizes?
2) If so, what does img_size mean in the code? Is the image fed to the network at inference always square?
3) Is the so-called "rectangular training" support coming in v7 about training non-square network sizes, or is it just an optimization for rectangular images in the dataset during training?
4) If I change the number of filters in the convolutional layers of the cfg file, will this repo initialize the weights correctly? It looks like it always initializes from darknet's pretrained weights file.

Best
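A quick way to sanity-check a candidate rectangular size is sketched below. It assumes a standard YOLOv3 layout with three detection heads at strides 32, 16 and 8, so each side of the input must be divisible by 32; `yolo_grids` is a hypothetical helper for illustration, not part of this repo.

```python
def yolo_grids(width, height, strides=(32, 16, 8)):
    """Return the (columns, rows) output grid of each detection head.

    Assumes a YOLOv3-style network with heads at strides 32, 16 and 8,
    so both input sides must be divisible by the largest stride.
    """
    max_stride = max(strides)
    if width % max_stride or height % max_stride:
        raise ValueError(f"{width}x{height} is not a multiple of {max_stride}")
    return [(width // s, height // s) for s in strides]


print(f"1080p aspect ratio: {1920 / 1080:.3f}, 1056x608: {1056 / 608:.3f}")
print(yolo_grids(1056, 608))  # → [(33, 19), (66, 38), (132, 76)]
```

Calling yolo_grids(1920, 1080) raises ValueError because 1080 is not divisible by 32 (1080 = 33 * 32 + 24), which is why a nearby size such as 608 is a natural choice for the short side.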

@furkankirac yes, you can optionally do rectangular training in train.py. Rectangular inference is already on by default in test.py and detect.py; no changes are needed. See https://github.com/ultralytics/yolov3/issues/232

img_size represents the longest image dimension when rectangular inference is used. If you want to structure your own network, you can leave a new model initialized with random weights rather than loading pretrained backbone weights (i.e. darknet53). darknet53 is used as the backbone to help start training on the more standard yolov3 variants such as yolov3-spp.
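How img_size maps to the actual rectangular input can be sketched as follows, assuming letterbox-style resizing: scale the longest side to img_size, keep the aspect ratio, then pad each side up to a multiple of 32. This illustrates the idea only; `rect_inference_shape` and `round_up` are hypothetical helpers, and the repo's own letterbox code may round slightly differently.

```python
def round_up(x, multiple):
    """Round a positive integer up to the nearest multiple (ceil division)."""
    return -(-x // multiple) * multiple


def rect_inference_shape(width, height, img_size=416, stride=32):
    """Sketch of letterbox-style sizing for rectangular inference.

    The longest side is scaled to img_size, the aspect ratio is kept, and each
    side is then padded up to a multiple of stride. Returns (width, height).
    """
    r = img_size / max(width, height)
    new_w, new_h = round(width * r), round(height * r)
    return round_up(new_w, stride), round_up(new_h, stride)


print(rect_inference_shape(1920, 1080, img_size=416))   # → (416, 256)
print(rect_inference_shape(1920, 1080, img_size=1056))  # → (1056, 608)
```

Under these assumptions, a 1920x1080 frame with img_size=1056 maps to exactly the 1056x608 input proposed in the question, so the network sees a 16:9-like input without padding out to a square.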

To detect with rectangular inference, simply run python3 detect.py. To use rectangular training, set the flag in the code here and run python3 train.py (git pull first to get the latest version): https://github.com/ultralytics/yolov3/blob/bb3682024efb6ecde7de937d427419b989763b22/train.py#L142-L149
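During training, every image in a batch has to share one input tensor shape. A minimal sketch of one way to handle this, assuming images are sorted by aspect ratio and each batch is padded to the smallest multiple-of-32 shape that fits its most extreme image; `rect_batch_shapes` and its (width, height) input format are hypothetical, not the repo's API.

```python
import math


def rect_batch_shapes(sizes, batch_size, img_size=416, stride=32):
    """Sketch: sort images by aspect ratio and pick one padded shape per batch.

    sizes is a list of (width, height) tuples. Returns (order, shapes), where
    order lists image indices sorted by aspect ratio and shapes holds one
    (width, height) network input per batch of batch_size images.
    """
    ratios = [h / w for w, h in sizes]  # height / width for each image
    order = sorted(range(len(sizes)), key=lambda i: ratios[i])
    shapes = []
    for start in range(0, len(order), batch_size):
        batch = [ratios[i] for i in order[start:start + batch_size]]
        lo, hi = min(batch), max(batch)
        if hi < 1:    # all landscape: full width, height sized for the largest h/w ratio
            rel_w, rel_h = 1.0, hi
        elif lo > 1:  # all portrait: full height, width sized for the smallest h/w ratio
            rel_w, rel_h = 1.0 / lo, 1.0
        else:         # mixed orientations: fall back to a square input
            rel_w, rel_h = 1.0, 1.0
        # Longest side becomes img_size; each side is rounded up to a multiple of stride.
        shapes.append((math.ceil(rel_w * img_size / stride) * stride,
                       math.ceil(rel_h * img_size / stride) * stride))
    return order, shapes


sizes = [(1920, 1080)] * 4 + [(1080, 1920)] * 4
order, shapes = rect_batch_shapes(sizes, batch_size=4)
print(shapes)  # → [(416, 256), (256, 416)]
```

Sorting first keeps similarly shaped images together, so most batches avoid padding all the way out to a square; a batch that mixes landscape and portrait images falls back to img_size x img_size.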

Issue #332