hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence
https://arxiv.org/abs/2106.00666
MIT License

Input size can not be dynamic? #12

Closed lucasjinreal closed 2 years ago

lucasjinreal commented 2 years ago

I tried something like this:

 python demo.py --resume weights/yolos_s_dWr.pth --data_file ../yolov7/images/COCO_val2014_000000001856.jpg --mid_pe_size 800 864 --init_pe_size 800 864
Not using distributed mode
Namespace(backbone_name='small_dWr', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path=None, data_file='../yolo/images/COCO_val2014_000000001856.jpg', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_url='env://', distributed=False, eos_coef=0.1, epochs=150, eval=False, eval_size=800, giou_loss_coef=2, init_pe_size=[800, 864], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 864], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', remove_difficult=False, resume='weights/yolos_s_dWr.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=1)

Got:

torch1.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Detector:
    size mismatch for backbone.pos_embed: copying a param with shape torch.Size([1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([1, 2801, 330]).
    size mismatch for backbone.mid_pos_embed: copying a param with shape torch.Size([13, 1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([13, 1, 2801, 330]).
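For reference, the mismatch is purely a token-count mismatch between the checkpoint's position embedding and the one the model was built with. A minimal sketch of the arithmetic (assuming the usual YOLOS layout of one [CLS] token, a 16x16-patch grid, and 100 detection tokens):

```python
# Why the shapes differ: the position embedding covers
# 1 [CLS] token + (H/16) * (W/16) patch tokens + det_token_num detection tokens.
def num_pe_tokens(height, width, patch_size=16, det_tokens=100):
    """Token count covered by the position embedding (assumed layout)."""
    return 1 + (height // patch_size) * (width // patch_size) + det_tokens

# The checkpoint was trained with --init_pe_size 512 864:
print(num_pe_tokens(512, 864))  # 1829, matching the checkpoint tensor
# The command above built the model with --init_pe_size 800 864:
print(num_pe_tokens(800, 864))  # 2801, hence the size mismatch
```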
Yuxin-CV commented 2 years ago

The input sizes can be dynamic; we didn't test demo.py. Please directly try training or running inference with YOLOS using the provided scripts.

lucasjinreal commented 2 years ago

@Yuxin-CV how? I got the above errors by specifying a different input size. demo.py is just copied from your coco_visualizexx.py, with the same args.

Yuxin-CV commented 2 years ago

You should set --init_pe_size 512 864 and --mid_pe_size 512 864. Note that these two params are unrelated to the input size; they are determined by the pre-trained YOLOS weights.
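Dynamic input sizes are possible because the model interpolates the checkpoint's position embedding to the actual patch grid at run time. A minimal sketch of that resizing step (a hypothetical helper, assuming the [CLS] + patch-grid + detection-token layout, in the spirit of the interpolation YOLOS applies):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_hw, new_hw, det_tokens=100):
    """Bicubic-resize the patch portion of a YOLOS-style position embedding.

    Assumed layout along dim 1: [CLS] + old_hw patch grid + det_tokens.
    """
    dim = pos_embed.shape[-1]
    cls_pe = pos_embed[:, :1]                                   # keep [CLS] as-is
    patch_pe = pos_embed[:, 1:pos_embed.shape[1] - det_tokens]  # the spatial grid
    det_pe = pos_embed[:, pos_embed.shape[1] - det_tokens:]     # keep det tokens as-is

    old_h, old_w = old_hw
    patch_pe = patch_pe.reshape(1, old_h, old_w, dim).permute(0, 3, 1, 2)
    patch_pe = F.interpolate(patch_pe, size=new_hw, mode="bicubic",
                             align_corners=False)
    patch_pe = patch_pe.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], dim)
    return torch.cat([cls_pe, patch_pe, det_pe], dim=1)

# 512x864 checkpoint (32x54 patch grid) resized to an 800x864 input (50x54 grid):
pe = torch.randn(1, 1829, 330)
resized = resize_pos_embed(pe, (32, 54), (50, 54))
print(resized.shape)  # torch.Size([1, 2801, 330])
```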

lucasjinreal commented 2 years ago

@Yuxin-CV if I set the input to 800x800, what will happen? Does that mean it's misaligned with training?

Yuxin-CV commented 2 years ago

Of course it will be misaligned with training, but the model can still process it, with degraded accuracy.

Yuxin-CV commented 2 years ago

I don't know why you would want to change the aspect ratio, but if you prefer H : W = 1 : 1 inputs, I suggest re-training the model with scale jittering.
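Scale jittering here means randomly varying the resize target at each training iteration so the model sees many scales. A minimal sketch (the scale list and the 864-pixel cap are assumptions mirroring a DETR-style recipe, not the exact YOLOS training config):

```python
import random

# Candidate shorter-side targets, sampled per iteration (assumed values).
SCALES = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]

def jittered_size(h, w, max_size=864):
    """Pick a random shorter-side target, clamped so the longer side
    never exceeds max_size (hypothetical helper for illustration)."""
    target = random.choice(SCALES)
    scale = min(target / min(h, w), max_size / max(h, w))
    return round(h * scale), round(w * scale)

# A 1000x750 image gets resized to a randomly chosen scale each call:
print(jittered_size(1000, 750))
```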
