Inference on images of variable sizes

jrukavina commented 6 months ago

Hi, you have a great repository, thanks for open sourcing it!

I am wondering if there is a way to run inference with your model on images of different dimensions. I tried exporting to ONNX with dynamic width and height dimensions, but that didn't work. Also, when I run inference in PyTorch on images of different sizes, I get an error like the following:

File "/.../RT-DETR/rtdetr_pytorch/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 141, in with_pos_embed
    return tensor if pos_embed is None else tensor + pos_embed
RuntimeError: The size of tensor a (2500) must match the size of tensor b (1250) at non-singleton dimension 1

Is there a way to make your model support dynamic input dimensions, or am I doing something wrong? Is this perhaps a feature coming in RT-DETRv2?
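Reading the traceback together with the replies below, a likely cause is that the encoder's positional embedding is built once for a fixed feature-map size, while with_pos_embed adds it to a token sequence whose length depends on the actual input. The sketch below reproduces that kind of failure in isolation. The class name FixedPosEmbed, the hidden size of 256, and the stride of 32 are illustrative assumptions, not RT-DETR's actual code.

```python
import torch
import torch.nn as nn


class FixedPosEmbed(nn.Module):
    """Hypothetical stand-in for an encoder that caches its positional embedding."""

    def __init__(self, spatial_size=(640, 640), stride=32, dim=256):
        super().__init__()
        # Assumed stride-32 feature map: 640x640 -> 20x20 = 400 tokens.
        h, w = spatial_size[0] // stride, spatial_size[1] // stride
        # Random values stand in for the real embedding; only the shape matters here.
        self.register_buffer("pos_embed", torch.randn(1, h * w, dim))

    def forward(self, feat):
        # feat: [B, C, H, W] -> tokens: [B, H*W, C]
        tokens = feat.flatten(2).permute(0, 2, 1)
        # Same pattern as with_pos_embed: an elementwise add that needs equal lengths.
        return tokens + self.pos_embed


enc = FixedPosEmbed(spatial_size=(640, 640))
ok = enc(torch.zeros(1, 256, 20, 20))  # matches the cached size: 400 tokens

raised = False
try:
    enc(torch.zeros(1, 256, 25, 20))  # e.g. an 800x640 input: 500 tokens vs 400
except RuntimeError as err:
    raised = True
    print(err)
```

The second call fails with a RuntimeError of the same form as the one in the traceback, because the sequence dimension (dim 1) of the tokens no longer matches the cached embedding.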

lyuwenyu commented 6 months ago

Try changing eval_spatial_size: [640, 640] to eval_spatial_size: ~ to support dynamic input sizes:

https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetr_pytorch/configs/rtdetr/include/rtdetr_r50vd.yml#L43

jrukavina commented 6 months ago

Thanks, this worked! I also had to change L58 to ~. Unfortunately, exporting to ONNX with dynamic width and height still does not work. Is there any way around this?

lyuwenyu commented 6 months ago

You can try modifying this logic to fit your needs (move the pos_embed initialization into forward):

https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetr_pytorch/src/zoo/rtdetr/hybrid_encoder.py#L255
https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetr_pytorch/src/zoo/rtdetr/hybrid_encoder.py#L297
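One way to follow this suggestion, sketched under assumptions: build the 2D sine-cosine positional embedding inside forward from the feature map's actual height and width, so nothing depends on a size fixed at construction time. The names sincos_pos_embed_2d and DynamicPosEmbed are hypothetical, and the formula is a common 2D sin-cos variant, not necessarily the exact one in hybrid_encoder.py.

```python
import torch
import torch.nn as nn


def sincos_pos_embed_2d(h, w, dim=256, temperature=10000.0, device=None):
    """Return a [1, h*w, dim] 2D sine-cosine embedding in row-major (h, w) order."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    pos_dim = dim // 4
    # "ij" indexing so flattening walks rows first, like feat.flatten(2).
    grid_y, grid_x = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=device),
        torch.arange(w, dtype=torch.float32, device=device),
        indexing="ij",
    )
    omega = torch.arange(pos_dim, dtype=torch.float32, device=device) / pos_dim
    omega = 1.0 / (temperature ** omega)
    # Outer products: one [h*w, pos_dim] block per axis.
    out_x = grid_x.flatten()[:, None] * omega[None, :]
    out_y = grid_y.flatten()[:, None] * omega[None, :]
    emb = torch.cat([out_x.sin(), out_x.cos(), out_y.sin(), out_y.cos()], dim=1)
    return emb[None]


class DynamicPosEmbed(nn.Module):
    """Hypothetical encoder stage that rebuilds its positional embedding on every call."""

    def __init__(self, dim=256, temperature=10000.0):
        super().__init__()
        self.dim = dim
        self.temperature = temperature

    def forward(self, feat):
        _, _, h, w = feat.shape
        tokens = feat.flatten(2).permute(0, 2, 1)  # [B, h*w, C]
        pos = sincos_pos_embed_2d(h, w, self.dim, self.temperature, device=feat.device)
        return tokens + pos.to(tokens.dtype)


enc = DynamicPosEmbed(dim=256)
for size in [(20, 20), (25, 20), (50, 50)]:
    out = enc(torch.zeros(1, 256, *size))
    print(size, tuple(out.shape))
```

Because the embedding is derived from feat.shape, any feature-map size works in eager PyTorch. For ONNX, the image input's height and width would additionally need to be declared dynamic, for example through the dynamic_axes argument of torch.onnx.export; whether the rest of the model then exports cleanly is not established in this thread.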

jrukavina commented 5 months ago

I will try that, thanks!

lyuwenyu / RT-DETR

Inference on images of variable sizes #237