lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Apache License 2.0

Image resolution and model export #267

Closed PaulineTreyvaud closed 6 months ago

PaulineTreyvaud commented 7 months ago

Hi! Congrats and thank you for your work!

I would like to train on full-HD images (1920 x 1080 pixels, although some have other formats) and test on images of the same size (custom dataset). I changed the resize parameter in dataloader.yml for both training and validation to 1056x1056, as well as the eval_spatial_size parameter (both l43 and l58) of rtdetr_r50vd.yml, to keep as much resolution as possible while avoiding padding. This doesn't seem to cause a problem during training, but the ONNX export raises an error:

=======================================================================
1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/rt-detr/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 147, in forward
    q = k = self.with_pos_embed(src, pos_embed)
  File "./rt-detr/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 141, in with_pos_embed
    return tensor if pos_embed is None else tensor + pos_embed

RuntimeError: The size of tensor a (400) must match the size of tensor b (1089) at non-singleton dimension 1

=======================================================================

Am I missing something?

For context: I use the PyTorch version, with a config file based on rtdetr_r101vd_6x_coco.yml.

Thank you!
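The two sizes in the error message can be related to input resolutions with a little arithmetic. The sketch below assumes that the encoder layer adding pos_embed works on a stride-32 feature map, and that the export script feeds a 640x640 dummy input by default; neither is stated in this thread, and `num_tokens` is a hypothetical helper written only for illustration. Under those assumptions, 400 would correspond to the dummy input and 1089 to the positional embedding precomputed for 1056x1056.

```python
import math

# ASSUMPTION: the feature map that receives pos_embed is downsampled by 32.
STRIDE = 32


def num_tokens(height: int, width: int, stride: int = STRIDE) -> int:
    """Number of flattened feature-map positions for an input of this size."""
    return math.ceil(height / stride) * math.ceil(width / stride)


# ASSUMPTION: 640x640 is the dummy input size used by the export script.
tokens_default = num_tokens(640, 640)    # 20 * 20 positions
# 1056x1056 is the eval_spatial_size set in the config above.
tokens_custom = num_tokens(1056, 1056)   # 33 * 33 positions

print(tokens_default, tokens_custom)
```

If the assumptions hold, the mismatch suggests that the resolution used at export time differs from the one in the config, rather than a problem in training.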

lyuwenyu commented 7 months ago

Try setting l58 to ~ (None).

PaulineTreyvaud commented 7 months ago

@lyuwenyu I did, the outcome/error is the same :)

raitzjm commented 6 months ago

@PaulineTreyvaud try also changing Line 43 to ~

raitzjm commented 5 months ago

@PaulineTreyvaud did that work for you? I did the same, but I am still having issues when running inference with ONNX Runtime.

PaulineTreyvaud commented 5 months ago

@raitzjm what ended up working was changing both l43 and l58 of rtdetr_r50vd.yml, as well as l53 and l54 of export_onnx.py, to the expected resolution ((1056, 1056) in my case). Also, note that the exported model can rescale its output to the original image size if that size is given as an input (set orig_target_sizes during inference).
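As a rough illustration of the rescaling mentioned above, here is a minimal pure-Python sketch. It assumes the model predicts boxes as normalized (cx, cy, w, h) values and that orig_target_sizes is given in (width, height) order; both points are assumptions to check against the repository's post-processor, and `rescale_boxes` is a hypothetical helper, not a function from the repo.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def rescale_boxes(boxes_cxcywh: List[Box], orig_size: Tuple[int, int]) -> List[Box]:
    """Convert normalized (cx, cy, w, h) boxes to absolute (x1, y1, x2, y2).

    ASSUMPTION: orig_size is (width, height) of the original image.
    """
    width, height = orig_size
    scaled = []
    for cx, cy, w, h in boxes_cxcywh:
        x1 = (cx - w / 2) * width
        y1 = (cy - h / 2) * height
        x2 = (cx + w / 2) * width
        y2 = (cy + h / 2) * height
        scaled.append((x1, y1, x2, y2))
    return scaled


# A centered box covering half the image in each dimension,
# mapped back to a full-HD frame instead of the 1056x1056 network input.
boxes = rescale_boxes([(0.5, 0.5, 0.5, 0.5)], orig_size=(1920, 1080))
print(boxes)
```

Because the boxes are normalized, passing the original image size instead of the network input size would yield coordinates in the original image, provided the image was resized to the network resolution without padding.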