lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Apache License 2.0

Runtime error after 1 epoch train #147

Open muse1835 opened 9 months ago

muse1835 commented 9 months ago

hi,

```
torchrun --nproc_per_node=4 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml
```

When I try to train with the multi-GPU command above, a RuntimeError occurs after the first epoch, during evaluation.

What does it mean?

```
Averaged stats: lr: 0.000010 loss: 23.7475 (25.5214) loss_bbox: 0.1442 (0.2272) loss_bbox_aux_0: 0.1618 (0.2567) loss_bbox_aux_1: 0.1476 (0.2418) loss_bbox_aux_2: 0.1450 (0.2356) loss_bbox_aux_3: 0.1391 (0.2316) loss_bbox_aux_4: 0.1415 (0.2289) loss_bbox_aux_5: 0.2002 (0.2997) loss_bbox_dn_0: 0.2185 (0.2814) loss_bbox_dn_1: 0.1842 (0.2603) loss_bbox_dn_2: 0.1791 (0.2534) loss_bbox_dn_3: 0.1734 (0.2498) loss_bbox_dn_4: 0.1729 (0.2483) loss_bbox_dn_5: 0.1725 (0.2479) loss_giou: 1.1371 (1.2663) loss_giou_aux_0: 1.1925 (1.3149) loss_giou_aux_1: 1.1698 (1.2892) loss_giou_aux_2: 1.1695 (1.2800) loss_giou_aux_3: 1.1505 (1.2721) loss_giou_aux_4: 1.1565 (1.2692) loss_giou_aux_5: 1.2983 (1.3946) loss_giou_dn_0: 1.1583 (1.2238) loss_giou_dn_1: 1.0956 (1.1898) loss_giou_dn_2: 1.0880 (1.1799) loss_giou_dn_3: 1.0817 (1.1761) loss_giou_dn_4: 1.0842 (1.1751) loss_giou_dn_5: 1.0835 (1.1760) loss_vfl: 0.6192 (0.5747) loss_vfl_aux_0: 0.5701 (0.5152) loss_vfl_aux_1: 0.5826 (0.5378) loss_vfl_aux_2: 0.6320 (0.5474) loss_vfl_aux_3: 0.6379 (0.5579) loss_vfl_aux_4: 0.6331 (0.5665) loss_vfl_aux_5: 0.5443 (0.4792) loss_vfl_dn_0: 0.3810 (0.3727) loss_vfl_dn_1: 0.3856 (0.3772) loss_vfl_dn_2: 0.3853 (0.3782) loss_vfl_dn_3: 0.3847 (0.3803) loss_vfl_dn_4: 0.3865 (0.3815) loss_vfl_dn_5: 0.3835 (0.3832)
Traceback (most recent call last):
  File "tools/train.py", line 47, in <module>
    main(args)
  File "tools/train.py", line 33, in main
    solver.fit()
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/solver/det_solver.py", line 52, in fit
    test_stats, coco_evaluator = evaluate(
  File "/home/wr/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/solver/det_engine.py", line 121, in evaluate
    outputs = model(samples)
  File "/home/wr/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/zoo/rtdetr/rtdetr.py", line 34, in forward
    x = self.encoder(x)
  File "/home/wr/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 299, in forward
    memory = self.encoder[i](src_flatten, pos_embed=pos_embed)
  File "/home/wr/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 174, in forward
    output = layer(output, src_mask=src_mask, pos_embed=pos_embed)
  File "/home/wr/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 147, in forward
    q = k = self.with_pos_embed(src, pos_embed)
  File "/home/wr/Workspace/WS/RT-DETR/rtdetr_pytorch/tools/../src/zoo/rtdetr/hybrid_encoder.py", line 141, in with_pos_embed
    return tensor if pos_embed is None else tensor + pos_embed
RuntimeError: The size of tensor a (920) must match the size of tensor b (400) at non-singleton dimension 1
```

lyuwenyu commented 9 months ago

Make sure the `eval_spatial_size` of `HybridEncoder` is the same as the resize size of the `val_dataloader`:

https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetr_pytorch/configs/rtdetr/include/dataloader.yml#L33 https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetr_pytorch/configs/rtdetr/include/rtdetr_r50vd.yml#L58
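The shape mismatch in the traceback follows from this config mismatch. As a sketch of the arithmetic (assuming the positional embedding is built on the stride-32 feature map, which matches the numbers in the error): the encoder flattens the feature map into `(W/32) * (H/32)` tokens, while `pos_embed` is precomputed from `eval_spatial_size`, so the two token counts diverge when the sizes differ.

```python
# Sketch of the shape arithmetic behind the RuntimeError (assumption:
# positional embeddings are built on the stride-32 feature map).

def num_tokens(width: int, height: int, stride: int = 32) -> int:
    """Number of flattened feature-map tokens for a given input size."""
    return (width // stride) * (height // stride)

# pos_embed built for the default eval_spatial_size [640, 640]:
print(num_tokens(640, 640))    # 400 -> "tensor b (400)"

# flattened features for a 1280x736 validation image:
print(num_tokens(1280, 736))   # 920 -> "tensor a (920)"

# tensor + pos_embed then fails: 920 != 400 at dimension 1.
```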

muse1835 commented 9 months ago

I changed the `eval_spatial_size` to [1280, 720], but it does not work...

It works with [1280, 736]!! Does the model only work with certain fixed image sizes, such as those in `multi_scale`? (multi_scale: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800]) I want to detect objects in 2880x416 pixel (WxH) images. How do I do that?

lyuwenyu commented 9 months ago

The size must be divisible by 32.
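This can be checked with a few lines of arithmetic (a sketch; the constraint follows from the backbone's stride-32 downsampling, and `round_to_multiple` is a hypothetical helper, not part of the repo):

```python
def round_to_multiple(x: int, base: int = 32) -> int:
    """Round x up to the nearest multiple of base."""
    return ((x + base - 1) // base) * base

# 720 is not divisible by 32, so [1280, 720] fails;
# 736 is the next valid height, which is why [1280, 736] works.
print(round_to_multiple(720))            # 736

# 2880x416 (WxH) is already valid: 2880 = 90 * 32 and 416 = 13 * 32.
print(2880 % 32 == 0, 416 % 32 == 0)     # True True
```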

muse1835 commented 9 months ago

> The size must be divisible by 32.

Thanks!!!

muse1835 commented 9 months ago

What about the `eval_spatial_size` of `RTDETRTransformer`? Should it be changed as well?