timmeinhardt / trackformer

Implementation of "TrackFormer: Multi-Object Tracking with Transformers". [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]
https://arxiv.org/abs/2101.02702
Apache License 2.0

Why num_classes=20 in r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth? #120

Closed. imzhangyd closed this issue 5 months ago.

imzhangyd commented 6 months ago

I want to train on MOT17 dataset.

  1. I set num_classes = 1 in the 'build_model' function in 'src/trackformer/models/__init__.py'.

  2. The command:

    python src/train.py with \
    mot17 \
    deformable \
    multi_frame \
    tracking \
    output_dir=models/mot17_deformable_multi_frame
  3. Error:

    RuntimeError: Error(s) in loading state_dict for DeformableDETRTracking:
        size mismatch for class_embed.0.weight: copying a param with shape torch.Size([20, 288]) from checkpoint, the shape in current model is torch.Size([1, 288]).
        size mismatch for class_embed.0.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for class_embed.1.weight: copying a param with shape torch.Size([20, 288]) from checkpoint, the shape in current model is torch.Size([1, 288]).
        size mismatch for class_embed.1.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for class_embed.2.weight: copying a param with shape torch.Size([20, 288]) from checkpoint, the shape in current model is torch.Size([1, 288]).
        size mismatch for class_embed.2.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for class_embed.3.weight: copying a param with shape torch.Size([20, 288]) from checkpoint, the shape in current model is torch.Size([1, 288]).
        size mismatch for class_embed.3.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for class_embed.4.weight: copying a param with shape torch.Size([20, 288]) from checkpoint, the shape in current model is torch.Size([1, 288]).
        size mismatch for class_embed.4.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([1]).
        size mismatch for class_embed.5.weight: copying a param with shape torch.Size([20, 288]) from checkpoint, the shape in current model is torch.Size([1, 288]).
        size mismatch for class_embed.5.bias: copying a param with shape torch.Size([20]) from checkpoint, the shape in current model is torch.Size([1]).
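The 20-class heads can be confirmed by inspecting the checkpoint directly; a minimal sketch, assuming the weights sit under the usual Deformable DETR 'model' key:

    import torch

    # Load the pretrained checkpoint on CPU and print the classification head shapes.
    ckpt = torch.load(
        'r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth',
        map_location='cpu')
    for name, param in ckpt['model'].items():
        if 'class_embed' in name:
            print(name, tuple(param.shape))  # e.g. class_embed.0.weight (20, 288)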

Questions:

Q1: I think the class number in MOT17 is one. Why was the pretrained model 'r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth' trained with 20 classes?

Q2: When I set the class number to 20, training started. Will this setting (20 classes instead of 1) affect performance?

Q3: If I want to train on my private dataset, which has only one class, how should I set the class number? If I set it to 1, the pretrained model cannot be loaded.

Thank you very much!

timmeinhardt commented 6 months ago

We inherited the 20-class setup from the original Deformable DETR code. The focal loss is more stable with more classes. You can train with any number of output neurons; the model will simply learn to never predict the unused classes and to output only the one class or background.
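If you nevertheless want num_classes=1, a generic PyTorch workaround (not specific to this repo) is to drop the mismatched classification-head weights from the checkpoint before loading, so every other pretrained parameter is still reused. A minimal sketch; the build_model call is a placeholder, not the exact repo API:

    import torch

    ckpt = torch.load(
        'r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint_hidden_dim_288.pth',
        map_location='cpu')
    state_dict = ckpt['model']  # assuming the Deformable DETR 'model' key

    # Drop the 20-class classification heads so they cannot clash with a
    # 1-class model; all other pretrained parameters are kept.
    for key in [k for k in state_dict if k.startswith('class_embed.')]:
        del state_dict[key]

    # model = build_model(...)  # hypothetical: your model built with num_classes=1
    # strict=False tolerates the now-missing class_embed.* entries; those heads
    # then train from their random initialization while the rest starts from
    # the checkpoint:
    # model.load_state_dict(state_dict, strict=False)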

imzhangyd commented 6 months ago

Thanks.