megvii-research / MOTR

[ECCV2022] MOTR: End-to-End Multiple-Object Tracking with TRansformer

Errors when trying to train the model #43

Open Wincioor11 opened 2 years ago

Wincioor11 commented 2 years ago

Hi, I downloaded the datasets and organized them as the instructions say. I ran into several issues when trying to reproduce your steps:

  1. Your pretrained model reaches only 28% MOTA when I run configs/r50_motr_eval.sh on it.
  2. I tried training the model myself following your instructions, but errors occurred. I downloaded the pretrained DETR model from https://github.com/fundamentalvision/Deformable-DETR#main-results. When I run configs/r50_motr_train.sh, I get errors about mismatched parameter sizes in the pretrained model:
    
    /MOTR$ sh configs/r50_motr_train.sh
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your app
    *****************************************
    | distributed init (rank 2): env://
    | distributed init (rank 1): env://
    | distributed init (rank 0): env://
    | distributed init (rank 3): env://
    git:
    sha: 8690da3392159635ca37c31975126acf40220724, status: has uncommited changes, branch: main

    Namespace(accurate_ratio=False, aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, cj=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, oco/', crop=False, data_txt_path_train='./datasets/data_path/joint.train', data_txt_path_val='./datasets/data_path/mot17.train', dataset_file='e2e_joint', dec_layers=6, dec_n_points=4, decooef=1, dilation=False, dim_feedforward=1024, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.0, enable_fpn=False, enc_layers=6, enc_n_points=4, epochs=200, eval=False, eignore=False, focal_alpha=0.25, fp_ratio=0.3, frozen_weights=None, giou_loss_coef=2, gpu=0, gt_file_train=None, gt_file_val=None, hidden_dim=256, img_path='data/valid/JPEGImages/', input_vi.0002, lr_backbone=2e-05, lr_backbone_names=['backbone.0'], lr_drop=100, lr_drop_epochs=None, lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], mask_lonk_len=4, memory_bank_score_thresh=0.0, memory_bank_type=None, memory_bank_with_self_attn=False, merger_dropout=0.0, meta_arch='motr', mix_match=False, mot_path='./data', nheads=8, num_anchm_workers=2, output_dir='exps/e2e_motr_r50_joint', position_embedding='sine', position_embedding_scale=6.283185307179586, pretrained='coco_model_final.pth', query_interaction_layer='QIM', rresume='', sample_interval=10, sample_mode='random_interval', sampler_lengths=[2, 3, 4, 5], sampler_steps=[50, 90, 150], save_period=50, seed=42, set_cost_bbox=5, set_cost_class=2, set_costoch=0, two_stage=False, update_query_pos=True, use_checkpoint=True, val_width=800, vis=False, weight_decay=0.0001, with_box_refine=True, world_size=4)
    Training with Extra Self Attention in Every Decoder.
    Training with Self-Cross Attention.
    number of params: 43912992
    register 1-th video: data/crowdhuman/labels_with_ids/val
    register 2-th video: data/MOT17/labels_with_ids/train/MOT17-02-SDP/img1
    register 3-th video: data/MOT17/labels_with_ids/train/MOT17-04-SDP/img1
    register 4-th video: data/MOT17/labels_with_ids/train/MOT17-05-SDP/img1
    register 5-th video: data/MOT17/labels_with_ids/train/MOT17-09-SDP/img1
    register 6-th video: data/MOT17/labels_with_ids/train/MOT17-10-SDP/img1
    register 7-th video: data/MOT17/labels_with_ids/train/MOT17-11-SDP/img1
    register 8-th video: data/MOT17/labels_with_ids/train/MOT17-13-SDP/img1
    sampler_steps=[50, 90, 150] lenghts=[2, 3, 4, 5]
    register 1-th video: data/MOT17/labels_with_ids/train/MOT17-02-SDP/img1
    register 2-th video: data/MOT17/labels_with_ids/train/MOT17-04-SDP/img1
    register 3-th video: data/MOT17/labels_with_ids/train/MOT17-05-SDP/img1
    register 4-th video: data/MOT17/labels_with_ids/train/MOT17-09-SDP/img1
    register 5-th video: data/MOT17/labels_with_ids/train/MOT17-10-SDP/img1
    register 6-th video: data/MOT17/labels_with_ids/train/MOT17-11-SDP/img1
    register 7-th video: data/MOT17/labels_with_ids/train/MOT17-13-SDP/img1
    sampler_steps=[50, 90, 150] lenghts=[2, 3, 4, 5]
    loaded coco_model_final.pth
    Skip loading parameter class_embed.0.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
    load class_embed: class_embed.0.weight shape=torch.Size([91, 256])
    Skip loading parameter class_embed.0.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
    load class_embed: class_embed.0.bias shape=torch.Size([91])
    Skip loading parameter class_embed.1.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
    load class_embed: class_embed.1.weight shape=torch.Size([91, 256])
    Skip loading parameter class_embed.1.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
    load class_embed: class_embed.1.bias shape=torch.Size([91])
    Skip loading parameter class_embed.2.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
    load class_embed: class_embed.2.weight shape=torch.Size([91, 256])
    Skip loading parameter class_embed.2.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
    load class_embed: class_embed.2.bias shape=torch.Size([91])
    Skip loading parameter class_embed.3.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
    load class_embed: class_embed.3.weight shape=torch.Size([91, 256])
    Skip loading parameter class_embed.3.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
    load class_embed: class_embed.3.bias shape=torch.Size([91])
    Skip loading parameter class_embed.4.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
    load class_embed: class_embed.4.weight shape=torch.Size([91, 256])
    Skip loading parameter class_embed.4.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
    load class_embed: class_embed.4.bias shape=torch.Size([91])
    Skip loading parameter class_embed.5.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
    load class_embed: class_embed.5.weight shape=torch.Size([91, 256])
    Skip loading parameter class_embed.5.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
    load class_embed: class_embed.5.bias shape=torch.Size([91])
    No param track_embed.self_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset
    No param track_embed.self_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
    No param track_embed.self_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own datase
    No param track_embed.self_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
    No param track_embed.linear1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
    No param track_embed.linear1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
    No param track_embed.linear2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_pos1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_pos1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_pos2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_pos2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm_pos.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm_pos.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_feat1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_feat1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_feat2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.linear_feat2.bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm_feat.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm_feat.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm3.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param track_embed.norm3.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset. No param transformer.decoder.layers.0.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f No param transformer.decoder.layers.0.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for No param transformer.decoder.layers.0.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes No param transformer.decoder.layers.0.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo No param transformer.decoder.layers.0.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da No param transformer.decoder.layers.0.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data No param transformer.decoder.layers.1.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f No param transformer.decoder.layers.1.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for No param transformer.decoder.layers.1.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes No param transformer.decoder.layers.1.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo No param transformer.decoder.layers.1.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da No param transformer.decoder.layers.1.norm4.bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for your own data No param transformer.decoder.layers.2.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f No param transformer.decoder.layers.2.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for No param transformer.decoder.layers.2.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes No param transformer.decoder.layers.2.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo No param transformer.decoder.layers.2.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da No param transformer.decoder.layers.2.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data No param transformer.decoder.layers.3.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f No param transformer.decoder.layers.3.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for No param transformer.decoder.layers.3.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes No param transformer.decoder.layers.3.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes fo No param transformer.decoder.layers.3.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da No param transformer.decoder.layers.3.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data No param transformer.decoder.layers.4.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f No param transformer.decoder.layers.4.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for No param transformer.decoder.layers.4.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes No param transformer.decoder.layers.4.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo No param transformer.decoder.layers.4.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da No param transformer.decoder.layers.4.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data No param transformer.decoder.layers.5.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f No param transformer.decoder.layers.5.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for No param transformer.decoder.layers.5.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes No param transformer.decoder.layers.5.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo No param transformer.decoder.layers.5.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da No param transformer.decoder.layers.5.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data No param transformer.decoder.bbox_embed.0.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.0.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.0.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.0.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.0.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.0.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.1.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.1.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.1.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.1.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.1.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.1.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.2.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.2.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.2.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.2.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.2.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.2.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.3.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.3.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.3.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.3.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.3.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.3.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.4.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.4.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. 
Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.4.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.4.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.4.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.4.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.5.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.5.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.5.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.5.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o No param transformer.decoder.bbox_embed.5.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your No param transformer.decoder.bbox_embed.5.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o Start training



I tried all pretrained DETR models provided but none of them worked. Can you help me? 

  3. I deleted the line in train.sh that points to the pretrained DETR model and tried training from scratch. Training stopped after the 3rd epoch, and I don't see any log that explains why.

zyayoung commented 2 years ago
  1. The pretrained model is intended for testing on the MOT17 test split.
  2. We use the pre-trained "Deformable DETR + iterative bounding box refinement" weights from Deformable-DETR. You may try those weights. The "Skip loading parameter class_embed" messages are expected, since the number of classes has changed.
  3. We have not yet looked into training stopping without any error log. If no process was killed for running out of memory, you may try again and see whether the issue still occurs.
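
To illustrate point 2, here is a minimal sketch of how a checkpoint loader can end up printing the three kinds of messages seen in the log: a parameter is loaded, skipped because its shape differs ("Skip loading parameter"), or absent from the checkpoint ("No param"). It works on plain dictionaries of shape tuples rather than real tensors, and the function name `split_checkpoint` and the sample entries are hypothetical, not taken from the MOTR code.

```python
def split_checkpoint(model_shapes, ckpt_shapes):
    """Sort the model's parameters into loadable, shape-mismatched, and missing."""
    loadable, mismatched, missing = [], [], []
    for name, wanted in model_shapes.items():
        if name not in ckpt_shapes:
            missing.append(name)      # would print "No param ..."
        elif ckpt_shapes[name] != wanted:
            mismatched.append(name)   # would print "Skip loading parameter ..."
        else:
            loadable.append(name)
    return loadable, mismatched, missing


# Hypothetical excerpts of the two state dicts, reduced to shape tuples.
model_shapes = {
    "class_embed.0.weight": (1, 256),            # one class in the tracking model
    "class_embed.0.bias": (1,),
    "track_embed.linear1.weight": (1024, 256),   # module that only exists in MOTR
    "input_proj.0.0.weight": (256, 512, 1, 1),   # illustrative shared parameter
}
ckpt_shapes = {
    "class_embed.0.weight": (91, 256),           # 91 classes in the COCO checkpoint
    "class_embed.0.bias": (91,),
    "input_proj.0.0.weight": (256, 512, 1, 1),
}

loadable, mismatched, missing = split_checkpoint(model_shapes, ckpt_shapes)
print("loadable:  ", loadable)
print("mismatched:", mismatched)
print("missing:   ", missing)
```

Under this reading, the `class_embed` skips and the `track_embed` / `update_attn` "No param" lines are harmless, because those parts are new or resized in the tracking model. The `transformer.decoder.bbox_embed` "No param" lines are a different case: they are consistent with a checkpoint trained without box refinement being loaded into a model built with `with_box_refine=True` (as in the Namespace dump), although that is an inference from the log rather than something confirmed in this thread.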

Wincioor11 commented 2 years ago

Thanks for the quick answer :) OK, now I understand 2). As for 3), it was probably just a killed process. I have now started training on 2 GPUs, and after 4 days I cannot see any new checkpoints (besides the first one) or results. Is there a better way to run a full training in the background and track its progress?
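
One possible way to do this, sketched here with only the Python standard library: start the training command in its own session so it survives the terminal closing, redirect all output to a log file, and read the last lines of that file to check progress. The helper names `launch_detached` and `tail` are hypothetical, and the demo runs a harmless stand-in command instead of the real training script.

```python
import os
import subprocess
import sys
import tempfile


def launch_detached(cmd, log_path):
    """Start cmd in its own session, appending stdout and stderr to log_path."""
    log_file = open(log_path, "ab")
    proc = subprocess.Popen(
        cmd,
        stdout=log_file,
        stderr=subprocess.STDOUT,
        stdin=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: detach from the controlling terminal
    )
    log_file.close()  # the child process keeps its own copy of the descriptor
    return proc


def tail(log_path, n=5):
    """Return the last n lines of the log file."""
    with open(log_path, "r", errors="replace") as f:
        return f.read().splitlines()[-n:]


# Stand-in for the training command, so the sketch can run anywhere.
log_path = os.path.join(tempfile.mkdtemp(), "train.log")
proc = launch_detached([sys.executable, "-c", "print('epoch 0 done')"], log_path)
proc.wait()
print(tail(log_path))
```

For an actual run, the command would be something like `["sh", "configs/r50_motr_train.sh"]`, and `tail` could be called at any later time without waiting for the process. Separately, the Namespace dump above shows `save_period=50` and `epochs=200`; if `save_period` is the checkpoint interval in epochs (an assumption, not confirmed in this thread), then seeing only one checkpoint after a few days would not by itself mean that training has stalled.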

I don't understand 1), though. What does it mean that the model is for testing on the MOT17 test split? I used the r50_motr_eval.sh script; how can I evaluate the model properly?
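
For reference when comparing numbers, MOTA follows the standard CLEAR MOT definition: one minus the sum of false negatives, false positives, and identity switches, divided by the number of ground-truth boxes. The sketch below only illustrates that formula; the counts are made up to land on 28% and are not results from this model or dataset.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy as defined by the CLEAR MOT metrics."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical error counts, chosen only to illustrate the arithmetic.
score = mota(false_negatives=500, false_positives=150, id_switches=70, num_gt=1000)
print(f"MOTA = {score:.2%}")
```

Because all three error types are summed against the ground-truth count, MOTA depends heavily on which split and which annotations the evaluation script compares against, so a score is only meaningful relative to the split it was computed on.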