goutamyg / SMAT

[WACV 2024] Separable Self and Mixed Attention Transformers for Efficient Object Tracking
Apache License 2.0

Train on Custom Dataset #12

Open setarekhosravi opened 3 months ago

setarekhosravi commented 3 months ago

Hello, thank you for your great work. I want to train SMAT on my own dataset. Can I convert my dataset into the LaSOT format and then train SMAT on it? I would prefer not to change your official code, so I would rather change my dataset's format instead. Please help me solve this problem. Thank you again.
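
For reference, this is roughly how I plan to reorganize my data into a LaSOT-style layout (one folder per sequence with an img/ directory of 8-digit frame names, a groundtruth.txt of x,y,w,h boxes, and the per-frame occlusion/out-of-view flag files). The source paths and my annotations.txt format below are placeholders for my own data, and I still need to check lib/train/dataset/lasot.py to see whether the loader also expects a train-split list:

# Sketch: reorganize a custom drone dataset into a LaSOT-style layout.
# My source layout and the annotations.txt format are placeholders.
import shutil
from pathlib import Path

SRC = Path("/path/to/my_drone_dataset")       # one folder per video, containing *.jpg frames
DST = Path("/path/to/lasot_style/drone")      # LaSOT groups sequences by class: drone/drone-1, ...

for seq_id, seq_dir in enumerate(sorted(p for p in SRC.iterdir() if p.is_dir()), start=1):
    out_dir = DST / f"drone-{seq_id}"
    (out_dir / "img").mkdir(parents=True, exist_ok=True)
    frames = sorted(seq_dir.glob("*.jpg"))
    # LaSOT frames are named 00000001.jpg, 00000002.jpg, ...
    for idx, frame in enumerate(frames, start=1):
        shutil.copy(frame, out_dir / "img" / f"{idx:08d}.jpg")
    # groundtruth.txt: one "x,y,w,h" line per frame
    shutil.copy(seq_dir / "annotations.txt", out_dir / "groundtruth.txt")
    # LaSOT also ships per-frame occlusion / out-of-view flags (comma-separated 0/1 on one line)
    flags = ",".join(["0"] * len(frames))
    (out_dir / "full_occlusion.txt").write_text(flags)
    (out_dir / "out_of_view.txt").write_text(flags)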

setarekhosravi commented 3 months ago

When I try to train it on a LaSOT sample (just the drone videos), I see the output below:

python tracking/train.py --script mobilevitv2_track --config mobilevitv2_256_128x1_ep300 --save_dir ./output --mode single
script_name: mobilevitv2_track.py  config_name: mobilevitv2_256_128x1_ep300.yaml
New configuration is shown below.
MODEL configuration: {'PRETRAIN_FILE': 'mobilevitv2-1.0.pt', 'EXTRA_MERGER': False, 'RETURN_INTER': False, 'RETURN_STAGES': [], 'BACKBONE': {'TYPE': 'mobilevitv2-1.0', 'STRIDE': 16, 'MID_PE': False, 'SEP_SEG': False, 'CAT_MODE': 'direct', 'MERGE_LAYER': 0, 'ADD_CLS_TOKEN': False, 'CLS_TOKEN_USE_MODE': 'ignore', 'MIXED_ATTN': True}, 'NECK': {'TYPE': 'BN_PWXCORR', 'NUM_CHANNS_POST_XCORR': 64}, 'HEAD': {'TYPE': 'CENTER_SSAT', 'NUM_CHANNELS': 128}}

TRAIN configuration: {'LR': 0.0004, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 300, 'LR_DROP_EPOCH': 240, 'BATCH_SIZE': 128, 'NUM_WORKER': 5, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'FREEZE_LAYERS': [0], 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 1, 'GRAD_CLIP_NORM': 0.1, 'AMP': False, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}

DATA configuration: {'SAMPLER_MODE': 'causal', 'MEAN': [0.0, 0.0, 0.0], 'STD': [1.0, 1.0, 1.0], 'MAX_SAMPLE_INTERVAL': 200, 'TRAIN': {'DATASETS_NAME': ['LASOT', 'GOT10K_train_full', 'COCO17', 'TRACKINGNET'], 'DATASETS_RATIO': [1, 1, 1, 1], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_official_val'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 256, 'FACTOR': 4.0, 'CENTER_JITTER': 3, 'SCALE_JITTER': 0.25, 'NUMBER': 1}, 'TEMPLATE': {'NUMBER': 1, 'SIZE': 128, 'FACTOR': 2.0, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}

TEST configuration: {'DEVICE': 'cuda', 'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 128, 'SEARCH_FACTOR': 4.0, 'SEARCH_SIZE': 256, 'EPOCH': 300}

sampler_mode causal
Load pretrained model from: /home/setare/Vision/Work/Tracking/Single Object Tracking/Single Object Trackers/SMAT_FinProj/lib/models/mobilevit_track/../../../pretrained_models/mobilevitv2-1.0.pt
Learnable parameters are shown below.
backbone.conv_1.block.conv.weight
backbone.conv_1.block.norm.weight
backbone.conv_1.block.norm.bias
backbone.layer_1.0.block.exp_1x1.block.conv.weight
backbone.layer_1.0.block.exp_1x1.block.norm.weight
backbone.layer_1.0.block.exp_1x1.block.norm.bias
backbone.layer_1.0.block.conv_3x3.block.conv.weight
backbone.layer_1.0.block.conv_3x3.block.norm.weight
backbone.layer_1.0.block.conv_3x3.block.norm.bias
backbone.layer_1.0.block.red_1x1.block.conv.weight
backbone.layer_1.0.block.red_1x1.block.norm.weight
backbone.layer_1.0.block.red_1x1.block.norm.bias
backbone.layer_2.0.block.exp_1x1.block.conv.weight
backbone.layer_2.0.block.exp_1x1.block.norm.weight
backbone.layer_2.0.block.exp_1x1.block.norm.bias
backbone.layer_2.0.block.conv_3x3.block.conv.weight
backbone.layer_2.0.block.conv_3x3.block.norm.weight
backbone.layer_2.0.block.conv_3x3.block.norm.bias
backbone.layer_2.0.block.red_1x1.block.conv.weight
backbone.layer_2.0.block.red_1x1.block.norm.weight
backbone.layer_2.0.block.red_1x1.block.norm.bias
backbone.layer_2.1.block.exp_1x1.block.conv.weight
backbone.layer_2.1.block.exp_1x1.block.norm.weight
backbone.layer_2.1.block.exp_1x1.block.norm.bias
backbone.layer_2.1.block.conv_3x3.block.conv.weight
backbone.layer_2.1.block.conv_3x3.block.norm.weight
backbone.layer_2.1.block.conv_3x3.block.norm.bias
backbone.layer_2.1.block.red_1x1.block.conv.weight
backbone.layer_2.1.block.red_1x1.block.norm.weight
backbone.layer_2.1.block.red_1x1.block.norm.bias
backbone.layer_3.0.block.exp_1x1.block.conv.weight
backbone.layer_3.0.block.exp_1x1.block.norm.weight
backbone.layer_3.0.block.exp_1x1.block.norm.bias
backbone.layer_3.0.block.conv_3x3.block.conv.weight
backbone.layer_3.0.block.conv_3x3.block.norm.weight
backbone.layer_3.0.block.conv_3x3.block.norm.bias
backbone.layer_3.0.block.red_1x1.block.conv.weight
backbone.layer_3.0.block.red_1x1.block.norm.weight
backbone.layer_3.0.block.red_1x1.block.norm.bias
backbone.layer_3.1.local_rep.0.block.conv.weight
backbone.layer_3.1.local_rep.0.block.norm.weight
backbone.layer_3.1.local_rep.0.block.norm.bias
backbone.layer_3.1.local_rep.1.block.conv.weight
backbone.layer_3.1.global_rep.0.pre_norm_attn.0.weight
backbone.layer_3.1.global_rep.0.pre_norm_attn.0.bias
backbone.layer_3.1.global_rep.0.pre_norm_attn.1.qkv_proj.block.conv.weight
backbone.layer_3.1.global_rep.0.pre_norm_attn.1.qkv_proj.block.conv.bias
backbone.layer_3.1.global_rep.0.pre_norm_attn.1.out_proj.block.conv.weight
backbone.layer_3.1.global_rep.0.pre_norm_attn.1.out_proj.block.conv.bias
backbone.layer_3.1.global_rep.0.pre_norm_ffn.0.weight
backbone.layer_3.1.global_rep.0.pre_norm_ffn.0.bias
backbone.layer_3.1.global_rep.0.pre_norm_ffn.1.block.conv.weight
backbone.layer_3.1.global_rep.0.pre_norm_ffn.1.block.conv.bias
backbone.layer_3.1.global_rep.0.pre_norm_ffn.3.block.conv.weight
backbone.layer_3.1.global_rep.0.pre_norm_ffn.3.block.conv.bias
backbone.layer_3.1.global_rep.1.pre_norm_attn.0.weight
backbone.layer_3.1.global_rep.1.pre_norm_attn.0.bias
backbone.layer_3.1.global_rep.1.pre_norm_attn.1.qkv_proj.block.conv.weight
backbone.layer_3.1.global_rep.1.pre_norm_attn.1.qkv_proj.block.conv.bias
backbone.layer_3.1.global_rep.1.pre_norm_attn.1.out_proj.block.conv.weight
backbone.layer_3.1.global_rep.1.pre_norm_attn.1.out_proj.block.conv.bias
backbone.layer_3.1.global_rep.1.pre_norm_ffn.0.weight
backbone.layer_3.1.global_rep.1.pre_norm_ffn.0.bias
backbone.layer_3.1.global_rep.1.pre_norm_ffn.1.block.conv.weight
backbone.layer_3.1.global_rep.1.pre_norm_ffn.1.block.conv.bias
backbone.layer_3.1.global_rep.1.pre_norm_ffn.3.block.conv.weight
backbone.layer_3.1.global_rep.1.pre_norm_ffn.3.block.conv.bias
backbone.layer_3.1.global_rep.2.weight
backbone.layer_3.1.global_rep.2.bias
backbone.layer_3.1.conv_proj.block.conv.weight
backbone.layer_3.1.conv_proj.block.norm.weight
backbone.layer_3.1.conv_proj.block.norm.bias
backbone.layer_4.0.block.exp_1x1.block.conv.weight
backbone.layer_4.0.block.exp_1x1.block.norm.weight
backbone.layer_4.0.block.exp_1x1.block.norm.bias
backbone.layer_4.0.block.conv_3x3.block.conv.weight
backbone.layer_4.0.block.conv_3x3.block.norm.weight
backbone.layer_4.0.block.conv_3x3.block.norm.bias
backbone.layer_4.0.block.red_1x1.block.conv.weight
backbone.layer_4.0.block.red_1x1.block.norm.weight
backbone.layer_4.0.block.red_1x1.block.norm.bias
backbone.layer_4.1.local_rep.0.block.conv.weight
backbone.layer_4.1.local_rep.0.block.norm.weight
backbone.layer_4.1.local_rep.0.block.norm.bias
backbone.layer_4.1.local_rep.1.block.conv.weight
backbone.layer_4.1.global_rep.0.pre_norm_attn.0.weight
backbone.layer_4.1.global_rep.0.pre_norm_attn.0.bias
backbone.layer_4.1.global_rep.0.pre_norm_attn.1.qkv_proj.block.conv.weight
backbone.layer_4.1.global_rep.0.pre_norm_attn.1.qkv_proj.block.conv.bias
backbone.layer_4.1.global_rep.0.pre_norm_attn.1.out_proj.block.conv.weight
backbone.layer_4.1.global_rep.0.pre_norm_attn.1.out_proj.block.conv.bias
backbone.layer_4.1.global_rep.0.pre_norm_ffn.0.weight
backbone.layer_4.1.global_rep.0.pre_norm_ffn.0.bias
backbone.layer_4.1.global_rep.0.pre_norm_ffn.1.block.conv.weight
backbone.layer_4.1.global_rep.0.pre_norm_ffn.1.block.conv.bias
backbone.layer_4.1.global_rep.0.pre_norm_ffn.3.block.conv.weight
backbone.layer_4.1.global_rep.0.pre_norm_ffn.3.block.conv.bias
backbone.layer_4.1.global_rep.1.pre_norm_attn.0.weight
backbone.layer_4.1.global_rep.1.pre_norm_attn.0.bias
backbone.layer_4.1.global_rep.1.pre_norm_attn.1.qkv_proj.block.conv.weight
backbone.layer_4.1.global_rep.1.pre_norm_attn.1.qkv_proj.block.conv.bias
backbone.layer_4.1.global_rep.1.pre_norm_attn.1.out_proj.block.conv.weight
backbone.layer_4.1.global_rep.1.pre_norm_attn.1.out_proj.block.conv.bias
backbone.layer_4.1.global_rep.1.pre_norm_ffn.0.weight
backbone.layer_4.1.global_rep.1.pre_norm_ffn.0.bias
backbone.layer_4.1.global_rep.1.pre_norm_ffn.1.block.conv.weight
backbone.layer_4.1.global_rep.1.pre_norm_ffn.1.block.conv.bias
backbone.layer_4.1.global_rep.1.pre_norm_ffn.3.block.conv.weight
backbone.layer_4.1.global_rep.1.pre_norm_ffn.3.block.conv.bias
backbone.layer_4.1.global_rep.2.pre_norm_attn.0.weight
backbone.layer_4.1.global_rep.2.pre_norm_attn.0.bias
backbone.layer_4.1.global_rep.2.pre_norm_attn.1.qkv_proj.block.conv.weight
backbone.layer_4.1.global_rep.2.pre_norm_attn.1.qkv_proj.block.conv.bias
backbone.layer_4.1.global_rep.2.pre_norm_attn.1.out_proj.block.conv.weight
backbone.layer_4.1.global_rep.2.pre_norm_attn.1.out_proj.block.conv.bias
backbone.layer_4.1.global_rep.2.pre_norm_ffn.0.weight
backbone.layer_4.1.global_rep.2.pre_norm_ffn.0.bias
backbone.layer_4.1.global_rep.2.pre_norm_ffn.1.block.conv.weight
backbone.layer_4.1.global_rep.2.pre_norm_ffn.1.block.conv.bias
backbone.layer_4.1.global_rep.2.pre_norm_ffn.3.block.conv.weight
backbone.layer_4.1.global_rep.2.pre_norm_ffn.3.block.conv.bias
backbone.layer_4.1.global_rep.3.pre_norm_attn.0.weight
backbone.layer_4.1.global_rep.3.pre_norm_attn.0.bias
backbone.layer_4.1.global_rep.3.pre_norm_attn.1.qkv_proj.block.conv.weight
backbone.layer_4.1.global_rep.3.pre_norm_attn.1.qkv_proj.block.conv.bias
backbone.layer_4.1.global_rep.3.pre_norm_attn.1.out_proj.block.conv.weight
backbone.layer_4.1.global_rep.3.pre_norm_attn.1.out_proj.block.conv.bias
backbone.layer_4.1.global_rep.3.pre_norm_ffn.0.weight
backbone.layer_4.1.global_rep.3.pre_norm_ffn.0.bias
backbone.layer_4.1.global_rep.3.pre_norm_ffn.1.block.conv.weight
backbone.layer_4.1.global_rep.3.pre_norm_ffn.1.block.conv.bias
backbone.layer_4.1.global_rep.3.pre_norm_ffn.3.block.conv.weight
backbone.layer_4.1.global_rep.3.pre_norm_ffn.3.block.conv.bias
backbone.layer_4.1.global_rep.4.weight
backbone.layer_4.1.global_rep.4.bias
backbone.layer_4.1.conv_proj.block.conv.weight
backbone.layer_4.1.conv_proj.block.norm.weight
backbone.layer_4.1.conv_proj.block.norm.bias
neck.BN_x.weight
neck.BN_x.bias
neck.BN_z.weight
neck.BN_z.bias
feature_fusor.pw_corr.CA_layer.fc1.weight
feature_fusor.pw_corr.CA_layer.fc1.bias
feature_fusor.pw_corr.CA_layer.fc2.weight
feature_fusor.pw_corr.CA_layer.fc2.bias
feature_fusor.adj_layer.weight
feature_fusor.adj_layer.bias
box_head.pre_ssat_cls.block.conv.weight
box_head.pre_ssat_cls.block.norm.weight
box_head.pre_ssat_cls.block.norm.bias
box_head.pre_ssat_reg.block.conv.weight
box_head.pre_ssat_reg.block.norm.weight
box_head.pre_ssat_reg.block.norm.bias
box_head.global_rep_cls.0.pre_norm_attn.0.weight
box_head.global_rep_cls.0.pre_norm_attn.0.bias
box_head.global_rep_cls.0.pre_norm_attn.1.qkv_proj.block.conv.weight
box_head.global_rep_cls.0.pre_norm_attn.1.qkv_proj.block.conv.bias
box_head.global_rep_cls.0.pre_norm_attn.1.out_proj.block.conv.weight
box_head.global_rep_cls.0.pre_norm_attn.1.out_proj.block.conv.bias
box_head.global_rep_cls.0.pre_norm_ffn.0.weight
box_head.global_rep_cls.0.pre_norm_ffn.0.bias
box_head.global_rep_cls.0.pre_norm_ffn.1.block.conv.weight
box_head.global_rep_cls.0.pre_norm_ffn.1.block.conv.bias
box_head.global_rep_cls.0.pre_norm_ffn.3.block.conv.weight
box_head.global_rep_cls.0.pre_norm_ffn.3.block.conv.bias
box_head.global_rep_cls.1.pre_norm_attn.0.weight
box_head.global_rep_cls.1.pre_norm_attn.0.bias
box_head.global_rep_cls.1.pre_norm_attn.1.qkv_proj.block.conv.weight
box_head.global_rep_cls.1.pre_norm_attn.1.qkv_proj.block.conv.bias
box_head.global_rep_cls.1.pre_norm_attn.1.out_proj.block.conv.weight
box_head.global_rep_cls.1.pre_norm_attn.1.out_proj.block.conv.bias
box_head.global_rep_cls.1.pre_norm_ffn.0.weight
box_head.global_rep_cls.1.pre_norm_ffn.0.bias
box_head.global_rep_cls.1.pre_norm_ffn.1.block.conv.weight
box_head.global_rep_cls.1.pre_norm_ffn.1.block.conv.bias
box_head.global_rep_cls.1.pre_norm_ffn.3.block.conv.weight
box_head.global_rep_cls.1.pre_norm_ffn.3.block.conv.bias
box_head.global_rep_cls.2.weight
box_head.global_rep_cls.2.bias
box_head.global_rep_reg.0.pre_norm_attn.0.weight
box_head.global_rep_reg.0.pre_norm_attn.0.bias
box_head.global_rep_reg.0.pre_norm_attn.1.qkv_proj.block.conv.weight
box_head.global_rep_reg.0.pre_norm_attn.1.qkv_proj.block.conv.bias
box_head.global_rep_reg.0.pre_norm_attn.1.out_proj.block.conv.weight
box_head.global_rep_reg.0.pre_norm_attn.1.out_proj.block.conv.bias
box_head.global_rep_reg.0.pre_norm_ffn.0.weight
box_head.global_rep_reg.0.pre_norm_ffn.0.bias
box_head.global_rep_reg.0.pre_norm_ffn.1.block.conv.weight
box_head.global_rep_reg.0.pre_norm_ffn.1.block.conv.bias
box_head.global_rep_reg.0.pre_norm_ffn.3.block.conv.weight
box_head.global_rep_reg.0.pre_norm_ffn.3.block.conv.bias
box_head.global_rep_reg.1.pre_norm_attn.0.weight
box_head.global_rep_reg.1.pre_norm_attn.0.bias
box_head.global_rep_reg.1.pre_norm_attn.1.qkv_proj.block.conv.weight
box_head.global_rep_reg.1.pre_norm_attn.1.qkv_proj.block.conv.bias
box_head.global_rep_reg.1.pre_norm_attn.1.out_proj.block.conv.weight
box_head.global_rep_reg.1.pre_norm_attn.1.out_proj.block.conv.bias
box_head.global_rep_reg.1.pre_norm_ffn.0.weight
box_head.global_rep_reg.1.pre_norm_ffn.0.bias
box_head.global_rep_reg.1.pre_norm_ffn.1.block.conv.weight
box_head.global_rep_reg.1.pre_norm_ffn.1.block.conv.bias
box_head.global_rep_reg.1.pre_norm_ffn.3.block.conv.weight
box_head.global_rep_reg.1.pre_norm_ffn.3.block.conv.bias
box_head.global_rep_reg.2.pre_norm_attn.0.weight
box_head.global_rep_reg.2.pre_norm_attn.0.bias
box_head.global_rep_reg.2.pre_norm_attn.1.qkv_proj.block.conv.weight
box_head.global_rep_reg.2.pre_norm_attn.1.qkv_proj.block.conv.bias
box_head.global_rep_reg.2.pre_norm_attn.1.out_proj.block.conv.weight
box_head.global_rep_reg.2.pre_norm_attn.1.out_proj.block.conv.bias
box_head.global_rep_reg.2.pre_norm_ffn.0.weight
box_head.global_rep_reg.2.pre_norm_ffn.0.bias
box_head.global_rep_reg.2.pre_norm_ffn.1.block.conv.weight
box_head.global_rep_reg.2.pre_norm_ffn.1.block.conv.bias
box_head.global_rep_reg.2.pre_norm_ffn.3.block.conv.weight
box_head.global_rep_reg.2.pre_norm_ffn.3.block.conv.bias
box_head.global_rep_reg.3.pre_norm_attn.0.weight
box_head.global_rep_reg.3.pre_norm_attn.0.bias
box_head.global_rep_reg.3.pre_norm_attn.1.qkv_proj.block.conv.weight
box_head.global_rep_reg.3.pre_norm_attn.1.qkv_proj.block.conv.bias
box_head.global_rep_reg.3.pre_norm_attn.1.out_proj.block.conv.weight
box_head.global_rep_reg.3.pre_norm_attn.1.out_proj.block.conv.bias
box_head.global_rep_reg.3.pre_norm_ffn.0.weight
box_head.global_rep_reg.3.pre_norm_ffn.0.bias
box_head.global_rep_reg.3.pre_norm_ffn.1.block.conv.weight
box_head.global_rep_reg.3.pre_norm_ffn.1.block.conv.bias
box_head.global_rep_reg.3.pre_norm_ffn.3.block.conv.weight
box_head.global_rep_reg.3.pre_norm_ffn.3.block.conv.bias
box_head.global_rep_reg.4.weight
box_head.global_rep_reg.4.bias
box_head.conv1_ctr.0.weight
box_head.conv1_ctr.0.bias
box_head.conv1_ctr.1.weight
box_head.conv1_ctr.1.bias
box_head.conv2_ctr.0.weight
box_head.conv2_ctr.0.bias
box_head.conv2_ctr.1.weight
box_head.conv2_ctr.1.bias
box_head.conv3_ctr.0.weight
box_head.conv3_ctr.0.bias
box_head.conv3_ctr.1.weight
box_head.conv3_ctr.1.bias
box_head.conv4_ctr.weight
box_head.conv4_ctr.bias
box_head.conv1_offset.0.weight
box_head.conv1_offset.0.bias
box_head.conv1_offset.1.weight
box_head.conv1_offset.1.bias
box_head.conv2_offset.0.weight
box_head.conv2_offset.0.bias
box_head.conv2_offset.1.weight
box_head.conv2_offset.1.bias
box_head.conv3_offset.0.weight
box_head.conv3_offset.0.bias
box_head.conv3_offset.1.weight
box_head.conv3_offset.1.bias
box_head.conv4_offset.weight
box_head.conv4_offset.bias
box_head.conv1_size.0.weight
box_head.conv1_size.0.bias
box_head.conv1_size.1.weight
box_head.conv1_size.1.bias
box_head.conv2_size.0.weight
box_head.conv2_size.0.bias
box_head.conv2_size.1.weight
box_head.conv2_size.1.bias
box_head.conv3_size.0.weight
box_head.conv3_size.0.bias
box_head.conv3_size.1.weight
box_head.conv3_size.1.bias
box_head.conv4_size.weight
box_head.conv4_size.bias
checkpoints will be saved to /home/setare/Vision/Work/Tracking/Single Object Tracking/Single Object Trackers/SMAT_FinProj/output/checkpoints
Finished training!

And all of this happens within a few seconds. Would you please help me solve this problem? @goutamyg

setarekhosravi commented 2 months ago

Anybody?

goutamyg commented 2 months ago

Hi, sorry for the delayed response. I graduated from my study program a few months ago, so I may be slow to respond.

You can train on a custom dataset by creating a dataset class for it, e.g. this one for LaSOT. Make sure you add the dataset name to the YAML config file and update the paths in the local.py file under /lib/train/admin/. Some more changes may be needed; you can make them based on the error messages you get when you run the script.
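
Roughly, the new class would follow the pattern below. This is only a sketch from memory: the base class, the method signatures, the customdrone_dir attribute for local.py, and the object_meta keys should all be checked against the existing LaSOT class in lib/train/dataset/:

# Sketch of a custom dataset class modeled on the LaSOT loader in
# lib/train/dataset/. Names and signatures below are from memory and
# should be verified against lasot.py.
import os
from collections import OrderedDict

import pandas as pd
import torch

from .base_video_dataset import BaseVideoDataset
from lib.train.admin import env_settings


class CustomDrone(BaseVideoDataset):
    def __init__(self, root=None, image_loader=None):
        # customdrone_dir is a new attribute you would add to local.py;
        # image_loader is passed in by the training code.
        root = env_settings().customdrone_dir if root is None else root
        super().__init__('CustomDrone', root, image_loader)
        self.sequence_list = sorted(os.listdir(self.root))

    def get_name(self):
        return 'customdrone'

    def get_num_sequences(self):
        return len(self.sequence_list)

    def _read_bb_anno(self, seq_path):
        # LaSOT-style groundtruth.txt: one "x,y,w,h" line per frame
        gt = pd.read_csv(os.path.join(seq_path, "groundtruth.txt"),
                         delimiter=',', header=None, dtype=float).values
        return torch.tensor(gt)

    def get_sequence_info(self, seq_id):
        seq_path = os.path.join(self.root, self.sequence_list[seq_id])
        bbox = self._read_bb_anno(seq_path)
        valid = (bbox[:, 2] > 0) & (bbox[:, 3] > 0)
        return {'bbox': bbox, 'valid': valid, 'visible': valid.clone().byte()}

    def get_frames(self, seq_id, frame_ids, anno=None):
        seq_path = os.path.join(self.root, self.sequence_list[seq_id])
        frame_list = [self.image_loader(os.path.join(seq_path, 'img', f'{f + 1:08d}.jpg'))
                      for f in frame_ids]
        if anno is None:
            anno = self.get_sequence_info(seq_id)
        anno_frames = {key: [value[f_id, ...].clone() for f_id in frame_ids]
                       for key, value in anno.items()}
        object_meta = OrderedDict({'object_class_name': None, 'motion_class': None,
                                   'major_class': None, 'root_class': None,
                                   'motion_adverb': None})
        return frame_list, anno_frames, object_meta

You would also need to register the new class in lib/train/dataset/__init__.py and map its name where the DATASETS_NAME strings are turned into dataset objects (in lib/train/base_functions.py, if I remember correctly).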

Generally, the code exits from training either once the training is complete or when there is a fully-trained checkpoint already present in the destination folder. This is the file that prints the "Finished training!" message. I can't think of any other reason why the code would halt without going through the training process. If this does not help, try running the code in debugging mode to see at which point it exits from the training procedure.
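
As a quick sanity check (just a sketch; the checkpoint directory is taken from your log, and the *.pth* file pattern is my assumption about the naming), you can list any existing checkpoints before re-running:

# Quick check for an existing checkpoint that would make the trainer skip
# straight to "Finished training!".
from pathlib import Path

ckpt_dir = Path("./output/checkpoints")   # directory printed in the log above
stale = sorted(ckpt_dir.rglob("*.pth*"))  # file pattern is an assumption
if stale:
    print("Existing checkpoints found; training may exit immediately because of these:")
    for p in stale:
        print("  ", p)
    # Delete or move them aside to force training to start from epoch 1, e.g.:
    # for p in stale: p.unlink()
else:
    print("No checkpoints found; the early exit has some other cause, "
          "so try stepping through the training script in a debugger.")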

setarekhosravi commented 2 months ago

Thank you, I will try and let you know.