Implementation of "TrackFormer: Multi-Object Tracking with Transformers". [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]
Pre-Trained models have no attribute for generating attention maps #5

I am trying to generate attention maps from the pre-trained models that were updated 1 hour ago. In a virtual environment with all the requirements from INSTALL.md installed at the listed versions, I tried executing the following:

python src/track.py with \
dataset_name=DEMO \
data_root_dir=data/snakeboard \
output_dir=snake_results \
write_images=pretty \
generate_attention_maps=True \
verbose=True

I got:

WARNING - root - Changed type of config entry "write_images" from bool to str
WARNING - track - No observers have been added to this run
INFO - track - Running command 'main'
INFO - track - Started
Configuration (modified, added, typechanged, doc):
  data_root_dir = 'data/snakeboard'
  dataset_name = 'DEMO'
  generate_attention_maps = True
  interpolate = False
  load_results_dir = None
  obj_detect_checkpoint_file = 'models/mot17_train_deformable_public/checkpoint.pth'
  output_dir = 'snake_results'
  seed = 666
  verbose = True
  write_images = 'pretty'
    end = 1.0
    start = 0.0
    detection_nms_thresh = 0.9
    detection_obj_score_thresh = 0.9
    inactive_patience = -1
    public_detections = False
    reid_greedy_matching = False
    reid_score_thresh = 0.8
    reid_sim_only = False
    reid_sim_threshold = 0.0
    track_nms_thresh = 0.9
    track_obj_score_thresh = 0.8
INFO - main - INIT object detector [EPOCH: 40]
ERROR - track - Failed after 0:00:12!
Traceback (most recent calls WITHOUT Sacred internals):
  File "src/track.py", line 103, in main
    generate_attention_maps, track_logger)
  File "/trackformer/src/trackformer/models/tracker.py", line 51, in __init__
    multihead_attn = self.obj_detector.transformer.decoder.layers[-1].multihead_attn
  File "/miniconda3/envs/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'DeformableTransformerDecoderLayer' object has no attribute 'multihead_attn'

With the same command but with obj_detect_checkpoint_file=models/mots20_train_masks/checkpoint.pth instead, I got:

Traceback (most recent calls WITHOUT Sacred internals):
  File "src/track.py", line 69, in main
    obj_detector, _, obj_detector_post = build_model(obj_detect_args)
  File "/data/mlarsen/trackformer/src/trackformer/models/__init__.py", line 28, in build_model
    matcher = build_matcher(args)
  File "/data/mlarsen/trackformer/src/trackformer/models/matcher.py", line 132, in build_matcher
AttributeError: 'Namespace' object has no attribute 'focal_loss'
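
The error above suggests that the config stored with this checkpoint predates the focal_loss option that the current code reads. A minimal sketch of one way to tolerate such older configs, assuming the arguments arrive as an argparse Namespace; the helper name fill_missing_args, the LEGACY_DEFAULTS table, and the masks key are hypothetical and not part of the TrackFormer code base:

```python
from argparse import Namespace

# Default values taken from the workaround reported in this issue.
# Older configs may lack other keys as well; this table is illustrative only.
LEGACY_DEFAULTS = {"focal_loss": False, "focal_alpha": 0.25}


def fill_missing_args(args, defaults=LEGACY_DEFAULTS):
    """Return a copy of `args` where keys absent from it get default values.

    Values already present in `args` always win over the defaults.
    """
    merged = dict(defaults)
    merged.update(vars(args))
    return Namespace(**merged)


# Hypothetical older config that has no focal_* entries.
old_args = Namespace(masks=True)
new_args = fill_missing_args(old_args)
print(new_args.focal_loss, new_args.focal_alpha)
```

Editing config.yaml by hand has the same effect for a single checkpoint; a helper like this only saves repeating that edit for every older model file.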

After adding focal_loss=false and focal_alpha=0.25 to models/mots20_train_masks/config.yaml, I now get:

Traceback (most recent calls WITHOUT Sacred internals):
  File "src/track.py", line 132, in main
  File "/trackformer/src/trackformer/models/tracker.py", line 288, in step
    outputs, *_ = self.obj_detector(img, target)
  File "/miniconda3/envs/venv-fb/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/trackformer/src/trackformer/models/detr_segmentation.py", line 42, in forward
    out, targets, features, memory, hs = super().forward(samples, targets)
  File "/trackformer/src/trackformer/models/detr_tracking.py", line 129, in forward
    out, targets, features, memory, hs  = super().forward(samples, targets)
  File "/trackformer/src/trackformer/models/detr.py", line 113, in forward
    src, mask, query_embed, pos, tgt)
  File "/miniconda3/envs/venv-fb/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/trackformer/src/trackformer/models/transformer.py", line 76, in forward
  File "/miniconda3/envs/venv-fb/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/trackformer/src/trackformer/models/transformer.py", line 154, in forward
    pos=pos, query_pos=query_pos)
  File "/miniconda3/envs/venv-fb/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/trackformer/src/trackformer/models/transformer.py", line 308, in forward
    tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)
  File "/trackformer/src/trackformer/models/transformer.py", line 266, in forward_post
  File "/miniconda3/envs/venv-fb/lib/python3.7/site-packages/torch/nn/modules/module.py", line 552, in __call__
    hook_result = hook(self, input, result)
  File "/trackformer/src/trackformer/models/tracker.py", line 47, in add_attention_map_to_data
    attention_maps = output[1].view(-1, height, width)
RuntimeError: shape '[-1, 188, 334]' is invalid for input of size 100800
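
The numbers in this last error can be checked by hand. A view to (-1, 188, 334) needs the element count to be a multiple of 188 * 334 = 62792, and 100800 is not. It does factor as 100 * 24 * 42, which would fit 100 queries attending over a 24 x 42 feature map (188 and 334 divided by 8 and rounded up). That reading is a guess from the arithmetic alone, not something confirmed by the code. A small sketch of the check, with view_is_valid as a hypothetical helper name:

```python
def view_is_valid(numel, height, width):
    """True if `numel` elements can be reshaped to (-1, height, width)."""
    return numel % (height * width) == 0


numel = 100800  # size reported in the RuntimeError

# Shape requested by the hook: 188 * 334 = 62792 does not divide 100800.
print(view_is_valid(numel, 188, 334))  # False

# Candidate feature-map shape: 100 * 24 * 42 == 100800.
print(view_is_valid(numel, 24, 42))  # True
```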

Thank you for this excellent paper, for releasing the code, and for your time.

timmeinhardt commented 3 years ago

Indeed, models trained with deformable Transformers currently do not support the generation of attention maps. I added an assert statement and opened an issue (https://github.com/fundamentalvision/Deformable-DETR/issues/80), as this is not even supported by the official Deformable DETR repository.
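
For readers hitting the same AttributeError, a guard of this kind could look roughly like the sketch below. It is not the assert that was actually added to the repository. It only relies on the attribute name multihead_attn seen in the traceback; get_cross_attention, VanillaLayer, and DeformableLayer are hypothetical stand-ins used to keep the example runnable without PyTorch.

```python
def get_cross_attention(decoder_layer):
    """Return the layer's `multihead_attn` module, or fail with a clear message."""
    attn = getattr(decoder_layer, "multihead_attn", None)
    if attn is None:
        raise AssertionError(
            f"{type(decoder_layer).__name__} has no 'multihead_attn'; "
            "attention maps can only be generated for models that expose it."
        )
    return attn


class VanillaLayer:
    """Stand-in for a decoder layer that exposes `multihead_attn`."""

    def __init__(self):
        self.multihead_attn = object()


class DeformableLayer:
    """Stand-in for a decoder layer without `multihead_attn`."""


print(get_cross_attention(VanillaLayer()) is not None)

try:
    get_cross_attention(DeformableLayer())
except AssertionError as err:
    print("unsupported:", err)
```

Failing early like this replaces the AttributeError raised deep inside torch with a message that names the unsupported model type.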

Furthermore, I uploaded new model files and fixed the code. You should now be able to generate attention maps with the provided segmentation model. This is only possible because that model is based on vanilla DETR.

WanderingMars commented 1 year ago

Where can I download the pretrained model? Thanks~

timmeinhardt commented 1 year ago

Step 4 here.