ifzhang / FairMOT

[IJCV-2021] FairMOT: On the Fairness of Detection and Re-Identification in Multi-Object Tracking
MIT License
3.98k stars 936 forks

When training it does not use all the datasets and the results are not the same #240

Open AndresOsp opened 3 years ago

AndresOsp commented 3 years ago

Hello,

Before anything thanks for sharing your amazing work.

I am recreating the training results using the "mix" dataset and the DLA34 model.

I downloaded the datasets and followed the instructions in the README by running: sh experiments/mix_dla34.sh.

However, I noticed that the dataloader does not use two of the datasets, as the terminal output shows:

Using tensorboardX
Fix size testing.
training chunk_sizes: [6, 6]
The output will be saved to  /workspace/FairMOT/src/lib/../../exp/mot/mix_dla34
Setting up data...
================================================================================
dataset summary
OrderedDict([('mot17', 1639.0), ('caltech', 1043.0), ('citypersons', 0), ('cuhksysu', 11931.0), ('prw', 933.0), ('eth', 0)])
total # identities: 15547
start index
OrderedDict([('mot17', 0), ('caltech', 1639.0), ('citypersons', 2682.0), ('cuhksysu', 2682.0), ('prw', 14613.0), ('eth', 15546.0)])
================================================================================
heads {'hm': 1, 'wh': 4, 'id': 128, 'reg': 2}

It says that the code does not use citypersons and eth. Is this normal behaviour? I tried to debug and noticed that the code in jde.py filters out those images. This is the part that does the filtering:

        # Determine the number of identities per dataset: scan every label
        # file and track the largest track ID seen (column 1 of each row).
        for ds, label_paths in self.label_files.items():
            max_index = -1
            for lp in label_paths:
                lb = np.loadtxt(lp)
                if len(lb) < 1:
                    continue  # empty label file, nothing to count
                if len(lb.shape) < 2:
                    img_max = lb[1]  # single row: take its ID directly
                else:
                    img_max = np.max(lb[:, 1])  # largest ID in this file
                if img_max > max_index:
                    max_index = img_max
            # IDs start at 0, so the identity count is max ID + 1; a dataset
            # whose labels carry no IDs never raises max_index above -1.
            self.tid_num[ds] = max_index + 1

From my analysis, it filters out the images without an ID. However, this is unexpected. I downloaded the datasets again, but this did not solve my issue.
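
For illustration, here is a minimal, self-contained sketch of the counting logic above. The toy label arrays are hypothetical stand-ins for np.loadtxt output, assuming the JDE label format [class, id, x_center, y_center, w, h] with id = -1 for boxes that have no identity annotation:

    import numpy as np
    from collections import OrderedDict

    # Hypothetical stand-ins for np.loadtxt(label_file) results.
    # id = -1 marks boxes without identity annotations.
    label_files = OrderedDict([
        ('mot17', [np.array([[0., 0., 0.5, 0.5, 0.1, 0.2],
                             [0., 7., 0.3, 0.4, 0.1, 0.2]])]),
        ('eth',   [np.array([[0., -1., 0.5, 0.5, 0.1, 0.2]])]),
    ])

    tid_num = OrderedDict()
    for ds, labels in label_files.items():
        max_index = -1
        for lb in labels:
            if len(lb) < 1:
                continue
            img_max = lb[1] if len(lb.shape) < 2 else np.max(lb[:, 1])
            if img_max > max_index:
                max_index = img_max
        tid_num[ds] = max_index + 1

    print(tid_num)
    # OrderedDict([('mot17', 8.0), ('eth', 0)])
    # a dataset whose IDs are all -1 counts 0 identities, exactly like
    # citypersons and eth in the dataset summary above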

Then I decided to train the model ignoring that part. The results (at epoch 30) are not the same as the presented ones. I ran: python track.py mot --load_model ../models/fairmot_dla34.pth --conf_thres 0.6

The results for MOT17 are:

Sequence | IDF1 | IDP | IDR | Rcll | Prcn | GT | MT | PT | ML | FP | FN | IDs | FM | MOTA | MOTP
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
MOT17-02-SDP | 56.30% | 84.80% | 42.10% | 49.60% | 99.90% | 62 | 11 | 33 | 18 | 13 | 9358 | 68 | 497 | 49.20% | 0.181
MOT17-04-SDP | 87.40% | 91.70% | 83.50% | 90.80% | 99.70% | 83 | 63 | 16 | 4 | 110 | 4383 | 34 | 537 | 90.50% | 0.158
MOT17-05-SDP | 73.90% | 92.70% | 61.50% | 66.10% | 99.60% | 133 | 28 | 64 | 41 | 20 | 2348 | 34 | 173 | 65.30% | 0.161
MOT17-09-SDP | 69.60% | 79.10% | 62.20% | 77.00% | 97.90% | 26 | 15 | 10 | 1 | 88 | 1225 | 25 | 91 | 74.90% | 0.153
MOT17-10-SDP | 55.60% | 88.20% | 40.60% | 45.50% | 99.00% | 57 | 13 | 17 | 27 | 59 | 6994 | 33 | 433 | 44.80% | 0.188
MOT17-11-SDP | 84.80% | 95.30% | 76.40% | 79.60% | 99.40% | 75 | 31 | 25 | 19 | 49 | 1924 | 19 | 152 | 78.90% | 0.142
MOT17-13-SDP | 62.80% | 94.90% | 47.00% | 49.30% | 99.60% | 110 | 20 | 53 | 37 | 21 | 5901 | 39 | 785 | 48.80% | 0.189
OVERALL | 75.70% | 90.60% | 65.00% | 71.40% | 99.60% | 546 | 181 | 218 | 147 | 360 | 32133 | 252 | 2668 | 70.80% | 0.163

These differ from the results that I get by testing your model:

Sequence | IDF1 | IDP | IDR | Rcll | Prcn | GT | MT | PT | ML | FP | FN | IDs | FM | MOTA | MOTP
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
MOT17-02-SDP | 55.20% | 81.30% | 41.70% | 51.30% | 99.90% | 62 | 12 | 29 | 21 | 7 | 9052 | 62 | 480 | 50.90% | 0.18
MOT17-04-SDP | 91.20% | 95.10% | 87.70% | 92.10% | 99.90% | 83 | 60 | 18 | 5 | 59 | 3779 | 30 | 429 | 91.90% | 0.161
MOT17-05-SDP | 76.60% | 93.80% | 64.80% | 69.00% | 99.80% | 133 | 33 | 71 | 29 | 8 | 2144 | 34 | 159 | 68.40% | 0.172
MOT17-09-SDP | 75.90% | 84.60% | 68.80% | 79.80% | 98.10% | 26 | 16 | 10 | 0 | 83 | 1076 | 15 | 85 | 78.00% | 0.16
MOT17-10-SDP | 57.90% | 91.00% | 42.50% | 46.30% | 99.20% | 57 | 14 | 18 | 25 | 45 | 6893 | 31 | 399 | 45.70% | 0.192
MOT17-11-SDP | 83.90% | 93.90% | 75.80% | 80.00% | 99.20% | 75 | 29 | 27 | 19 | 59 | 1886 | 19 | 172 | 79.20% | 0.149
MOT17-13-SDP | 66.60% | 93.00% | 51.80% | 55.40% | 99.40% | 110 | 29 | 45 | 36 | 41 | 5195 | 47 | 719 | 54.60% | 0.201
OVERALL | 78.20% | 92.30% | 67.90% | 73.30% | 99.60% | 546 | 193 | 218 | 135 | 302 | 30025 | 238 | 2443 | 72.80% | 0.168

In conclusion: is it normal behaviour that citypersons and eth are reported with 0, and why does my trained model not reach the presented results?

Thanks for your help.

Cordially

ifzhang commented 3 years ago

  1. We only use ETH and CityPersons to train the detection branch because these two datasets do not have ID annotations. It indeed loads images from ETH and CityPersons (see the sketch after this list).
  2. Our model fairmot_dla34.pth is pretrained on the CrowdHuman dataset, and that brings some gain compared to training the model only on the "mix" dataset.
  3. You can set --conf_thres 0.4 to get better results on the MOT17 dataset.
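
To make point 1 concrete, here is a minimal sketch of how boxes with ID -1 can still drive detection while being excluded from the identity loss. This is an illustration of the idea, not the repository's exact code; reid_loss and its tensor names are hypothetical:

    import torch
    import torch.nn.functional as F

    def reid_loss(id_logits, target_ids):
        # id_logits:  [N, num_identities] scores from the re-ID head
        # target_ids: [N] track IDs (long), with -1 for boxes that have
        #             no identity annotation (e.g. ETH, CityPersons)
        keep = target_ids >= 0
        if keep.sum() == 0:
            # A batch of purely detection-only data contributes nothing
            # to the identity loss.
            return id_logits.new_zeros(())
        return F.cross_entropy(id_logits[keep], target_ids[keep])

    # The detection losses (heatmap, box size, offset) are computed for
    # every box regardless of its ID, so datasets without identity
    # annotations still train the detection branch.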
AndresOsp commented 3 years ago

Thank you for your answer.

Then (if I am not wrong), by running sh experiments/mix_dla34.sh, the images from ETH and Citypersons should be loaded. Do I therefore have a problem running the code?

ifzhang commented 3 years ago

Just run sh experiments/mix_dla34.sh and you can load the images from ETH and CityPersons.

AndresOsp commented 3 years ago

I did that as shown:

Using tensorboardX
Fix size testing.
training chunk_sizes: [6, 6]
The output will be saved to  /workspace/FairMOT/src/lib/../../exp/mot/mix_dla34
Setting up data...
================================================================================
dataset summary
OrderedDict([('mot17', 1639.0), ('caltech', 1043.0), ('citypersons', 0), ('cuhksysu', 11931.0), ('prw', 933.0), ('eth', 0)])
total # identities: 15547
start index
OrderedDict([('mot17', 0), ('caltech', 1639.0), ('citypersons', 2682.0), ('cuhksysu', 2682.0), ('prw', 14613.0), ('eth', 15546.0)])
================================================================================
heads {'hm': 1, 'wh': 4, 'id': 128, 'reg': 2}

So the algorithm does not use those images to train the detection branch.

ifzhang commented 3 years ago

Ah, I see. The output is the same as mine. The numbers are the numbers of IDs, not of images, which is why citypersons and eth show 0. It does indeed use the images of ETH and CityPersons to train the detection branch. Do not worry about that.
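
This reading is consistent with the "start index" line in the log: it is an exclusive cumulative sum of the identity counts, which is why citypersons and cuhksysu share the value 2682.0. A small sketch reproduces the printed values:

    from collections import OrderedDict

    # Identity counts as printed in the "dataset summary" above.
    tid_num = OrderedDict([('mot17', 1639.0), ('caltech', 1043.0),
                           ('citypersons', 0), ('cuhksysu', 11931.0),
                           ('prw', 933.0), ('eth', 0)])

    # Exclusive cumulative sum: each dataset's IDs start where the
    # previous dataset's end, so a dataset with 0 IDs shares its start
    # index with the next one.
    start_index, last = OrderedDict(), 0
    for ds, n in tid_num.items():
        start_index[ds] = last
        last += n

    print(start_index)
    # OrderedDict([('mot17', 0), ('caltech', 1639.0),
    #              ('citypersons', 2682.0), ('cuhksysu', 2682.0),
    #              ('prw', 14613.0), ('eth', 15546.0)])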

AndresOsp commented 3 years ago

Thank you for your quick answer.

Then, I trained the model using sh experiments/mix_dla34.sh. The trained model does not match the results of fairmot_dla34.pth when I compare python track.py mot --load_model ../models/fairmot_dla34.pth --conf_thres 0.6 vs python track.py mot --load_model ../exp/mot/mix_dla34/model_last.pth --conf_thres 0.6. Do you have any insights into this difference?

Regards