visionml / pytracking

Visual tracking library based on PyTorch.
GNU General Public License v3.0

Why only use sets 1-4 of TrackingNet for training? #373

Closed — shen-ttt closed this 1 year ago

shen-ttt commented 1 year ago

I am trying to reproduce the training for ToMP and DiMP and noticed that all the training scripts use only the first 4 sets of TrackingNet, as shown in this code snippet:

# Train datasets
lasot_train = Lasot(settings.env.lasot_dir, split='train')
got10k_train = Got10k(settings.env.got10k_dir, split='vottrain')
trackingnet_train = TrackingNet(settings.env.trackingnet_dir, set_ids=list(range(4)))
coco_train = MSCOCOSeq(settings.env.coco_dir)

Why don't you use all the training sets (0-11)? I would assume more data usually leads to better performance.

Thank you in advance for any response!
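For reference, switching to the full training split would presumably just mean passing all 12 chunk ids instead of the first 4. A minimal sketch (the TrackingNet constructor call is commented out because it requires the dataset on disk; the chunk count of 12 matches the 0-11 range mentioned above):

```python
# The shipped training scripts use only the first 4 TrackingNet chunks:
first_four = list(range(4))   # [0, 1, 2, 3]

# Hypothetical change: use the full training split, chunks 0-11.
all_sets = list(range(12))

# trackingnet_train = TrackingNet(settings.env.trackingnet_dir, set_ids=all_sets)
```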

2006pmach commented 1 year ago

We use only the first 4 training sets of TrackingNet because the full dataset is huge. This is somewhat historically motivated. You could try whether it works better with the entire TrackingNet training set. It would be nice if you could share some insights once you find out.

shen-ttt commented 1 year ago

Got it. Other tracking codebases, for example mmtracking, also use a subset of TrackingNet. I will definitely try using all of TrackingNet and report my results here.

Another question: since TrackingNet is disproportionately larger than the other tracking datasets, such as GOT-10k, do you think we need to adjust the sampling weight of each dataset (p_datasets)? Currently ToMP uses equal weights for the four training datasets:

dataset_train = sampler.DiMPSampler([lasot_train, got10k_train, trackingnet_train, coco_train], [1, 1, 1, 1],
                                    samples_per_epoch=settings.train_samples_per_epoch, max_gap=settings.max_gap,
                                    num_test_frames=settings.num_test_frames, num_train_frames=settings.num_train_frames,
                                    processing=data_processing_train)
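One option, if equal weights turn out to be a poor fit once the full TrackingNet split is used, would be to weight each dataset roughly in proportion to its size. A small sketch of that idea, with illustrative (approximate, not repo-derived) sequence counts; note that the sampler's weights only need to be relative, so normalization is optional but shown for clarity:

```python
# Illustrative, approximate sequence counts per training dataset
# (rough public figures, NOT values read from the pytracking code):
approx_sequences = {
    'lasot_train':       1_100,    # LaSOT training split (~1.1k videos)
    'got10k_vottrain':   7_000,    # GOT-10k vottrain split (rough)
    'trackingnet_full': 30_000,    # full TrackingNet training split (~30k videos)
    'coco':             118_000,   # COCO images, treated as single-frame sequences
}

total = sum(approx_sequences.values())

# Proportional sampling weights, normalized so they sum to 1.
p_datasets = [count / total for count in approx_sequences.values()]

# These would replace the equal weights [1, 1, 1, 1] in the sampler call, e.g.:
# dataset_train = sampler.DiMPSampler([...], p_datasets, ...)
```

Whether proportional weighting actually helps is an empirical question; oversampling the largest datasets can also hurt diversity, which may be one reason the equal-weight default exists.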