javiribera/locating-objects-without-bboxes

PyTorch code for "Locating objects without bounding boxes" - Loss function and trained models

Mall dataset testing time #23

Closed · austinbeauch closed this 4 years ago

austinbeauch commented 4 years ago

As specified in the paper, you did a train/val/test split on the mall dataset (2000 frames) of 80/10/10, which would give 200 testing frames. How long should it take to locate all the objects in those 200 images? I used a script to turn the .mat file into train/val/test datasets with the proper ground-truth files, then ran `python -m object-locator.locate --dataset mall_dataset/mall_test/ --out mall_out/ --model mall.ckpt --evaluate`. It takes ~1.5 hours to run on the entire testing set, which is much longer than the training/validation iterations. I'm just running it locally on an i5-6600K and a 1070. Training took about 1.5 minutes per epoch and validation around 5 minutes, which makes me wonder what I could be doing wrong at testing time. Perhaps this is expected, but I just wanted to check.
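For context, the split itself is straightforward; here is a minimal sketch (it assumes the standard mall_gt.mat layout with 'count' and 'frame' fields, and leaves writing the repo-specific ground-truth files as a stub, since I'm not reproducing that exact format here):

```python
# Rough sketch of an 80/10/10 split of the 2000 mall frames.
# Assumes the standard mall_gt.mat layout ('count' and 'frame' fields);
# writing the images/ground-truth files out is left as a stub.
import numpy as np
import scipy.io

gt = scipy.io.loadmat('mall_dataset/mall_gt.mat')
counts = gt['count'].ravel()   # head count per frame, shape (2000,)
frames = gt['frame'][0]        # per-frame head locations

n = len(counts)                # 2000
splits = {
    'train': range(0, int(0.8 * n)),             # 1600 frames
    'val':   range(int(0.8 * n), int(0.9 * n)),  # 200 frames
    'test':  range(int(0.9 * n), n),             # 200 frames
}

for name, ids in splits.items():
    for i in ids:
        # Field/cell nesting can vary with scipy's loadmat options;
        # inspect the loaded object and adjust as needed.
        locs = frames[i]['loc'][0, 0]  # (N, 2) head positions
        # ...copy frame i+1's image into the split folder and record
        # (filename, counts[i], locs) in that split's ground-truth file.
```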

Thanks in advance!

javiribera commented 4 years ago

In my experience, the testing speed varies widely between images, depending on how "spread out" the probability map is. Most of the time is spent in the Expectation Maximization algorithm inside the "cluster" function (see https://github.com/javiribera/locating-objects-without-bboxes/blob/5bf3a09b1ccf7248c789089acc656565d21ba437/object-locator/utils.py#L199). Please check whether the same is happening to you. If the network is very uncertain, many "candidate" points survive thresholding, and the binary array passed to "cluster" (the `:param array: Binary array.` in its docstring) ends up containing far too many points.
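To make the cost concrete, the idea is roughly the following (a simplified sketch, not the actual `cluster()` implementation; `locate_from_probmap` is a made-up name for illustration): threshold the probability map, collect the surviving pixels, and fit a Gaussian mixture by EM, whose runtime grows with the number of candidate points.

```python
# Simplified illustration, NOT the repo's cluster() code: a diffuse
# probability map leaves many pixels above the threshold, and EM must
# iterate over all of them.
import numpy as np
from sklearn.mixture import GaussianMixture

def locate_from_probmap(prob_map, tau, n_objects, max_mask_pts=None, seed=0):
    mask = prob_map >= tau                 # threshold -> binary array
    pts = np.argwhere(mask).astype(float)  # candidate (row, col) points
    if max_mask_pts is not None and len(pts) > max_mask_pts:
        # the workaround: randomly subsample the mask points before EM
        rng = np.random.default_rng(seed)
        pts = pts[rng.choice(len(pts), size=max_mask_pts, replace=False)]
    gmm = GaussianMixture(n_components=n_objects).fit(pts)  # EM step
    return gmm.means_                      # estimated object centers
```

With a well-converged network the thresholded mask is tight around each object, so `pts` stays small and EM finishes quickly; a very "spread" map can easily produce tens of thousands of candidate pixels.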

A workaround to alleviate this problem is to use the max_mask_pts parameter of the "cluster" function (see its docstring). You can set it from the command line via "--max-mask-pts". It is disabled by default for higher accuracy.
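For example, appended to the command you used (100 is just an illustrative value, not a recommendation):

```
python -m object-locator.locate --dataset mall_dataset/mall_test/ --out mall_out/ --model mall.ckpt --evaluate --max-mask-pts 100
```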

But honestly, if you want it to run faster, it is better (and more accurate) to train the network until it converges further, so that the probability map is "narrower".

austinbeauch commented 4 years ago

Thanks for the reply @javiribera! It actually turned out to be the taus causing the slowdown: by default, evaluation tests 27 different thresholding values. Reducing that to a single value brought the testing time down to around a minute.
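For anyone else who hits this, restricting the run to a single threshold looks something like this (0.5 is just an example value; check the program's help for the exact flag spelling and for special values such as the BMM-adaptive tau mentioned below):

```
python -m object-locator.locate --dataset mall_dataset/mall_test/ --out mall_out/ --model mall.ckpt --evaluate --tau 0.5
```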

javiribera commented 4 years ago

You are right, thanks for the heads-up. I am changing the default of --tau so that it evaluates only one value (specifically the BMM-adaptive tau) instead of 27 (f8a0b43868b092c0e2b7209eb6f2f4398a59ed7e). This gives a 5x speed boost on inference using my CPU. This was brought up in other issues such as https://github.com/javiribera/locating-objects-without-bboxes/issues/9 and https://github.com/javiribera/locating-objects-without-bboxes/issues/13