tjiiv-cprg / EPro-PnP-v2

[TPAMI 2024] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
https://arxiv.org/abs/2303.12787
MIT License

Inference is non-deterministic? #11

Closed harsanyidani closed 2 months ago

harsanyidani commented 2 months ago

Hi!

I've been doing inference with the following script, using the previous repo and this config:

from mmcv.parallel import MMDataParallel
from mmdet.datasets import build_dataloader
from epropnp_det.datasets.builder import build_dataset
from epropnp_det.apis.inference import init_detector
from mmcv import Config
import torch
from mmdet.apis import set_random_seed

set_random_seed(0, deterministic=True)

config_file = 'configs/epropnp_det_basic.py'
checkpoint_file = '/path/to/checkpoint/file'
device = 'cuda:0'
cfg = Config.fromfile(config_file)
distributed = False
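# Drop 'samples_per_gpu' from the dataset config before building the dataset;
# the value is not needed because the batch size is fixed to 1 below.
cfg.data.val.pop('samples_per_gpu', None)
samples_per_gpu = 1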
dataset = build_dataset(cfg.data.val)
model = init_detector(cfg, checkpoint_file, device=device)
model.test_cfg['debug'] = ['orient']
model = MMDataParallel(model, device_ids=[0])

data_loader = build_dataloader(
    dataset,
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=distributed,
    shuffle=False)

for i, data in enumerate(data_loader):
    with torch.no_grad():
        result = model(return_loss=False, rescale=True, **data)
    print(result[0]["orient_logprob"][0].shape)
    print(result[0]["bbox_results"][0].shape)
    print(result[0]["bbox_3d_results"][0].shape)
    print("------------------------------------")
    if i == 20:
        break

print('2nd for cycle')

for i, data in enumerate(data_loader):

    with torch.no_grad():
        result = model(return_loss=False, rescale=True, **data)
    print(result[0]["orient_logprob"][0].shape)
    print(result[0]["bbox_results"][0].shape)
    print(result[0]["bbox_3d_results"][0].shape)
    print("------------------------------------")

    logprob = result[0]["orient_logprob"]
    bbox_3d = result[0]["bbox_3d_results"]
    if i == 20:
        break

This prints the shapes of the results for cars in each image. The first dimension of each shape is the number of detected objects in that image. I noticed that, despite setting the seed, the two passes over the same dataloader (separated by print('2nd for cycle')) give a different number of detections for some images. This happens every time I run the script.

Outputs for the above script:

FIRST ITERATION:

(1, 128)
(1, 5)
(1, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(33, 128)
(33, 5)
(33, 20)
------------------------------------
(14, 128)
(14, 5)
(14, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(4, 128)
(4, 5)
(4, 20)
------------------------------------
(35, 128)
(35, 5)
(35, 20)
------------------------------------
(12, 128)
(12, 5)
(12, 20)
------------------------------------
(1, 128)
(1, 5)
(1, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(33, 128)
(33, 5)
(33, 20)
------------------------------------
(15, 128)
(15, 5)
(15, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)

SECOND ITERATION:

(1, 128)
(1, 5)
(1, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(6, 128)
(6, 5)
(6, 20)
------------------------------------
(32, 128)
(32, 5)
(32, 20)
------------------------------------
(15, 128)
(15, 5)
(15, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(5, 128)
(5, 5)
(5, 20)
------------------------------------
(34, 128)
(34, 5)
(34, 20)
------------------------------------
(12, 128)
(12, 5)
(12, 20)
------------------------------------
(1, 128)
(1, 5)
(1, 20)
------------------------------------
(2, 128)
(2, 5)
(2, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(4, 128)
(4, 5)
(4, 20)
------------------------------------
(32, 128)
(32, 5)
(32, 20)
------------------------------------
(12, 128)
(12, 5)
(12, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)
------------------------------------
(0, 128)
(0, 5)
(0, 20)
------------------------------------
(3, 128)
(3, 5)
(3, 20)

As you can see, across the 21 images (i runs from 0 to 20) the number of detected objects differs in 8 cases. Only one difference is bigger than 1: 12 instead of 15.
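The comparison can be reproduced with a short script. This is only a sketch: the two lists are the detection counts copied by hand from the logs above, and the variable names are made up for illustration.

```python
# Detection counts per image, copied from the two logs above.
first_pass = [1, 0, 5, 33, 14, 2, 5, 0, 4, 35, 12,
              1, 2, 0, 5, 33, 15, 0, 3, 0, 3]
second_pass = [1, 0, 6, 32, 15, 2, 5, 0, 5, 34, 12,
               1, 2, 0, 4, 32, 12, 0, 3, 0, 3]

# Keep (image index, first count, second count) wherever the passes disagree.
diffs = [(i, a, b)
         for i, (a, b) in enumerate(zip(first_pass, second_pass))
         if a != b]

for i, a, b in diffs:
    print(f"image {i}: {a} vs {b} (delta {b - a:+d})")
print(f"{len(diffs)} of {len(first_pass)} images differ")

# Differences larger than one detection.
large = [d for d in diffs if abs(d[1] - d[2]) > 1]
print("larger than 1:", large)
```

In the real script the same lists could be filled inside each loop with len(result[0]["bbox_3d_results"][0]) instead of being typed in.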

What could be the cause of this? Maybe the non-deterministic nature of the PnP solver? Thanks in advance for the help!

Lakonik commented 2 months ago

Yes. To find the optimal pose of a potentially multi-modal pose distribution, the PnP solver is initialized with random hypotheses (similar to RANSAC), so the results are non-deterministic.
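To illustrate the effect, here is a toy sketch in plain Python. It is not the EPro-PnP solver: cost, local_refine, solve and run_pass are hypothetical stand-ins, and the 1-D objective with two nearly equal minima only mimics an ambiguous pose distribution. It shows why a seed set once at startup does not make two passes identical (the second pass continues the random stream and draws different starting hypotheses), and that re-seeding before each pass reproduces the first one.

```python
import random


def cost(x):
    # Toy multi-modal objective: two minima of nearly equal depth
    # near x = -2 and x = +2.
    return (x * x - 4.0) ** 2 + 0.05 * x


def local_refine(x, steps=200, lr=0.01):
    # Gradient descent with a central finite-difference gradient.
    h = 1e-5
    for _ in range(steps):
        grad = (cost(x + h) - cost(x - h)) / (2.0 * h)
        x -= lr * grad
    return x


def solve(rng, num_hypotheses=3):
    # Draw random starting hypotheses, refine each, keep the best one.
    starts = [rng.uniform(-4.0, 4.0) for _ in range(num_hypotheses)]
    return min((local_refine(s) for s in starts), key=cost)


def run_pass(rng, num_images=4):
    # One "pass over the dataloader": one solve per image.
    return [solve(rng) for _ in range(num_images)]


# Seed once, then run two passes: the second pass continues the random
# stream, so it starts from different hypotheses and may pick other modes.
rng = random.Random(0)
pass_a = run_pass(rng)
pass_b = run_pass(rng)

# Re-seed before a pass: it starts from the same hypotheses as pass_a.
pass_c = run_pass(random.Random(0))

print("pass_a:", [round(x, 3) for x in pass_a])
print("pass_b:", [round(x, 3) for x in pass_b])
print("pass_c equals pass_a:", pass_c == pass_a)
```

Whether the same trick works for the actual model is an assumption. It would require the solver to draw its hypotheses from the global PyTorch generator, so that calling torch.manual_seed before each loop resets them, and it would also require the CUDA kernels involved to be deterministic.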

harsanyidani commented 2 months ago

Thank you!