yeliudev / ConsNet

🚴‍♂️ ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection (MM 2020)
https://arxiv.org/abs/2008.06254
GNU General Public License v3.0

Given an image name from HICO_DET, how can I retrieve its HOI_idx using ConsNet API? #3

Closed monacv closed 3 years ago

yeliudev commented 3 years ago

Please try this.

from consnet.api import load_anno

img_name = 'HICO_train2015_00000001.jpg'
img_id = int(img_name[-12:-4])

anno = load_anno('<path-to-anno_bbox.mat>', 'train')
hoi_idx = anno[anno[:, 0] == img_id, 1].int().tolist()
monacv commented 3 years ago

Thanks a lot. I ran the following and it took quite a while, which means it would take very long to run for all images. I am also not sure why it has "eat cake" as an annotation; there is no eating in this photo. Is there a way I could accelerate this for all images?

data/hico_20160224_det/images/train2015/HICO_train2015_00001549.jpg

[Screenshot from 2021-04-08 18-28-08]

monacv commented 3 years ago

I also tried another image, which gives very weird results.

[Screenshot from 2021-04-08 18-43-18]

feh data/hico_20160224_det/images/test2015/HICO_test2015_00007935.jpg

(consnet) mona@goku:~/research/code/ConsNet$ python api_demo.py 
[133, 133, 139, 139]
--- 43.073837995529175 seconds ---
133  horse           hug           
134  horse           jump          
135  horse           kiss          
136  horse           load          
137  horse           hop_on        
138  horse           pet           
139  horse           race      

The list above is from hico_list_hoi.txt. As you can see in the image, there is no horse involved.

Here's the api_demo.py code:

import time
start_time = time.time()    
from consnet.api import load_anno

img_name = 'data/hico_20160224_det/images/test2015/HICO_test2015_00007935.jpg'
img_id = int(img_name[-12:-4])

anno = load_anno('data/hico_20160224_det/anno_bbox.mat', 'train')
hoi_idx = anno[anno[:, 0] == img_id, 1].int().tolist()
print(hoi_idx)
print("--- %s seconds ---" % (time.time() - start_time))

I think 40s is a very long time for a lookup.

yeliudev commented 3 years ago

Regarding the "eat cake" annotation: I think this may be a problem with HICO-DET's original annotation. Please double-check it in anno_bbox.mat.

yeliudev commented 3 years ago

You are fetching the annotations of an image from the test split, so the second argument of load_anno should be 'test'.
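One way to avoid this kind of split mismatch is to derive the split from the file name itself. The following is only a sketch: parse_hico_name is a hypothetical helper, not part of the ConsNet API, and it assumes the HICO_<split>2015_<8 digits>.jpg naming seen in this thread.

```python
import re

# Hypothetical helper (not part of ConsNet): derive the split and numeric
# image id from a HICO-DET file name such as HICO_test2015_00007935.jpg.
_NAME_RE = re.compile(r'HICO_(train|test)2015_(\d{8})\.jpg$')


def parse_hico_name(img_name):
    """Return (split, img_id) for a HICO-DET image path or file name."""
    match = _NAME_RE.search(img_name)
    if match is None:
        raise ValueError(f'not a HICO-DET image name: {img_name!r}')
    split, digits = match.groups()
    return split, int(digits)


split, img_id = parse_hico_name(
    'data/hico_20160224_det/images/test2015/HICO_test2015_00007935.jpg')
print(split, img_id)  # test 7935
```

The returned split can then be passed as the second argument of load_anno, so the image and its annotations always come from the same split.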

The lookup itself does not take that long in our own experiments. Does most of the time come from load_anno? That method only needs to be called once for all images.
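To make the "load once, look up many times" idea concrete, here is a minimal sketch that groups (img_id, hoi_idx) pairs into a dictionary in a single pass. group_hoi_by_image is a hypothetical helper, and the rows below are made-up stand-in values for illustration; with the real annotations they would presumably come from something like anno[:, :2].int().tolist(), which is an assumption based on the indexing used in the snippet above.

```python
from collections import defaultdict


def group_hoi_by_image(rows):
    """Map img_id -> list of hoi_idx, given (img_id, hoi_idx) pairs."""
    index = defaultdict(list)
    for img_id, hoi_idx in rows:
        index[img_id].append(hoi_idx)
    return dict(index)


# Stand-in rows for illustration only (not real HICO-DET annotations).
rows = [(1, 152), (1, 153), (2, 221), (3, 56), (3, 60)]

# Build the index once, then every per-image lookup is a dict access.
hoi_by_image = group_hoi_by_image(rows)
print(hoi_by_image[1])  # [152, 153]
```

With such an index, the cost of load_anno is paid once and each image lookup no longer scans the whole annotation table.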

monacv commented 3 years ago

When I run the MATLAB code that comes with the original HICO_DET dataset, the results are sometimes different (thanks for catching my mistake about load_anno).

Here I am loading the train split with load_anno, and I get:

$ python api_demo.py
[56, 60, 60, 60, 60, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61]
--- 7.9999425411224365 seconds ---

While I get "drive bus" from your results, I get other results too. Do you know why, and why your results are different from the original MATLAB code? Personally, I accept that if you are "driving a bus", you are also "riding a bus" and "sitting on a bus".

055 bus board
056 bus direct
057 bus drive
058 bus exit
059 bus inspect
060 bus load
061 bus ride
062 bus sit_on
063 bus wash
064 bus wave

Meanwhile, the MATLAB code yields this result:

[Screenshot from 2021-04-08 22-58-14]

Here's the original matlab code:

im_root   = '../images/';
bbox_file = '../anno_bbox.mat';

ld = load(bbox_file);
bbox_train = ld.bbox_train;
bbox_test = ld.bbox_test;
list_action = ld.list_action;

% change this
i = 765;  % image index
j = 1;    % hoi index

% read image
im_file = [im_root 'train2015/' bbox_train(i).filename];
im = imread(im_file);

% display image
figure(1);
imshow(im); hold on;

% display hoi
hoi_id = bbox_train(i).hoi(j).id;
aname = [list_action(hoi_id).vname_ing ' ' list_action(hoi_id).nname];
aname = strrep(aname,'_',' ');
title(aname);

% display bbox
if bbox_train(i).hoi(j).invis
    fprintf('hoi not visible\n');
else
    bboxhuman  = bbox_train(i).hoi(j).bboxhuman;
    bboxobject = bbox_train(i).hoi(j).bboxobject;
    connection = bbox_train(i).hoi(j).connection;
    visualize_box_conn_one(bboxhuman, bboxobject, connection, 'b','g');
end

and here's the image name I used: data/hico_20160224_det/images/train2015/HICO_train2015_00000765.jpg

monacv commented 3 years ago

Regarding the cake, I made a mistake: I didn't account for the off-by-one in your indices. So your API returns "carry cake" and "hold cake", which makes sense, but the original MATLAB code returns only "carry cake". If you could shed any light on why the original code doesn't return "hold cake", that would be really great. Thanks for the nice API.

[Screenshot from 2021-04-08 23-03-06]

monacv commented 3 years ago

I just wanted to update that you were correct regarding the time: running it for all the train images only took 114 seconds :)

(consnet) mona@goku:~/research/code/ConsNet$ python api_demo.py
--- 114.5227403640747 seconds ---

yeliudev commented 3 years ago

Both our API and the MATLAB code are correct :) In HICO-DET, a single image may contain multiple HOIs (i.e. multiple human-object pairs performing different actions). There are 23 HOI instances in total in HICO_train2015_00000765.jpg, as can be seen in anno_bbox.mat. The MATLAB code you provided only displays the first one (j = 1).

Please also note that the API outputs hoi_idx (0 ~ 599) instead of hoi_id (1 ~ 600), so the three HOIs should be 56 - drive bus, 60 - ride bus and 61 - sit_on bus.
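As a rough illustration of the index shift, the sketch below converts 0-based hoi_idx values to names by looking up hoi_id = hoi_idx + 1. parse_hoi_list and hoi_name are hypothetical helpers, not part of ConsNet. The embedded lines are copied from the hico_list_hoi.txt excerpt quoted earlier in this thread, and the parser assumes each data line has the form "<id> <object> <verb>".

```python
from collections import Counter

# Lines copied from the hico_list_hoi.txt excerpt quoted in this thread.
HOI_LIST = """\
055 bus board
056 bus direct
057 bus drive
058 bus exit
059 bus inspect
060 bus load
061 bus ride
062 bus sit_on
063 bus wash
064 bus wave
"""


def parse_hoi_list(text):
    """Map 1-based hoi_id -> (object, verb), skipping non-data lines."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].isdigit():
            table[int(parts[0])] = (parts[1], parts[2])
    return table


def hoi_name(hoi_idx, table):
    """Name of a 0-based hoi_idx, looked up via the 1-based hoi_id."""
    obj, verb = table[hoi_idx + 1]
    return f'{verb} {obj}'


table = parse_hoi_list(HOI_LIST)
# The 23 values printed by api_demo.py for HICO_train2015_00000765.jpg.
hoi_idx = [56] + [60] * 9 + [61] * 13
for idx, count in sorted(Counter(hoi_idx).items()):
    print(idx, hoi_name(idx, table), count)
```

Counting the values also shows how the 23 instances collapse into three distinct HOI categories for this image.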

yeliudev commented 3 years ago

Thanks for reporting this :)