microsoft / Oscar

Oscar and VinVL
MIT License
1.04k stars 251 forks source link

inference has something error #79

Open whqwill opened 3 years ago

whqwill commented 3 years ago

I follow the script from https://github.com/microsoft/Oscar/blob/master/VinVL_MODEL_ZOO.md to do inference on COCO test set for image captioning. And I use the model "coco_captioning_large_xe.zip".

but the result shows it always describe different image.

e.g. the image id is 391895, image

and the "od_label" is : "man helmet scooter sky ground ground mountain wheel man shirt mountain road mountain shoe trees rocks bike wheel post bridge grass fence jeans mirror bush grass mountains road bridge people trees tree man post bag motorcycle bridge” but the result caption is: "a group of men standing next to each other"

the image id is 60623, image

and the "od_label" is : "woman bowl woman people hair man fork wrist woman face hand watch finger spoon eye shirt woman fork person table hand flame person hair ring spoon candle glass" but the result caption is: "a group of men sitting on top of a couch"

Did I miss something or do something wrong?

xiaoweihu commented 3 years ago

Hi,

The prediction result file has two columns. The first column is the image id, the second column has the generated captions.

ljd16 commented 3 years ago

@whqwill I got the same results, did you find the reason?