Closed: Cadene closed this issue 4 years ago.
If the questions were collected from AMT (the `data_source` field), we asked the annotators to explicitly ask only complex questions. So those are truly complex questions, i.e. they mostly have counterfactual examples in the images.
I did not understand the output above. Is `dataset: x` your prediction and `issimple: y` from the dataset?
@manoja328 thanks for your quick reply!
`dataset: x` corresponds to a label from the dataset
`issimple: x` corresponds to the prediction I get from running your `issimple` function
Also, I checked the number of simple vs. complex questions I have in the dataset. The numbers are the same as reported in your paper.
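For reference, here is roughly how I compute that mismatch (a minimal sketch; I'm assuming `issimple(question)` returns a boolean and that the test JSON is a list of items with `question` and `issimple` fields):

```python
import json

from simpcomp import issimple  # assuming simpcomp.py exposes issimple(question) -> bool

# Load the TallyQA test set (path and field names assumed from the released JSON).
with open("test.json") as f:
    items = json.load(f)

# Compare the classifier's prediction with the dataset's own `issimple` label.
mismatches = [q for q in items
              if bool(issimple(q["question"])) != bool(q["issimple"])]

print(f"{len(mismatches)} / {len(items)} items disagree "
      f"({100.0 * len(mismatches) / len(items):.1f}%)")
```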
Just for your information, all the misclassified items I sent you are from the `imported_genome` `data_source`.
We have two splits for the test set: test-simple and test-complex. Our point in the paper was to build a human-vetted test-complex set that contains only complex questions. In test-simple we imported questions from Visual Genome. In many cases we found that, even though our simpcomp classifier predicts them as complex, they really don't have counterfactuals in the image, which means they are not truly complex.

For example, "How many red dogs are there?" is a complex question, but it depends on the image context. If the image has only red dogs, then a counting model can get it right just by counting all the dogs (in fact, this might be one reason for the bias in VQA results, i.e. ignoring linguistic details and relying on shallow correlations), so the model is correct but for the wrong reasons. To make the question truly complex, the image must contain both a red dog and dogs of other colors.
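To make that criterion concrete, here is a rough sketch (purely illustrative; the object and attribute annotations are assumed for the example and are not part of TallyQA):

```python
# Hypothetical check for the counterfactual criterion described above:
# "How many red dogs are there?" is only truly complex if the image contains
# both red dogs and dogs that are not red, so counting every dog gives the wrong answer.
def has_counterfactual(objects, category="dog", attribute="red"):
    in_category = [o for o in objects if o["name"] == category]
    matching = [o for o in in_category if attribute in o.get("attributes", [])]
    return 0 < len(matching) < len(in_category)

# An image with only red dogs: a model that ignores "red" still counts correctly.
print(has_counterfactual([{"name": "dog", "attributes": ["red"]}]))  # False

# An image with a red dog and a brown dog: the question is truly complex.
print(has_counterfactual([
    {"name": "dog", "attributes": ["red"]},
    {"name": "dog", "attributes": ["brown"]},
]))  # True
```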
Also, it's possible that our simple-complex classifier missed some truly complex questions and put them in our test-simple split. But that would only add a little more complexity to the test-simple split.
I get it, thanks Manoj :)
Hi!
I'm having some issues reproducing your simple-complex classification of the TallyQA testing set.
When I run your `simpcomp.py` over the questions from the TallyQA testing set, 20% of the items are misclassified compared to the `issimple` label from the testing set. Also, I noticed that the `coco` variable is never used. Do you have any insight to help me?
Thanks :)
Misclassified items look like this: