Closed: Cadene closed this issue 4 years ago.
If the questions were collected from AMT (the `data_source` field), we asked the annotators to explicitly ask only complex questions. So those are truly complex questions, i.e. they mostly have counterfactual examples in the images.
I did not understand the output above. Is `dataset: x` your prediction and `issimple: y` from the dataset?
@manoja328 thanks for your quick reply!
`dataset: x` corresponds to a label from the dataset
`issimple: x` corresponds to the prediction I get from running your `issimple` function
Also, I checked the number of simple vs. complex questions I have in the dataset. The numbers are the same as reported in your paper.
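For reference, here is roughly how I compute that mismatch (a minimal sketch; I'm assuming `issimple(question)` returns a boolean and that the test JSON is a list of items with `question` and `issimple` fields):

```python
import json

from simpcomp import issimple  # assuming simpcomp.py exposes issimple(question) -> bool

# Load the TallyQA test set (path and field names assumed from the released JSON).
with open("test.json") as f:
    items = json.load(f)

# Compare the classifier's prediction with the dataset's own `issimple` label.
mismatches = [q for q in items
              if bool(issimple(q["question"])) != bool(q["issimple"])]

print(f"{len(mismatches)} / {len(items)} items disagree "
      f"({100.0 * len(mismatches) / len(items):.1f}%)")
```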
Just for your information, all the misclassified items I sent you are from the `imported_genome` `data_source`.
We have two splits for the test set: test-simple and test-complex. Our point in the paper was to build a human-vetted test-complex set that contains only complex questions. In test-simple we imported questions from Visual Genome. In many cases we found that, even though our simpcomp classifier predicts them as complex, they really don't have counterfactuals in the image, which means they are not truly complex.

For example, "How many red dogs are there?" is a complex question, but it depends on the image context. If the image has only red dogs, then a counting model can get it right just by counting all the dogs (in fact, this might be one reason for the bias in VQA results, i.e. ignoring linguistic details and relying on shallow correlations), so the model is correct but for the wrong reasons. To make the question truly complex, the image must contain both a red dog and dogs of other colors.
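To make that criterion concrete, here is a rough sketch (purely illustrative; the object and attribute annotations are assumed for the example and are not part of TallyQA):

```python
# Hypothetical check for the counterfactual criterion described above:
# "How many red dogs are there?" is only truly complex if the image contains
# both red dogs and dogs that are not red, so counting every dog gives the wrong answer.
def has_counterfactual(objects, category="dog", attribute="red"):
    in_category = [o for o in objects if o["name"] == category]
    matching = [o for o in in_category if attribute in o.get("attributes", [])]
    return 0 < len(matching) < len(in_category)

# An image with only red dogs: a model that ignores "red" still counts correctly.
print(has_counterfactual([{"name": "dog", "attributes": ["red"]}]))  # False

# An image with a red dog and a brown dog: the question is truly complex.
print(has_counterfactual([
    {"name": "dog", "attributes": ["red"]},
    {"name": "dog", "attributes": ["brown"]},
]))  # True
```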
Also, it's possible that our simple-complex classifier missed some truly complex questions and put them in our test-simple split. But that would only add a little more complexity to the test-simple split.
I get it, thanks Manoj :)
Hi!
I'm having some issues reproducing your simple-complex classification of the TallyQA testing set.
When I run your `simpcomp.py` over the questions from the TallyQA testing set, 20% of the items are misclassified compared to the `issimple` label from the testing set. Also, I noticed that the `coco` variable is never used. Do you have any insight to help me?
Thanks :)
Misclassified items look like this: