Hi!
I found that for some images in the stvqa dataset, the number of words detected by OCR does not match the number of corresponding features.
For example, 'feat_resx/stvqa/train/imageNet/n03196217_7957.npy' contains 33 features, while the corresponding 'ocr_feat_resx/stvqa_conf/train/imageNet/n03196217_7957_info.npy' contains 55 OCR words. About 2000 images in the train set have this problem.
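In case it helps to reproduce, here is a minimal sketch of the kind of check that surfaces the mismatch. It rests on assumptions about the file layout that may not hold: that each feature file is a 2-D array with one row per feature, and that each `_info.npy` file stores a pickled dict whose token list sits under a key such as `ocr_tokens` (that key name is hypothetical, as are the helper names).

```python
import numpy as np


def count_features(feature_path):
    # Assumption: the feature file holds an array of shape (num_features, dim).
    return np.load(feature_path, allow_pickle=True).shape[0]


def count_ocr_words(info_path, key="ocr_tokens"):
    # Assumption: the info file is a pickled dict saved with np.save;
    # `key` is a hypothetical name for the list of OCR words.
    info = np.load(info_path, allow_pickle=True).item()
    return len(info[key])


def find_mismatches(pairs, key="ocr_tokens"):
    """Return (feature_path, n_features, n_ocr_words) for each inconsistent pair."""
    mismatches = []
    for feature_path, info_path in pairs:
        n_features = count_features(feature_path)
        n_ocr_words = count_ocr_words(info_path, key=key)
        if n_features != n_ocr_words:
            mismatches.append((feature_path, n_features, n_ocr_words))
    return mismatches
```

Calling `find_mismatches` on a list of `(feature_path, info_path)` pairs returns only the pairs whose counts differ, so the length of the result gives the number of affected images.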