Closed: zengyan-97 closed this issue 2 years ago
Hello Zeng, sorry for the late reply. I don't remember whether we apply any further processing to filter the data. I think I made a mistake when reporting the size of the validation set: it is indeed 37K images rather than 30K. I will update the paper to fix this mistake, and will also share the Japanese VQA data. Regarding the top-3000 frequent answers, we do apply some postprocessing, which should, to some extent, address the mismatches you found in the answers. To ease your work, I will directly share the final 3000 answers we use in our experiments: https://drive.google.com/drive/folders/1BTL6nGe2YIOHEK5PqGO8UCUjTwiQ13d8?usp=sharing
Thank you very much!
Hi,
I tried to reproduce UC2 on VG VQA JA, but I got an accuracy of ~25% instead of the reported ~34%.
I followed the UC2 paper to preprocess the data (I submitted an issue about the data split before; thank you again for replying), but I got 37,674 examples for the test split instead of the 30K stated in the paper. So my first question is: did you filter the test data? Can you share the processed data?
Besides, I found that many answers among the top-3000 frequent answers have very similar meanings, so the model made "wrong" predictions that should have been counted as correct:
gt: 2人, pred: 2人
gt: 1本, pred: 1本
gt: 緑, pred: 緑色
gt: 赤, pred: 赤色
gt: 白色, pred: 白
gt: 白, pred: 白色
gt: 一本, pred: 1本
gt: 1つ, pred: 1個
gt: 1, pred: 1つ
gt: 2本, pred: 2つ
gt: 1本, pred: 1つ
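To make the problem concrete, here is a minimal sketch of the kind of answer normalization that would merge these near-duplicates before scoring. The suffix rule and the counter table below are my own illustrative assumptions, not the postprocessing UC2 actually performs:

```python
# Hypothetical normalization for Japanese VQA answers before exact-match
# scoring. The rules here are illustrative assumptions only.

# Unify counter-word variants; this table is deliberately incomplete.
COUNTER_MAP = {
    "一本": "1本",  # kanji numeral -> arabic numeral
    "1個": "1つ",  # unify generic counters
}

def normalize(ans: str) -> str:
    """Map near-duplicate answer strings to one canonical form."""
    ans = COUNTER_MAP.get(ans, ans)
    # Collapse color variants like 緑色 -> 緑 and 白色 -> 白,
    # but leave the bare word 色 ("color") untouched.
    if len(ans) > 1 and ans.endswith("色"):
        ans = ans[:-1]
    return ans
```

With such a mapping, a prediction would be counted correct when `normalize(pred) == normalize(gt)`, so pairs like (緑, 緑色) and (一本, 1本) no longer count as errors.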
So my second question is: did you pick or process the top-3000 frequent answers with some strategy? Can you share the list of top-3000 frequent answers that you chose?
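For reference, the naive baseline I used to build the answer vocabulary is simply the 3000 most frequent answers in the training annotations; anything beyond this (the paper does not specify) would be extra processing:

```python
# Naive top-k answer vocabulary: count answer frequencies in the
# training set and keep the k most common strings. This is the
# obvious baseline, not necessarily what UC2 did.
from collections import Counter

def top_k_answers(answers, k=3000):
    """Return the k most frequent answer strings, most frequent first."""
    return [ans for ans, _ in Counter(answers).most_common(k)]
```

If any normalization or manual filtering was applied on top of this counting step, that would explain the discrepancies above.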
Thanks!