Hi, I have a question about how to run VQA on my own dataset (a folder of image files).
I tried https://github.com/microsoft/scene_graph_benchmark to extract image features, and the output is predictions.tsv.
Is that correct? I don't know what to do next.
I have the same question. It seems that they use VinVL (predictions.tsv) for pre-training and Faster/Mask R-CNN for fine-tuning...
Have you figured it out? Thanks!