brandonjabr closed this issue 6 years ago.
@brandonjabr I think you'll need to run ResNet and Faster R-CNN yourself to extract the features of an image, then feed those features together with your question into the model to obtain the result.
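For the question side, here is a minimal sketch of turning raw text into a fixed-length index sequence. It assumes you have a word-to-index vocabulary like the one built for training. `encode_question`, `word2idx`, `max_len=14`, `pad_idx`, `unk_idx` and the left-padding convention are all illustrative assumptions, not the repo's API. Check the repo's own dictionary and dataset code so the model sees questions formatted exactly as they were during training.

```python
import re
from typing import Dict, List


def encode_question(question: str, word2idx: Dict[str, int],
                    max_len: int = 14, pad_idx: int = 0,
                    unk_idx: int = 1) -> List[int]:
    """Lowercase, split into words, map to ids, then truncate/pad to max_len.

    Hypothetical helper: word2idx, max_len, pad_idx and unk_idx are assumptions
    and must match whatever vocabulary/padding the trained model actually used.
    """
    words = re.findall(r"[a-z0-9']+", question.lower())
    # Unknown words fall back to unk_idx; keep at most max_len tokens.
    ids = [word2idx.get(w, unk_idx) for w in words][:max_len]
    # Left-pad so every question has the same length (assumed convention).
    return [pad_idx] * (max_len - len(ids)) + ids
```

With torch, `torch.tensor([encode_question(q, word2idx)])` gives a `[1, max_len]` batch that you can pair with the Faster R-CNN features for the same image.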
@zengxianyu Thanks for your help. Unfortunately I'm still a bit stuck. I tried making a dataset with just one image and corresponding .json files containing a few questions about that image (using the COCO .json format). So far I've been able to feed the trained model questions from these custom files, and it returns predictions as [1x3129] torch tensors.
How can I convert these tensors into the actual answers they represent?
@brandonjabr See the generated label-to-answer mapping in data/cache/trainval_label2ans.pkl, which gives the answer string for each of the 3129 output indices.
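Here is a minimal sketch of that lookup. It assumes the pickle holds a sequence (or an int-keyed dict) where entry `i` is the answer for output index `i`. `load_label2ans` and `decode_topk` are hypothetical helpers, not part of the repo. Because the scores are only ranked, it does not matter whether they are raw logits or sigmoid outputs.

```python
import pickle
from typing import List


def load_label2ans(path: str = "data/cache/trainval_label2ans.pkl"):
    """Load the label -> answer mapping written during preprocessing."""
    with open(path, "rb") as f:
        return pickle.load(f)


def decode_topk(scores, label2ans, k: int = 1) -> List[List[str]]:
    """Return the k highest-scoring answer strings for each row of a [batch, num_answers] array.

    `scores` may be a torch tensor, a NumPy array, or nested Python lists;
    anything with a .tolist() method is converted to lists first.
    """
    if hasattr(scores, "tolist"):
        scores = scores.tolist()
    results = []
    for row in scores:
        # Indices of the k largest scores, best first.
        best = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        results.append([label2ans[i] for i in best])
    return results
```

For example, with `pred` being the [1x3129] tensor, `decode_topk(pred, load_label2ans(), k=3)` returns a list containing the top three answers for that question. This assumes the classifier was trained against the same trainval label set, so output column `i` lines up with entry `i` of the mapping.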
I've successfully trained the model and can load the state dict from the .pth file into a new model instance. Is there a way to now test it on a new image and question and see the response?
Thank you!