marcozov closed this issue 5 years ago.
Hi, it should definitely be possible to extend the network to work on new images, and indeed you'll have to extract features first (either spatial or object-based, or both).
You'll need to know the specification of the images you have, e.g. their height and width. The model stage is the stage of the pretrained PyTorch ResNet you'd like to use to extract the features (4 by default). You can choose any batch size you'd like, based on the size of the GPU you have.
The code goes over all the PNG images in the image directory (--input_image_dir): https://github.com/stanfordnlp/mac-network/blob/gqa/extract_features.py#L69, so all you'll need to do is put the additional images in the directory that you pass to that flag and run it; it should go smoothly.
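The directory loop is easy to mirror for a sanity check before running the full extractor. A minimal sketch (illustrative only, not the repo's actual code; the batching just follows the batch-size idea mentioned above):

```python
import glob
import os


def list_png_batches(input_image_dir, batch_size=32):
    """Yield the .png files in the --input_image_dir directory in batches.

    Mirrors, in spirit, the loop in extract_features.py: collect every PNG
    in the directory (sorted, so the feature order is deterministic) and
    feed them to the CNN batch_size images at a time.
    """
    paths = sorted(glob.glob(os.path.join(input_image_dir, "*.png")))
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]
```

Non-PNG files in the same directory are simply skipped, so you can keep metadata files alongside the images.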
Note that this code extracts only spatial features. For object-based features you'll need to run a separate object detector, as in https://github.com/facebookresearch/Detectron, on your images; once you have the extracted features you'll be able to run MAC on them (similarly to how it works currently on GQA).
Please let me know if you have any other questions! :)
Thanks for your reply!
The images may have different sizes, as happens in GQA: do those parameters refer to how the images are resized? Also, I would like to use exactly the same setup as the one that was used to produce the feature files available on the website ( https://cs.stanford.edu/people/dorarad/gqa/download.html ): which object extractor did you use exactly? Was it pre-trained on COCO or ImageNet?
Thank you again.
No problem! Regarding the image height/width, you're correct: the height and width flags are actually the dimensions after resizing, https://github.com/stanfordnlp/mac-network/blob/master/extract_features.py#L90, not the original ones (I believe resizing all images to a fixed size before extracting features from them is one of the common approaches).
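To illustrate the fixed-size resize: the real script presumably resizes with an image library (e.g. bilinear interpolation), but this pure-Python nearest-neighbour sketch shows the basic idea of mapping an image of any original size onto the target height/width before the CNN sees it:

```python
def resize_nearest(pixels, out_h, out_w):
    """Nearest-neighbour resize of a 2D grid (list of rows) to out_h x out_w.

    Illustrative only: each output cell (i, j) picks the input cell whose
    relative position matches, so every input image, whatever its original
    size, ends up with the same fixed dimensions.
    """
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]
```

With a fixed target size, the feature maps produced downstream also have a fixed spatial shape, which is what lets them be stacked into a single h5 file.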
For the object detector: I used https://github.com/peteanderson80/bottom-up-attention trained on all the images/scene graphs in the GQA training set.
Thanks.
Sorry if I insist, but: 1. What exact dimensions did you use for the resize? 2. Do you have the weights saved anywhere, so that I could avoid re-training the model from scratch? If not, there are several degrees of freedom in the procedure: do you have the code that was used for training? 3. Did you convert the GQA annotations (scene graphs) to the VisualGenome format, or did you use VisualGenome directly? As far as I understood, GQA scene graphs are taken from VisualGenome: did you just split the latter dataset according to the GQA train/validation split?
Thank you again.
Hi, happy to answer any questions!
Please let me know if you have further questions!
Thanks for the answer. I really hope you will make the weights available, because I have already tried training other object detectors on scene graphs without achieving any positive results.
It might take some time, but in the meantime: I used exactly the same code as in https://github.com/peteanderson80/bottom-up-attention, and the only change I made was the training set itself. I changed this list here: https://github.com/peteanderson80/bottom-up-attention/blob/master/data/genome/train.txt to include only the IDs of images from the GQA training set (~70k out of the original 110k). Same parameters and everything. I trained for about 5 days on 4 Titan X GPUs in parallel, then extracted the features using https://github.com/peteanderson80/bottom-up-attention/blob/master/tools/generate_tsv.py, which generates a TSV file of all the features, and then saved them as-is in an h5 format instead (just changing the file format from tsv to h5 to comply with my code; the features were kept fully identical). Hope this helps in the meantime! I will let you know when the weights are released!
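The tsv-to-h5 step above can be sketched as follows. The field names follow generate_tsv.py, and the base64/float32 decoding is an assumption about the encoding that script uses; the actual HDF5 write (e.g. via h5py) is omitted here to keep the sketch standard-library-only:

```python
import base64
import csv
import struct

# Column layout assumed from generate_tsv.py in the bottom-up-attention repo.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def read_tsv(path):
    """Parse a generate_tsv.py-style TSV into plain Python records.

    'boxes' and 'features' are assumed to be base64-encoded little-endian
    float32 buffers; decode them back into flat lists of floats. Dumping
    each record into an h5 file afterwards is then a direct array write,
    leaving the feature values fully identical.
    """
    csv.field_size_limit(1 << 30)  # feature columns far exceed the default limit
    records = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES):
            for key in ("boxes", "features"):
                raw = base64.b64decode(row[key])
                row[key] = list(struct.unpack(f"<{len(raw) // 4}f", raw))
            row["num_boxes"] = int(row["num_boxes"])
            records.append(row)
    return records
```

Since the conversion only changes the container format, any consumer that reads the h5 file sees exactly the arrays the detector produced.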
Thank you very much! One last question: what performance do you obtain with bottom-up-attention for object detection on the GQA dataset? I guess you measured the performance on the validation split; do you have some numbers (for instance, the mAP)?
Hi, I ran their evaluation script once back in November. I remember the object-detection numbers were quite low, but I don't have the precise figures currently :/ However, I don't think mAP scores are a good indicator of how useful the features are for a VQA task: there are many closely related objects (say, a table and a desk), and even if the object detector doesn't manage to distinguish between them with high accuracy, it won't necessarily affect the VQA end task.
Hi @dorarad, Thanks for the great repo. Do you have any update on releasing pretrained weights for object detection? It would be great if you can share it.
Hi Thao, Marco, thanks a lot for the interest. Unfortunately there's no update yet about releasing the weights; I'm having some trouble accessing some of my older files, but I hope to resolve it.
Hello,
I would like to run the model on images that are not in the GQA dataset, but as if they were in GQA (basically I just want to replace some images of the dataset with other images and keep asking the same questions). For running the model on GQA I simply followed the instructions on the GQA branch, which consist of downloading the spatial features and the object features and then merging them.
But how do I extract those features from other images? I saw the extract_features.py script, but I don't fully understand how to use it to extract both spatial and object features. And what about the other parameters (image_height, image_width, model_stage, batch_size)? What should I use in order to extract features in the same way as the ones that you generated and made available for download?
Thanks in advance.